An analysis source represents a table of data (often from a file) that is required to run an analysis. These objects contain ingestion metadata, including data formats, density and frequency analysis, and much more. The analysis sources are the core object types used during the data import process and provide the data necessary to complete the analysis.

Analysis source type

An analysis source type determines which features are available during the analysis source import process, and must be selected when creating an analysis source. Refer to the Analysis Type endpoint to determine which analysis source types can be applied to a given analysis. Refer to the Analysis Source Type endpoint to determine the features and column mappings for a given analysis source type.

Additional Analysis Data

Additional data is available as an analysis source type for all analysis types. When creating an additional data analysis source, the additionalDataColumnField property must be set to the additional data field added during the import of other source types. A list of the analysis’ additional data columns is available from its importantColumns field.

Async create and update responses

Unlike other entities, the analysis source entity may perform long-running background jobs as a result of a Create or Update call. As a result, calls to Create or Update an analysis source will return an async result entity. Users should poll this entity and await its completion before re-loading the analysis source and making further changes.
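The polling loop described above can be sketched as follows. This is a minimal sketch: the `fetch_async_result` callable and the `IN_PROGRESS`/`COMPLETE`/`ERROR` status values are assumptions used for illustration; check the async result entity's reference documentation for the actual field names and values.

```python
import time

def wait_for_async_result(fetch_async_result, poll_interval=2.0, timeout=300.0):
    """Poll an async result entity until it reports completion.

    fetch_async_result: any callable returning the latest async result as a
    dict (e.g. a function wrapping a GET on the async result endpoint).
    The status values checked here are illustrative, not authoritative.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_async_result()
        status = result.get("status")
        if status == "COMPLETE":
            return result
        if status == "ERROR":
            # Failure details often surface here or in the source's errors property.
            raise RuntimeError(f"Async job failed: {result}")
        time.sleep(poll_interval)
    raise TimeoutError("Async result did not complete in time")
```

Once the result completes, re-load the analysis source before making further changes.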

Analysis Source workflow

Unlike other entities, importing an analysis source relies on a multi-step workflow process. Which steps are included is determined by which features the analysis source type supports. These features include multiple workflow states, which determine the current location of the analysis source within the workflow. There are two workflow state types: step states and transition states.
  • Step states allow users to configure properties on an analysis source, or the analysis more broadly. Some step states are provided for the MindBridge web app interface and may have little to no meaningful interaction with the API.
  • Transition states indicate that the analysis source is performing work asynchronously, and will eventually transition to another state.
Here is a list of features and their possible workflow states:
Feature                     State name                          State type
Feature independent         STARTED                             Transition
Format detection            DETECTING_FORMAT                    Transition
                            FORMAT_DETECTED                     Step
                            FORMAT_DETECTION_COMPLETED          Transition
Data validation             ANALYZING_COLUMNS                   Transition
                            COLUMNS_ANALYZED                    Step
Column mapping              DATA_VALIDATION_CONFIRMED           Step
                            COLUMN_MAPPINGS_CONFIRMED           Transition
Effective date metrics      ANALYZING_EFFECTIVE_DATE_METRICS    Transition
                            EFFECTIVE_DATE_METRICS_ANALYZED     Step
                            ANALYSIS_PERIOD_SELECTED            Transition
Transaction ID selection    CHECKING_INTEGRITY                  Transition
                            INTEGRITY_CHECKED                   Step
Parse                       PARSING                             Transition
                            PARSED                              Step
Review Funds                FUNDS_REVIEWED                      Step
Confirm Settings            SETTINGS_CONFIRMED                  Transition
Feature independent         COMPLETED                           Step
                            FAILED                              Step

Transitioning between states

To transition between workflow states, set the targetWorkflowState property to the name of the desired workflow state. Once set, the workflow will attempt to advance to that state, passing through all states between the current and target state without stopping. If the target state is a transition state, the workflow will continue past it until it reaches the next step state. When creating a new analysis source, if targetWorkflowState is not set, the workflow will advance to the first valid step state, then stop. These rules apply to all transitions, with a few exceptions:
  • If an ungrouped format is detected within the selected source file, the FORMAT_DETECTED state will be ignored. Setting targetWorkflowState to FORMAT_DETECTED on a source file containing ungrouped data will result in the workflow continuing to the next step state.
  • If the final feature-specific state is a step state, the workflow will advance to COMPLETED instead of stopping there. This is often the case with the PARSED state, as it is the final feature state for many source types.
  • If an error occurs during a workflow process, the workflow will transition to FAILED. The cause of the failure can often be found in the async result or in the analysis source’s errors property.
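A transition request is just an update that sets targetWorkflowState to one of the state names from the table above. The helper below validates the name before building the update body; the set of names comes from this page, while everything about how the body is sent (endpoint path, HTTP method) is left to the caller.

```python
def build_transition_payload(target_state):
    """Build an analysis source update body that advances the workflow.

    Valid names are the workflow states listed in the feature/state table;
    the workflow passes through intermediate states without stopping.
    """
    valid_states = {
        "STARTED", "DETECTING_FORMAT", "FORMAT_DETECTED",
        "FORMAT_DETECTION_COMPLETED", "ANALYZING_COLUMNS", "COLUMNS_ANALYZED",
        "DATA_VALIDATION_CONFIRMED", "COLUMN_MAPPINGS_CONFIRMED",
        "ANALYZING_EFFECTIVE_DATE_METRICS", "EFFECTIVE_DATE_METRICS_ANALYZED",
        "ANALYSIS_PERIOD_SELECTED", "CHECKING_INTEGRITY", "INTEGRITY_CHECKED",
        "PARSING", "PARSED", "FUNDS_REVIEWED", "SETTINGS_CONFIRMED",
        "COMPLETED", "FAILED",
    }
    if target_state not in valid_states:
        raise ValueError(f"Unknown workflow state: {target_state}")
    return {"targetWorkflowState": target_state}
```

For example, `build_transition_payload("COMPLETED")` produces the body for a full run to completion, subject to the exceptions listed above.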

Feature properties

Certain feature properties may only be read or set on or after specific workflow states. Here is a breakdown of the relationship between workflow states and analysis source properties:

Format detection

If a grouped data format is detected, the workflow will stop at the FORMAT_DETECTED state, and the detectedFormat property will return the name of the detected format. From there, the applyDegrouper property can be set: if applyDegrouper is true, the relevant formatter will be used to convert the file into a readable format when advancing to the next step; if false, the file will be used as-is. applyDegrouper can also be set immediately upon creation of the analysis source. In that case, if grouped data is detected during the workflow, the formatter is applied implicitly and the workflow continues to the next workflow state instead of stopping on FORMAT_DETECTED. In either scenario, if the formatter is used, the degrouperApplied property will be set to true.
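A creation body that opts into degrouping up front might look like the sketch below. Only applyDegrouper and targetWorkflowState are properties documented on this page; the other creation fields are placeholders to show where they would sit, and their real names should be taken from the API reference.

```python
# Sketch of a create body that sets applyDegrouper at creation time.
# If grouped data is then detected, the workflow applies the formatter
# implicitly and continues past FORMAT_DETECTED without stopping.
create_body = {
    "analysisId": "<analysis-id>",             # placeholder field name/value
    "analysisSourceTypeId": "<source-type-id>",  # placeholder field name/value
    "applyDegrouper": True,
    "targetWorkflowState": "COLUMNS_ANALYZED",
}
```

After the job completes, a true degrouperApplied on the re-loaded source confirms the formatter actually ran.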

Data validation

Upon reaching the COLUMNS_ANALYZED state, the fileInfo property is populated with information, including metadata for the individual columns (which is available as part of the columnData property) and for the file as a whole (available under the metadata property). Metrics contain a state property, which may appear as PASS, WARN, or FAIL. These serve as status indicators and may warn of problems with the file’s data.
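A quick way to triage a validated file is to collect every metric whose state is WARN or FAIL. The nesting below (columnData entries carrying a metrics list, file-level metrics under metadata) is a sketch inferred from the description above; verify the exact shape against the fileInfo reference before relying on it.

```python
def failing_metrics(file_info):
    """Collect (column, metric) pairs whose state is WARN or FAIL.

    file_info: the analysis source's fileInfo dict. The assumed layout is
    columnData -> per-column metric lists, and metadata -> file-level
    metrics; states are PASS, WARN, or FAIL.
    """
    problems = []
    for column in file_info.get("columnData", []):
        for metric in column.get("metrics", []):
            if metric.get("state") in ("WARN", "FAIL"):
                problems.append((column.get("name"), metric))
    for metric in file_info.get("metadata", {}).get("metrics", []):
        if metric.get("state") in ("WARN", "FAIL"):
            problems.append(("<file>", metric))
    return problems
```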

Column mapping

Upon reaching the DATA_VALIDATION_CONFIRMED state, the proposedVirtualColumns, proposedColumnMappings, and proposedAmbiguousColumnResolutions properties are applied, and the virtualColumns, columnMappings, and ambiguousColumnResolutions properties are updated accordingly. More information on the proposed fields is available in the Full source import automation section. If no value is present for proposedColumnMappings, then a set of recommended column mappings will be applied to the file. The API handles column mapping by assigning a column from the file to a MindBridge field with a compatible data type. To do this, an entry in the columnMappings array must contain both the position of the column from the source file and the target mindbridgeField to assign it to.
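Concretely, a columnMappings update pairs each source-file column position with a target MindBridge field, as described above. The field names used here (such as "effective_date" and "amount") are illustrative; the valid fields for a given source type come from the Analysis Source Type endpoint.

```python
# Each entry pairs a column position from the source file with the
# MindBridge field it should map to (field names are illustrative).
column_mappings = [
    {"position": 0, "mindbridgeField": "effective_date"},
    {"position": 1, "mindbridgeField": "amount"},
]

# Body for an update call against the analysis source.
update_body = {"columnMappings": column_mappings}
```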

Virtual columns

Virtual columns can be added, modified, and removed by changing the virtualColumns property. Once created, virtual columns have a position property that can be used in column mapping; the metrics in fileInfo will be updated accordingly.

Ambiguous column resolution

While MindBridge can detect usable date formats, including different formats that appear within a single column, in some cases the format of date and currency fields cannot be determined from the dataset provided. For example, the date format 1/2/2022 could either be January 2nd, 2022, or February 1st, 2022, depending on which date format is being used. When ambiguous columns are detected, an entry in the ambiguousColumnResolutions property will be created with a list of all the possible formats in its ambiguousFormats property. To resolve this issue, the correct format from the list of ambiguousFormats should be set in the selectedFormat property.
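Resolving an ambiguity amounts to copying one of the detected candidates into selectedFormat. The helper below guards against selecting a format MindBridge did not propose; the format strings in the test are illustrative, since the actual candidate notation comes back in ambiguousFormats.

```python
def resolve_ambiguous_column(resolution, chosen_format):
    """Set selectedFormat on one ambiguousColumnResolutions entry.

    chosen_format must be one of the candidates MindBridge listed in the
    entry's ambiguousFormats; the entry is modified in place and returned.
    """
    candidates = resolution.get("ambiguousFormats", [])
    if chosen_format not in candidates:
        raise ValueError(
            f"{chosen_format!r} is not one of the detected candidates: {candidates}"
        )
    resolution["selectedFormat"] = chosen_format
    return resolution
```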

Additional data

Unmapped columns may be added as additional data columns. To do so, a special mapping must be created with the mindbridgeField left blank, and a value set for additionalColumnName. Once this source has completed the import process, a new source with the same name set for additionalDataColumnField and the source type ID corresponding to the Additional Data Source Type can be created.
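The two steps above might look like the following sketch. The property names come from this page; the IDs are placeholders, and whether a blank mindbridgeField is expressed as null or an empty string should be confirmed against the API reference (null is assumed here).

```python
# Step 1: while importing the primary source, mark an unmapped column as
# additional data by leaving mindbridgeField blank (null assumed) and
# naming the column.
additional_mapping = {
    "position": 5,                    # illustrative column position
    "mindbridgeField": None,          # left blank per the docs
    "additionalColumnName": "branch_code",
}

# Step 2: once the primary source completes its import, create a new
# source that references the same column name. IDs are placeholders.
additional_source_body = {
    "analysisId": "<analysis-id>",
    "analysisSourceTypeId": "<additional-data-source-type-id>",
    "additionalDataColumnField": "branch_code",
}
```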

Effective date metrics

This step confirms that the file’s entries fall within the current analysis period. Once this state has been reached, the analysis-sources/{analysisSourceId}/effective-date-metrics endpoint can be used to get a set of metrics regarding the number of entries within the source’s analysis period. A period value can be used to set the resolution of the histogram to days, weeks, or months.
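The endpoint path below comes from this page; the base URL and the exact spelling of the period query parameter are assumptions to verify against the API reference. This sketch only builds the request URL, leaving authentication and the HTTP call to the caller.

```python
from urllib.parse import urlencode

def effective_date_metrics_url(base_url, analysis_source_id, period="months"):
    """Build the effective-date-metrics request URL.

    period sets the histogram resolution to days, weeks, or months; the
    'period' parameter name is an assumption.
    """
    if period not in ("days", "weeks", "months"):
        raise ValueError("period must be days, weeks, or months")
    path = f"{base_url}/analysis-sources/{analysis_source_id}/effective-date-metrics"
    return f"{path}?{urlencode({'period': period})}"
```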

Transaction ID selection

Transitioning into the transaction ID selection feature generates a preview of the transaction ID selection specified by proposedTransactionIdSelection and sets the transactionIdSelection property to that value. If no value is provided, a set of potential transaction ID selections will be generated, and the one MindBridge's internal tooling determines to be the best will be set as the transactionIdSelection property. Details about these transaction ID previews can be viewed via the transaction ID preview endpoints. While on the INTEGRITY_CHECKED state, changing transactionIdSelection will either select an existing preview with the same properties as the chosen selection or, if none exists, generate a new preview for the selection and select it.

Review funds and confirm settings

These features have no direct interaction with the API and are only relevant with respect to the web application.

Full source import automation

When creating a new analysis source, it is possible to perform the entire import process at once by setting the targetWorkflowState property to COMPLETED. Because all workflow steps are performed without stopping, some features require their input to be provided immediately, specifically the column mapping and transaction ID selection features. Separate proposed fields can be set to supply these values instead of relying on the default column mapping and transaction ID selection behavior.
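A one-shot import body might look like the sketch below. Only targetWorkflowState and the four proposed* property names come from this page; the ID fields are placeholders, the mapping field names are illustrative, and proposedTransactionIdSelection is simply omitted here to accept the default selection behavior.

```python
# One-shot import: create the source with targetWorkflowState COMPLETED
# and supply the proposed inputs up front so no step state blocks the run.
one_shot_body = {
    "analysisId": "<analysis-id>",              # placeholder
    "analysisSourceTypeId": "<source-type-id>",  # placeholder
    "targetWorkflowState": "COMPLETED",
    "proposedColumnMappings": [
        {"position": 0, "mindbridgeField": "effective_date"},  # illustrative
        {"position": 1, "mindbridgeField": "amount"},          # illustrative
    ],
    "proposedVirtualColumns": [],
    "proposedAmbiguousColumnResolutions": [],
    # proposedTransactionIdSelection omitted: the default selection is used.
}
```

The create call still returns an async result, which should be polled to completion as described in the async section above.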

Column mapping

The proposedVirtualColumns, proposedColumnMappings, and proposedAmbiguousColumnResolutions properties provide the ability to pre-define their respective properties ahead of running validation. These proposed versions of the properties have some key differences from their applied counterparts:
  • As validation has not been run, the column positions have not yet been determined, so the position property for virtual columns isn't available. Instead, entries in proposedColumnMappings may use a virtualColumnIndex property, which references a specific virtual column definition by its index in the proposedVirtualColumns array.
  • proposedAmbiguousColumnResolutions are not able to determine the ambiguous formats ahead of validating the file, so the ambiguousFormats property is not available. Additionally, the position property can only refer to a physical column, and not a virtual one.
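The two bullets above can be sketched as follows: physical columns are referenced by position, while a not-yet-materialized virtual column is referenced by its index in proposedVirtualColumns. The virtual column definition body and field names shown are illustrative only.

```python
# Proposed mappings before validation has run: positions exist only for
# physical columns, so the virtual column is referenced by its index in
# proposedVirtualColumns via virtualColumnIndex instead.
proposed_virtual_columns = [
    {"name": "net_amount"},  # illustrative virtual column definition
]
proposed_column_mappings = [
    {"position": 0, "mindbridgeField": "effective_date"},      # physical column
    {"virtualColumnIndex": 0, "mindbridgeField": "amount"},    # virtual column
]
```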

Transaction ID selection

The proposedTransactionIdSelection property can be defined immediately upon creating the analysis source. If it is set when entering the transaction ID selection feature, a transaction ID preview will be generated only for that selection, and the transactionIdSelection property will be updated accordingly.