# Analysis Sources

An **analysis source** represents a table of data (often from a file) that is required to run an analysis. These objects
contain ingestion metadata, such as data formats and density and frequency analysis. Analysis sources are the core
objects used during the data import process, and they provide the data needed to complete the analysis.

### Analysis source type

An **analysis source type** determines which features are available during the analysis source import process, and must
be selected when creating an analysis source.

Refer to the `Analysis Type` endpoint to determine which analysis source types can be applied to a given analysis.

Refer to the `Analysis Source Type` endpoint to determine the features and column mappings for a given analysis source
type.

#### Additional Analysis Data

**Additional data** is available as an analysis source type for all analysis types. When creating an additional data
analysis source, the `additionalDataColumnField` property must be set to the name of an additional data column added
during the import of another source type. A list of the analysis's additional data columns is available from its
`importantColumns` field.

### Async create and update responses

Unlike other entities, the analysis source entity may start long-running background jobs as a result of a `Create`
or `Update` call. For this reason, calls to `Create` or `Update` an analysis source return an **async result** entity.
Users should poll this entity and wait for it to complete before re-loading the analysis source and making further changes.
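
A minimal polling sketch follows. It assumes a function that fetches an async result by ID (the `fetch_async_result` callable is hypothetical and would wrap whatever HTTP client you use) and assumes the async result exposes a `status` field whose terminal values are `COMPLETE` and `ERROR`. Those field and status names are assumptions; check the async result schema before relying on them.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal status values for an async result; verify against the API schema.
TERMINAL_STATUSES = {"COMPLETE", "ERROR"}


def wait_for_async_result(
    fetch_async_result: Callable[[str], Dict[str, Any]],
    async_result_id: str,
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll an async result until it reaches a terminal status or the timeout expires.

    fetch_async_result is a hypothetical callable that returns the async result
    entity as a dict; it stands in for a real GET request.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_async_result(async_result_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"async result {async_result_id} did not finish within {timeout} seconds"
            )
        sleep(poll_interval)
```

Once this returns, re-load the analysis source before making further changes, since the background job may have moved it to a different workflow state.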

### Analysis Source workflow

Unlike other entities, importing an analysis source relies on a multi-step workflow. The steps included are determined
by the **features** that the analysis source type supports. Each feature has one or more **workflow states**, which
indicate the analysis source's current position within the workflow.

There are two workflow state types: **step states** and **transition states**.

* Step states allow users to configure properties on the analysis source, or on the analysis more broadly. Some step
  states exist to support the MindBridge web app and have little to no meaningful interaction with the API.

* Transition states indicate that the analysis source is performing work asynchronously, and will eventually transition
  to another state.

Here is a list of features and their possible workflow states:

| Feature                  | State name                          | State type |
| ------------------------ | ----------------------------------- | ---------- |
| Feature independent      | STARTED                             | Transition |
| Format detection         | DETECTING\_FORMAT                   | Transition |
|                          | FORMAT\_DETECTED                    | Step       |
|                          | FORMAT\_DETECTION\_COMPLETED        | Transition |
| Data validation          | ANALYZING\_COLUMNS                  | Transition |
|                          | COLUMNS\_ANALYZED                   | Step       |
| Column mapping           | DATA\_VALIDATION\_CONFIRMED         | Step       |
|                          | COLUMN\_MAPPINGS\_CONFIRMED         | Transition |
| Effective date metrics   | ANALYZING\_EFFECTIVE\_DATE\_METRICS | Transition |
|                          | EFFECTIVE\_DATE\_METRICS\_ANALYZED  | Step       |
|                          | ANALYSIS\_PERIOD\_SELECTED          | Transition |
| Transaction ID selection | CHECKING\_INTEGRITY                 | Transition |
|                          | INTEGRITY\_CHECKED                  | Step       |
| Parse                    | PARSING                             | Transition |
|                          | PARSED                              | Step       |
| Review Funds             | FUNDS\_REVIEWED                     | Step       |
| Confirm Settings         | SETTINGS\_CONFIRMED                 | Transition |
| Feature independent      | COMPLETED                           | Step       |
|                          | FAILED                              | Step       |

### Transitioning between states

To transition between workflow states, set the `targetWorkflowState` property to the name of the desired workflow state.
Once set, the workflow will attempt to advance to that state, passing through all states between the current and target
state without stopping.

If the target state is a transition state, the workflow will continue past it until it reaches the next step state.

When creating a new analysis source, if `targetWorkflowState` is not set, the workflow will advance to the first valid
step state, then stop.

These rules apply to all transitions, with a few exceptions:

* If an [ungrouped format](https://support.mindbridge.ai/hc/en-us/articles/10437018464407) is detected within the
  selected source file, the FORMAT\_DETECTED state is skipped. Setting `targetWorkflowState` to FORMAT\_DETECTED on a
  source file containing ungrouped data results in the workflow continuing to the next step state.
* If the target state is the last step state among the source type's features, the workflow advances to COMPLETED
  instead of stopping there. This is often the case with the PARSED state, as it is the final feature state for many
  source types.
* If an error occurs during a workflow process, the workflow will transition to FAILED. The cause of the failure can
  often be found in the async result or in the analysis source’s `errors` property.
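
The rules above can be modelled client-side to predict where a workflow will stop. This sketch illustrates the documented rules and is not MindBridge's implementation. It assumes the table rows are listed in workflow order and uses a hypothetical source type that supports format detection, data validation, column mapping, and parse; the `grouped_format` flag stands in for whether a grouped format was detected. It does not model the FAILED state.

```python
from typing import List, Tuple

# (state name, state type), assumed to be in workflow order, for a hypothetical
# source type that supports format detection, data validation, column mapping, and parse.
STATES: List[Tuple[str, str]] = [
    ("STARTED", "transition"),
    ("DETECTING_FORMAT", "transition"),
    ("FORMAT_DETECTED", "step"),
    ("FORMAT_DETECTION_COMPLETED", "transition"),
    ("ANALYZING_COLUMNS", "transition"),
    ("COLUMNS_ANALYZED", "step"),
    ("DATA_VALIDATION_CONFIRMED", "step"),
    ("COLUMN_MAPPINGS_CONFIRMED", "transition"),
    ("PARSING", "transition"),
    ("PARSED", "step"),
    ("COMPLETED", "step"),
]


def predicted_stop_state(
    states: List[Tuple[str, str]], target: str, grouped_format: bool = True
) -> str:
    """Predict where the workflow stops for a given targetWorkflowState.

    The states list must be in workflow order and end with COMPLETED.
    """
    names = [name for name, _ in states]
    i = names.index(target)
    # Pass through transition states, and skip FORMAT_DETECTED for ungrouped files.
    while states[i][1] == "transition" or (
        names[i] == "FORMAT_DETECTED" and not grouped_format
    ):
        i += 1
    # A step state followed only by COMPLETED advances to COMPLETED.
    if names[i + 1:] == ["COMPLETED"]:
        return "COMPLETED"
    return names[i]


# In this model, a new source with no targetWorkflowState behaves like a target of STARTED.
print(predicted_stop_state(STATES, "STARTED"))                        # FORMAT_DETECTED
print(predicted_stop_state(STATES, "STARTED", grouped_format=False))  # COLUMNS_ANALYZED
print(predicted_stop_state(STATES, "COLUMN_MAPPINGS_CONFIRMED"))      # COMPLETED
```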

### Feature properties

Certain feature properties may only be read or set on or after specific workflow states. Here is a breakdown of the
relationship between workflow states and analysis source properties:

#### Format detection

If a [grouped data](https://support.mindbridge.ai/hc/en-us/articles/10437018464407) format is detected, the workflow
stops at the FORMAT\_DETECTED state. At that point, the `detectedFormat` property returns the name of the detected
format, and the `applyDegrouper` property can be set. If `applyDegrouper` is `true`, the relevant formatter converts the
file into a readable format when the workflow advances to the next step. Otherwise, the file is used as is.

`applyDegrouper` can also be set when the analysis source is created. In that case, if grouped data is detected, the
workflow applies the formatter automatically and continues to the next workflow state instead of stopping at
FORMAT\_DETECTED.

If the formatter is used in either scenario, the `degrouperApplied` property will be set to `true`.

The `presetHeaderRowIndex` field can be set to manually specify the header row’s position in the imported file, bypassing automatic header
detection. The field is zero-based, so the first row in the file is considered row 0.
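
A sketch of the format-detection fields as they might appear in a create request body. Only `applyDegrouper` and `presetHeaderRowIndex` come from this page; the other keys a create request needs are omitted, and the helper name is hypothetical. Because `presetHeaderRowIndex` is zero-based, the helper converts a one-based row number (as shown in a spreadsheet) to the index the API expects.

```python
from typing import Any, Dict, Optional


def format_detection_options(
    apply_degrouper: Optional[bool] = None,
    header_row_number: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the format-detection part of a create request body (hypothetical helper).

    header_row_number is one-based, as displayed in a spreadsheet; the API field
    presetHeaderRowIndex is zero-based, so one is subtracted.
    """
    options: Dict[str, Any] = {}
    if apply_degrouper is not None:
        # Setting this at creation lets the workflow degroup grouped data
        # without stopping at FORMAT_DETECTED.
        options["applyDegrouper"] = apply_degrouper
    if header_row_number is not None:
        if header_row_number < 1:
            raise ValueError("header_row_number is one-based and must be >= 1")
        options["presetHeaderRowIndex"] = header_row_number - 1
    return options


# Header on the third row of the file, degrouper applied up front.
print(format_detection_options(apply_degrouper=True, header_row_number=3))
# {'applyDegrouper': True, 'presetHeaderRowIndex': 2}
```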

#### Data validation

Upon reaching the COLUMNS\_ANALYZED state, the `fileInfo` property is populated with information, including metadata for
the individual columns (which is available as part of the `columnData` property) and for the file as a whole (available
under the `metadata` property).

Each metric has a `state` property with a value of `PASS`, `WARN`, or `FAIL`. These values serve as status indicators
and may flag problems with the file’s data.
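
The exact layout of `fileInfo` is not described here, so the sketch below assumes nothing about it beyond metrics being objects with a `state` key. It walks the whole `fileInfo` structure and collects the path of every object whose `state` is `WARN` or `FAIL`. Any non-metric object that happens to carry a `state` key would also be reported, and the sample data is invented for illustration.

```python
from typing import Any, List, Tuple

FLAGGED_STATES = {"WARN", "FAIL"}


def flagged_metrics(node: Any, path: str = "fileInfo") -> List[Tuple[str, str]]:
    """Return (path, state) for every dict under node whose 'state' is WARN or FAIL."""
    found: List[Tuple[str, str]] = []
    if isinstance(node, dict):
        state = node.get("state")
        if isinstance(state, str) and state in FLAGGED_STATES:
            found.append((path, state))
        for key, value in node.items():
            found.extend(flagged_metrics(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(flagged_metrics(item, f"{path}[{index}]"))
    return found


# Invented sample; real fileInfo objects contain different and additional fields.
sample = {
    "metadata": {"blankRows": {"state": "WARN"}},
    "columnData": [{"nullValues": {"state": "PASS"}}, {"nullValues": {"state": "FAIL"}}],
}
print(flagged_metrics(sample))
# [('fileInfo.metadata.blankRows', 'WARN'), ('fileInfo.columnData[1].nullValues', 'FAIL')]
```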

#### Column mapping

Upon reaching the DATA\_VALIDATION\_CONFIRMED state, the `proposedVirtualColumns`, `proposedColumnMappings`,
and `proposedAmbiguousColumnResolutions` properties are applied, and the `virtualColumns`, `columnMappings`,
and `ambiguousColumnResolutions` properties are updated accordingly. More information on the proposed fields is
available in the **Full source import automation** section.

If no value is present for `proposedColumnMappings`, then a set of recommended column mappings will be applied to the
file.

The API handles column mapping by assigning a column from the file to a MindBridge field with a compatible data type. To
do this, an entry in the `columnMappings` array must contain both the `position` of the column from the source file and
the target `mindbridgeField` to assign it to.
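
A sketch of building `columnMappings` entries from a file's column headers. It assumes column positions are zero-based (matching `presetHeaderRowIndex`) and that the header names are already known in file order; both are assumptions to verify against `fileInfo`. The MindBridge field names in the example are hypothetical placeholders; the real ones for a given source type are available from the `Analysis Source Type` endpoint.

```python
from typing import Dict, List


def build_column_mappings(
    headers: List[str], field_by_header: Dict[str, str]
) -> List[Dict[str, object]]:
    """Map file columns to MindBridge fields by header name.

    Positions are assumed to be zero-based indexes into the file's columns.
    Headers missing from field_by_header are left unmapped.
    """
    missing = set(field_by_header) - set(headers)
    if missing:
        raise ValueError(f"headers not found in file: {sorted(missing)}")
    return [
        {"position": position, "mindbridgeField": field_by_header[header]}
        for position, header in enumerate(headers)
        if header in field_by_header
    ]


headers = ["Entry No", "Posted", "Account", "Amount"]
# Placeholder field names; look up the real ones for your analysis source type.
print(build_column_mappings(headers, {"Entry No": "transaction_id", "Account": "account"}))
# [{'position': 0, 'mindbridgeField': 'transaction_id'}, {'position': 2, 'mindbridgeField': 'account'}]
```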

#### Virtual columns

[Virtual columns](https://support.mindbridge.ai/hc/en-us/articles/10442701235223) can be added, modified, and removed by
changing the `virtualColumns` property. Once created, virtual columns have a `position` property that can be used in
column mapping; the metrics in `fileInfo` will be updated accordingly.

#### Ambiguous column resolution

While MindBridge can detect usable date formats, including different formats that appear within a single column, in some
cases the format of date and currency fields cannot be determined from the dataset provided. For example, the date
format `1/2/2022` could either be January 2nd, 2022, or February 1st, 2022, depending on which date format is being
used. When ambiguous columns are detected, an entry in the `ambiguousColumnResolutions` property will be created with a
list of all the possible formats in its `ambiguousFormats` property. To resolve this issue, the correct format from the
list of `ambiguousFormats` should be set in the `selectedFormat` property.
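
A sketch of resolving ambiguous columns by picking from each entry's `ambiguousFormats`. The fields `ambiguousFormats`, `selectedFormat`, and `position` come from this page; the preference list and the format strings in the example are hypothetical.

```python
from typing import Any, Dict, List


def resolve_ambiguous_columns(
    resolutions: List[Dict[str, Any]], preferred_formats: List[str]
) -> List[Dict[str, Any]]:
    """Set selectedFormat on each entry to the first preferred format it allows.

    Returns new dicts; raises ValueError if no preferred format matches an entry.
    """
    resolved = []
    for entry in resolutions:
        choices = entry.get("ambiguousFormats", [])
        match = next((fmt for fmt in preferred_formats if fmt in choices), None)
        if match is None:
            raise ValueError(
                f"column at position {entry.get('position')}: none of {preferred_formats} in {choices}"
            )
        resolved.append({**entry, "selectedFormat": match})
    return resolved


ambiguous = [{"position": 1, "ambiguousFormats": ["MM/dd/yyyy", "dd/MM/yyyy"]}]
# A file known to use day-first dates.
print(resolve_ambiguous_columns(ambiguous, ["dd/MM/yyyy"]))
# [{'position': 1, 'ambiguousFormats': ['MM/dd/yyyy', 'dd/MM/yyyy'], 'selectedFormat': 'dd/MM/yyyy'}]
```

The resolved list would then be sent back as `ambiguousColumnResolutions` in an update to the analysis source.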

#### Additional data

Unmapped columns may be added as additional data columns. To do so, a special mapping must be created with
the `mindbridgeField` left blank and a value set for `additionalColumnName`. Once this source has completed the import
process, a new analysis source can be created with the source type ID of the Additional Data source type and with
`additionalDataColumnField` set to the same name.
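
A sketch of the two request fragments involved. The mapping entry uses `position`, `mindbridgeField`, and `additionalColumnName` from this page; representing a blank `mindbridgeField` as `None` (JSON `null`) is an assumption. In the second fragment, the `analysisSourceTypeId` key and the type ID value are hypothetical placeholders for the Additional Data source type ID.

```python
import json
from typing import Any, Dict


def additional_data_mapping(position: int, column_name: str) -> Dict[str, Any]:
    """Mapping entry that marks an unmapped column as additional data.

    mindbridgeField is left blank; None (JSON null) is assumed here.
    """
    return {"position": position, "mindbridgeField": None, "additionalColumnName": column_name}


def additional_data_source_fields(
    additional_data_type_id: str, column_name: str
) -> Dict[str, Any]:
    """Fields for the follow-up additional data source (key names partly hypothetical)."""
    return {
        "analysisSourceTypeId": additional_data_type_id,  # hypothetical key name
        "additionalDataColumnField": column_name,
    }


mapping = additional_data_mapping(5, "Approver")
follow_up = additional_data_source_fields("ADDITIONAL_DATA_TYPE_ID", mapping["additionalColumnName"])
print(json.dumps(mapping))
# {"position": 5, "mindbridgeField": null, "additionalColumnName": "Approver"}
print(json.dumps(follow_up))
# {"analysisSourceTypeId": "ADDITIONAL_DATA_TYPE_ID", "additionalDataColumnField": "Approver"}
```

Reusing `additionalColumnName` from the first mapping keeps the two names in sync, which is what links the additional data source to the column.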

#### Effective date metrics

This step confirms that the file’s entries fall within the current analysis period. Once this state has been reached,
the `analysis-sources/{analysisSourceId}/effective-date-metrics` endpoint can be used to get a set of metrics regarding
the number of entries within the source’s analysis period. A `period` value can be used to set the resolution of the
histogram to days, weeks, or months.
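
A sketch that builds the request URL for this endpoint. The base URL is a hypothetical placeholder, and passing `period` as a query parameter with the values `DAY`, `WEEK`, and `MONTH` is an assumption; check the endpoint reference for the real parameter location and accepted values.

```python
from urllib.parse import quote, urlencode

# Assumed period values; the real enum may differ.
PERIODS = {"DAY", "WEEK", "MONTH"}


def effective_date_metrics_url(base_url: str, analysis_source_id: str, period: str) -> str:
    """Build the effective-date-metrics URL with the histogram period as a query parameter."""
    if period not in PERIODS:
        raise ValueError(f"period must be one of {sorted(PERIODS)}")
    # Percent-encode the ID so it cannot alter the path.
    path = f"analysis-sources/{quote(analysis_source_id, safe='')}/effective-date-metrics"
    return f"{base_url.rstrip('/')}/{path}?{urlencode({'period': period})}"


print(effective_date_metrics_url("https://example.invalid/api/v1", "abc123", "WEEK"))
# https://example.invalid/api/v1/analysis-sources/abc123/effective-date-metrics?period=WEEK
```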

#### Transaction ID selection

Transitioning into the [transaction ID](https://support.mindbridge.ai/hc/en-us/articles/8740577590295) selection feature
generates a transaction ID preview for the selection specified in `proposedTransactionIdSelection` and sets
the `transactionIdSelection` property to that value. If `proposedTransactionIdSelection` is not set, a set of potential
transaction ID selections is generated, and the selection that MindBridge's internal tooling determines to be the best
is assigned to the `transactionIdSelection` property.

Details about these **transaction ID previews** can be viewed via the transaction ID preview endpoints.

While in the INTEGRITY\_CHECKED state, changing `transactionIdSelection` selects an existing preview with the same
properties as the new selection, if one exists. Otherwise, a new preview is generated for the selection and selected.

#### Review funds and confirm settings

These features have no direct interaction with the API and are only relevant to the web application.

### Full source import automation

When creating a new analysis source, the entire import process can be performed at once by setting
the `targetWorkflowState` property to `COMPLETED`. Because all workflow steps run at once, some features require input
up front, specifically the column mapping and transaction ID selection features. Separate `proposed` fields can be set
to supply these values instead of relying on the default column mapping and transaction ID selection behavior.

#### Column mapping

The `proposedVirtualColumns`, `proposedColumnMappings`, and `proposedAmbiguousColumnResolutions` properties provide the
ability to pre-define their respective properties ahead of running validation. These proposed versions of the properties
have some key differences from their applied counterparts:

* As validation has not been run and the column positions have not been determined, the `position` property for virtual
  columns isn’t available. Instead, entries in `proposedColumnMappings` can use a `virtualColumnIndex` property to
  reference a specific virtual column definition in the `proposedVirtualColumns` array.
* `proposedAmbiguousColumnResolutions` cannot determine the ambiguous formats before the file is validated, so
  the `ambiguousFormats` property is not available. Additionally, the `position` property can only refer to a physical
  column, not a virtual one.
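
A client-side check of the differences listed above can catch mistakes before the create request is sent. This is a sketch under assumptions: each proposed column mapping is assumed to reference either a physical column through `position` or a virtual column through `virtualColumnIndex` (not both), and `virtualColumnIndex` is assumed to be a zero-based index into `proposedVirtualColumns`. The virtual column definition and the `mindbridgeField` values in the example are placeholders.

```python
from typing import Any, Dict, List


def check_proposed_fields(body: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in the proposed column-mapping fields."""
    problems: List[str] = []
    virtual_columns = body.get("proposedVirtualColumns", [])
    for i, mapping in enumerate(body.get("proposedColumnMappings", [])):
        has_position = "position" in mapping
        has_virtual = "virtualColumnIndex" in mapping
        if has_position == has_virtual:
            # Assumed rule: a mapping targets either a physical or a virtual column.
            problems.append(f"mapping {i}: set exactly one of position or virtualColumnIndex")
        elif has_virtual and not 0 <= mapping["virtualColumnIndex"] < len(virtual_columns):
            problems.append(f"mapping {i}: virtualColumnIndex out of range")
    for i, resolution in enumerate(body.get("proposedAmbiguousColumnResolutions", [])):
        if "ambiguousFormats" in resolution:
            problems.append(f"resolution {i}: ambiguousFormats is not available before validation")
        if "position" not in resolution or "selectedFormat" not in resolution:
            problems.append(f"resolution {i}: needs a physical column position and a selectedFormat")
    return problems


body = {
    "targetWorkflowState": "COMPLETED",
    "proposedVirtualColumns": [{}],  # placeholder; the virtual column schema is not shown here
    "proposedColumnMappings": [
        {"position": 0, "mindbridgeField": "transaction_id"},
        {"virtualColumnIndex": 0, "mindbridgeField": "account"},
        {"virtualColumnIndex": 3, "mindbridgeField": "amount"},
    ],
    "proposedAmbiguousColumnResolutions": [{"position": 1, "selectedFormat": "dd/MM/yyyy"}],
}
print(check_proposed_fields(body))
# ['mapping 2: virtualColumnIndex out of range']
```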

#### Transaction ID selection

The `proposedTransactionIdSelection` property can be defined immediately upon creating the analysis source. If it is set
when the workflow enters the transaction ID selection feature, a transaction ID preview is generated only for that
selection, and the `transactionIdSelection` property is updated accordingly.
