Specify Data Features

Data sources containing features can be added on the Configure > Features > Source page. Features are used during modeling to help improve prediction accuracy.

The Features page lets you specify multiple feature data sources and preview the features contained in each one. You can also commit or decommit features from the project build and modify settings for each individual feature.

NOTE: If you are not using a feature data source in your project, you can skip this page.

Define Your Feature Data Source Connection

This page shows your data definition before any definitions are configured. The panel on the right changes after you configure a feature data source.

Click Configure > Features > Source to open the Source Features page. The first time you access this page, the Feature Data Source pane shows no feature data set information.

OneStream SensibleAI Forecast Configure Features screen showing an empty table for feature data sources with a prompt to add the first feature data source.

You can use the Source Features page to configure the feature source data used in your model. As with specifying targets and defining your target data set, this is a two-step process.

Define the data source connection and data sources to use.
Specify feature data source dimensions by selecting fields that contain the desired feature dimensions, value dimension, date dimension, and location dimension (optional). You can specify multiple feature data sources to be used during this step.

IMPORTANT: You should have detailed knowledge of your feature data sources and how the data in the source columns maps to the dimensions used to store that data. See the SensibleAI Forecast Data Quality Guide for information on data planning for your project.

Specify the Data Source Connection

The first part of specifying features and defining your feature data set for SensibleAI Forecast is defining the source connection for your data set.

In the Feature Data Sources pane, click Add to add a feature data source to your SensibleAI Forecast model. The Add Feature Data Set Connection dialog box displays.
In the Source Data Connection field, select the connection type of your Feature Data Set.
Select your data source resource.
Select the Name of the resource to connect to.
Select the names of the initial set of feature data sources you created for importing into your SensibleAI Forecast project. If you imported multiple feature data sources for the first model prediction, select the import data source for each.

TIP: Only the first selected table name displays in the list after selecting. You can click the field to see all the selected import data files.
In the Data Source Name field, type a name for the feature data source you are creating.
Click Next to open the Preview pane.

IMPORTANT: Use the information in the Preview pane to verify that the data in the correct feature data source is being used.

If you cannot verify that the data in the Preview pane is the correct source data, or the sources are incorrect, navigate back to the previous steps to change the selected data source.

Once you have verified in the Preview pane that the correct source connection and data tables are being used, you can select the dimensions for the feature data source.

Select Feature Data Source Dimensions

Continue specifying features and defining the data set by specifying the feature dimensions, value dimension, date dimension, and location dimension for the data set to use in your SensibleAI Forecast model. This step matches the dimensions used for the feature data in your model to the dimensions reserved when creating a cube for the feature data source.

NOTE: Only the specific dimensions reserved for SensibleAI Forecast should be selected for each of the dimension types. If a dimension type does not correlate to data in the feature data set, leave the field blank. If the location dimension is not being used, select None.

In the Intersection Dimensions field, select the feature dimensions that have been defined to store data from your feature data sources.

The columns in the source feature data set define all the feature variables that are used for predictions. Each distinct combination of values across the feature dimensions defines a feature.

Select the check box next to each applicable dimension. For example, if the user-defined dimensions UD1, UD2, UD3 and UD4 were reserved for source data and mapped to specific data columns in the data source, select UD1, UD2, UD3, and UD4 from the list.
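To make the idea of a feature intersection concrete, the following Python sketch counts the distinct value combinations across a chosen set of feature dimensions. The rows, the column names UD1 and UD2, and the feature_intersections helper are hypothetical illustrations, not part of SensibleAI Forecast. Adding a dimension can only keep or increase the number of intersections.

```python
# Hypothetical feature rows; UD1 and UD2 stand in for columns mapped to
# the selected Intersection Dimensions. Not SensibleAI Forecast code.
ROWS = [
    {"UD1": "Temperature", "UD2": "North", "Date": "2024-01-01", "Value": 3.1},
    {"UD1": "Temperature", "UD2": "South", "Date": "2024-01-01", "Value": 9.4},
    {"UD1": "Temperature", "UD2": "North", "Date": "2024-01-02", "Value": 2.7},
    {"UD1": "Rainfall", "UD2": "North", "Date": "2024-01-01", "Value": 0.5},
]


def feature_intersections(rows, dims):
    """Return the distinct value combinations across the given dimensions.

    Each combination corresponds to one feature (one time series).
    """
    return {tuple(row[d] for d in dims) for row in rows}


if __name__ == "__main__":
    print(len(feature_intersections(ROWS, ["UD1"])))         # 2 features
    print(len(feature_intersections(ROWS, ["UD1", "UD2"])))  # 3 features
```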

If dimensions selected for the feature data source have the same name as a dimension in the target data source, those dimensions are used to map features to targets. For example, if UD1 is in both the feature dimensions and the target dimensions, features with a value in the UD1 dimension are mapped only to targets with that same value in the UD1 dimension.

NOTE: Setting the feature dimensions to the exact same dimensions specified for the target data set causes an error when running the job to validate the data and add the feature data set to the model.
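The shared-dimension mapping and the rule against identical dimension sets can be sketched as follows. This is a minimal illustration, assuming that features and targets are represented as dictionaries keyed by dimension name; validate_dimensions, maps_to, and the dimension values are hypothetical, and the behavior when no dimensions are shared is an assumption of this sketch, not documented product behavior.

```python
# Hypothetical sketch of dimension-based feature-to-target mapping.
# Dimension names and helpers are illustrative only.


def validate_dimensions(feature_dims, target_dims):
    """Reject identical dimension sets; return the shared dimension names."""
    if set(feature_dims) == set(target_dims):
        raise ValueError("Feature dimensions must not match the target dimensions exactly.")
    return sorted(set(feature_dims) & set(target_dims))


def maps_to(feature, target, shared_dims):
    """A feature maps to a target only if they agree on every shared dimension.

    Assumption: with no shared dimensions, this sketch maps the feature
    to every target.
    """
    return all(feature[d] == target[d] for d in shared_dims)


if __name__ == "__main__":
    shared = validate_dimensions(["UD1", "UD5"], ["UD1", "UD2"])  # ["UD1"]
    feature = {"UD1": "Widgets", "UD5": "Promo"}
    targets = [
        {"UD1": "Widgets", "UD2": "Store 1"},
        {"UD1": "Gadgets", "UD2": "Store 1"},
    ]
    print([maps_to(feature, t, shared) for t in targets])  # [True, False]
```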

TIP: Selecting more feature dimensions leads to a higher number of unique intersections (or features) you can use in forecasting.
In the Value Dimension field, select the dimension used for the value data coming from the feature data source. Typically, this dimension is used to store source data values such as sales numbers.

NOTE: Only numeric values can be used to aid in predictions. Other types of values such as text are ignored.
In the Date Dimension field, select the dimension reserved for date data coming from the feature data source.
In the Location Dimension field, select the dimension reserved for location data coming from the feature data source (optional).

Select None if your feature data source does not include location information. The Location dimension is used during modeling to automatically map features to targets that have a location that is geographically inside of or equivalent to a given feature’s location. For example, a feature with the location Michigan is mapped to a target with the location Rochester, Michigan, but is not mapped to a target with the location USA.
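The location rule above can be expressed as a containment check: a feature maps to a target when the target's location equals the feature's location or lies inside it. The following sketch assumes a simple child-to-parent location hierarchy; the PARENT table and helper names are hypothetical and do not reflect how SensibleAI Forecast stores or resolves locations.

```python
# Hypothetical location hierarchy (child -> parent). Not product data.
PARENT = {
    "Rochester, Michigan": "Michigan",
    "Michigan": "USA",
    "Ohio": "USA",
}


def is_within(location, container):
    """True if `location` equals `container` or lies inside it in the hierarchy."""
    current = location
    while current is not None:
        if current == container:
            return True
        current = PARENT.get(current)
    return False


def feature_maps_to_target(feature_location, target_location):
    # A feature maps to targets located inside of, or equal to, its location.
    return is_within(target_location, feature_location)


if __name__ == "__main__":
    print(feature_maps_to_target("Michigan", "Rochester, Michigan"))  # True
    print(feature_maps_to_target("Michigan", "USA"))                  # False
```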

TIP: The location dimension can also be a feature dimension that adds uniqueness.

NOTE: A location can be selected as both a feature dimension and a location dimension. Selecting a column as a location dimension ensures that, when configuring locations for your project, all the locations within that source column are pre-populated. Selecting a location as a feature dimension adds further uniqueness and more granularity to the feature intersection.
Complete the workflow after making your feature dimension selections, then click Save. This adds a job to the job queue to validate the data and add the feature data set to the model. The job runs tasks to complete the data definitions. A progress bar shows task progress. You can click Cancel Task at any time while the task is running to stop running the data definitions.
When the task completes, click Refresh Current Page.

The Features page displays the added feature data source in the All Feature Data Sources pane. The Feature Data Source Preview pane displays below the All Feature Data Sources pane and shows information for the top 100 feature records.

NOTE: Once the Data Source Preview pane displays on the Features page, the Configure button no longer appears, because the preview is now the default view for the page.

OneStream SensibleAI Forecast Configure Features screen showing a list of feature data sources with one selected, including a preview table of feature data and configuration panels for date, series, and dimension settings.

You can also edit, delete, or commit a feature data source, or add a new one.

Edit Feature Data Source Attributes

Once a feature data source has been added to the project, the All Feature Data Sources pane displays it at the top of the Features page. SensibleAI Forecast lets you set specific attributes for each feature in the data set.

Editing feature data source attributes is optional. Each feature's attributes have a default setting. Review the selections for each attribute. If you are satisfied with the defaults, click Cancel in the Feature Attributes dialog box without making changes, then commit the feature data source.

Click to select the data source whose attributes you want to edit.

TIP: Data information for the selected feature data set displays in the Data Source Preview pane.
Click the Edit the Selected Feature Data Source's Attributes button at the bottom of the pane. The Feature Attributes dialog box defaults to Custom view, which lists all the selected data source's feature attributes, and shows whether each attribute is selected (Yes) or not selected (No).

Each feature listed includes the following attributes:

Allow Feature Selection: The default value Yes allows the feature to be filtered out during the feature selection process. Select No to ensure the feature is not filtered out during the feature selection process.

If too many features for a given target are set to No, they still go through the feature selection process. This prevents too many features from being fed into any one model. The limit depends on which models are being run.
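The following sketch illustrates this rule: features set to No for Allow Feature Selection are kept only while their count stays within a per-model limit. The max_features value, the usefulness score used for ranking, and the select_features helper are all hypothetical; the actual limit depends on the models being run, and the real selection process is not described here.

```python
def select_features(features, max_features):
    """Choose features for one target, honoring protected features up to a cap.

    features: list of (name, allow_selection, score) tuples, where a higher
    score means more useful (hypothetical). max_features is a hypothetical
    per-model limit.
    """
    protected = [f for f in features if not f[1]]
    if len(protected) > max_features:
        # Too many protected features: every feature goes through selection.
        ranked = sorted(features, key=lambda f: f[2], reverse=True)
        return [f[0] for f in ranked[:max_features]]
    # Protected features are kept; remaining slots go to the best-scoring others.
    others = sorted((f for f in features if f[1]), key=lambda f: f[2], reverse=True)
    remaining = max_features - len(protected)
    return [f[0] for f in protected] + [f[0] for f in others[:remaining]]


if __name__ == "__main__":
    feats = [("temp", False, 0.1), ("rain", True, 0.9),
             ("promo", True, 0.5), ("holiday", False, 0.2)]
    print(select_features(feats, 3))  # ['temp', 'holiday', 'rain']
    print(select_features(feats, 1))  # ['rain']
```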

Allow Feature Engineering: The default value Yes indicates the feature can be engineered. Select No to prevent the feature from being engineered (for example, lagging temperature by two weeks).
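Lagging is the feature engineering example mentioned above. As a minimal sketch, assuming a daily feature stored as a dictionary of dates to values, shifting by 14 days makes each reading appear two weeks later, so the lagged value on a date is the raw value from 14 days earlier. The lag_series helper is hypothetical and is not how Xperiflow performs feature engineering.

```python
from datetime import date, timedelta


def lag_series(series, days):
    """Return a new series where each value appears `days` later than in `series`.

    series: dict mapping datetime.date to a numeric value.
    """
    return {d + timedelta(days=days): v for d, v in series.items()}


if __name__ == "__main__":
    temperature = {date(2024, 1, 1): 3.1, date(2024, 1, 2): 2.7}
    lagged = lag_series(temperature, 14)
    print(lagged[date(2024, 1, 15)])  # 3.1, the reading from 14 days earlier
```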

Scenario Modeling Feature: Select Yes if the feature should be included when defining custom Scenarios in Utilization and the intention of the project is to run predictions on different Scenarios. Otherwise, select No.

NOTE: When you select Yes for any feature:

- The project will be considered a Scenario Modeling project by the Xperiflow Engine.

- Altering a Scenario Modeling project after the job has run requires a Restart or Manual Rebuild.

- Known In Advance automatically changes to Yes.

Known In Advance (KIA): The default value No indicates that the feature does not have data extending past the last actual data point. Select Yes for features whose data you know extends beyond the last actual data point, such as a weather forecast for the next two weeks. Known-in-advance features cannot have any missing data through the end of the forecast range (for example, five weeks of future data for a five-week forecast).

IMPORTANT: The prediction job cannot run if this setting is set to Yes and the feature's data does not extend through the forecast range when you run predictions.
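The requirement that a Known In Advance feature have no missing data through the forecast range can be expressed as a simple check. The following sketch assumes daily data stored as a dictionary of dates to values and a forecast range measured in days after the last actual data point; kia_covers_forecast and its parameters are hypothetical and not an Xperiflow API.

```python
from datetime import date, timedelta


def kia_covers_forecast(series, last_actual, horizon_days):
    """Return True if `series` has a value for every day in the forecast range.

    The forecast range is assumed to be the `horizon_days` days that follow
    `last_actual`. A missing value (absent key or None) fails the check.
    """
    for offset in range(1, horizon_days + 1):
        day = last_actual + timedelta(days=offset)
        if series.get(day) is None:
            return False
    return True


if __name__ == "__main__":
    last = date(2024, 1, 31)
    # A two-week weather forecast beyond the last actual data point.
    weather = {last + timedelta(days=i): 20.0 for i in range(15)}
    print(kia_covers_forecast(weather, last, 14))  # True
    print(kia_covers_forecast(weather, last, 35))  # False: five weeks not covered
```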

KIA Date Range (Days): For features with Known In Advance set to Yes, this attribute allows the user to specify how many days of data will be known in advance. This attribute defaults to being blank. If a value is given to this attribute, but Known In Advance is not set to Yes, Xperiflow will automatically update the feature to Known In Advance set to Yes.

IMPORTANT: The prediction job cannot run if this setting is configured and the feature data source does not contain data for the number of days specified.
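Two of the attribute interactions described above switch Known In Advance to Yes automatically: setting Scenario Modeling Feature to Yes, and giving KIA Date Range a value. The following sketch summarizes those documented rules as a normalization step. The FeatureAttributes dataclass, its field names, and the normalize function are hypothetical illustrations, not the product's data model, and the default for Scenario Modeling Feature is assumed.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class FeatureAttributes:
    # Field names are illustrative; defaults follow the documented defaults
    # where stated (scenario_modeling_feature default is an assumption).
    allow_feature_selection: bool = True
    allow_feature_engineering: bool = True
    scenario_modeling_feature: bool = False
    known_in_advance: bool = False
    kia_date_range_days: Optional[int] = None  # blank by default


def normalize(attrs: FeatureAttributes) -> FeatureAttributes:
    """Return a copy with Known In Advance switched on where the rules require it."""
    needs_kia = attrs.scenario_modeling_feature or attrs.kia_date_range_days is not None
    return replace(attrs, known_in_advance=attrs.known_in_advance or needs_kia)


if __name__ == "__main__":
    print(normalize(FeatureAttributes(kia_date_range_days=14)).known_in_advance)  # True
```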

Aggregation Method: This attribute allows a user to specify a preferred method of aggregating the feature data. By default, this will be set to None and the following options can be selected: Sum, Mean, Median, Last, Max, Min, and Mode.
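To show what each Aggregation Method option does, the following sketch rolls daily feature values up to weekly buckets using Python's statistics module. Bucketing by ISO week and the aggregate_weekly helper are assumptions for illustration; how Xperiflow aligns feature periods with target periods is not described here.

```python
import statistics
from collections import defaultdict
from datetime import date

# Each documented option mapped to a plain-Python reducer (illustrative only).
AGGREGATORS = {
    "Sum": sum,
    "Mean": statistics.mean,
    "Median": statistics.median,
    "Last": lambda values: values[-1],
    "Max": max,
    "Min": min,
    "Mode": statistics.mode,
}


def aggregate_weekly(series, method):
    """Aggregate a {date: value} series into {(iso_year, iso_week): value}."""
    buckets = defaultdict(list)
    for day in sorted(series):  # sorted so that "Last" picks the latest day
        iso_year, iso_week, _ = day.isocalendar()
        buckets[(iso_year, iso_week)].append(series[day])
    reducer = AGGREGATORS[method]
    return {week: reducer(values) for week, values in buckets.items()}


if __name__ == "__main__":
    daily = {date(2024, 1, 1): 1, date(2024, 1, 2): 2,
             date(2024, 1, 3): 2, date(2024, 1, 8): 5}
    print(aggregate_weekly(daily, "Sum"))  # {(2024, 1): 5, (2024, 2): 5}
```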

Data Cleansing Method: This attribute allows a user to specify a preferred method of cleaning missing feature data. By default, this will be set to None and the following options can be selected: Mean, Zero, Interpolate, Kalman, and Local Median.
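The following sketch shows how several of the Data Cleansing Method options could fill gaps in an evenly spaced series where missing values are None. Kalman smoothing is omitted because it requires a state-space model. The cleanse helper, the local median window size of 3, and the assumption of evenly spaced data are all hypothetical and do not reflect Xperiflow's implementation.

```python
import statistics


def cleanse(values, method, window=3):
    """Fill None entries in an evenly spaced series using the named method."""
    known = [v for v in values if v is not None]
    if method == "Zero":
        return [0 if v is None else v for v in values]
    if method == "Mean":
        fill = statistics.mean(known)
        return [fill if v is None else v for v in values]
    if method == "Local Median":
        half = window // 2
        out = []
        for i, v in enumerate(values):
            if v is None:
                # Median of the known values in a window centered on the gap.
                nearby = [x for x in values[max(0, i - half): i + half + 1] if x is not None]
                v = statistics.median(nearby) if nearby else None
            out.append(v)
        return out
    if method == "Interpolate":
        out = list(values)
        idx = [i for i, v in enumerate(values) if v is not None]
        for i, v in enumerate(values):
            if v is None:
                left = max((j for j in idx if j < i), default=None)
                right = min((j for j in idx if j > i), default=None)
                # Linear interpolation; gaps at either edge are left as None.
                if left is not None and right is not None:
                    frac = (i - left) / (right - left)
                    out[i] = values[left] + frac * (values[right] - values[left])
        return out
    raise ValueError(f"Unsupported method in this sketch: {method}")


if __name__ == "__main__":
    series = [1.0, None, 3.0, None, None, 6.0]
    print(cleanse(series, "Interpolate"))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```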

Frequency Override: This attribute allows a user to override the frequency of the feature data. By default, this will be set to None. If None is selected, Xperiflow will automatically determine the frequency of the feature data.
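As a rough illustration of automatic frequency detection, the following sketch infers a label from the most common spacing between consecutive dates. The label set, the 28 to 31 day monthly range, and the infer_frequency helper are hypothetical; Xperiflow's actual detection logic is not documented here.

```python
from collections import Counter
from datetime import date

# Hypothetical mapping from typical day gaps to frequency labels.
LABELS = {1: "Daily", 7: "Weekly"}


def infer_frequency(dates):
    """Guess a frequency label from the most common gap between sorted dates."""
    ordered = sorted(dates)
    gaps = Counter((b - a).days for a, b in zip(ordered, ordered[1:]))
    if not gaps:
        return None  # fewer than two dates: nothing to infer
    gap, _ = gaps.most_common(1)[0]
    if gap in LABELS:
        return LABELS[gap]
    if 28 <= gap <= 31:
        return "Monthly"
    return f"Every {gap} days"


if __name__ == "__main__":
    print(infer_frequency([date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15)]))  # Weekly
```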

ML Type: This attribute allows a user to specify the data type of the feature data. By default, this will be set to None and the following options can be selected: Binary Categorical, DateTime, Multi Categorical, Numerical, and Text.
In the Feature Attributes dialog box, edit the feature's attributes in one of the following ways:

Custom: Allows you to modify individual attribute values for features as desired.
- Select a feature, then set the attribute values for that feature by clicking in each attribute selection field and selecting the desired value.
- Click the Save button in the button bar to save your feature attribute changes.
Modify All: Allows you to apply an individual attribute value to all features in a given feature data set.
- Select the attribute option to apply the value.
- Select the value of the attribute to apply.
- Click the Save button at the bottom of the Feature Attributes dialog box to save your change and apply the selected value to the selected attribute for all features.

The data in the Data Source Preview pane displays information on the dimensions used to run the preview, as well as the number of data intersections in the SensibleAI Forecast data sources to be used for the model.

Verify Data Source Information

Review the information in the Data Source Preview pane to verify the data features are correctly defined for the model.

If any data in the Data Source Preview pane is not as expected, you can select the feature data source in the Selected Feature Data Source pane and do either of the following:

Click the Update the Selected Feature Data Source button. This opens the Update Feature Database Connection dialog box so you can reselect feature data source dimensions.
Select the feature data source in the Selected Feature Data Source pane and click the Delete button, then click Delete again to remove the selected feature data source from the list.

Commit or Decommit a Feature Data Source

You must commit any feature data sources to use in the SensibleAI Forecast project. You can also decommit any committed feature data source.

In the Selected Feature Data Source pane, select the feature data source and click the Commit button.
A message box informs you that the selected data source's commit status has changed. Click OK to close the message box.
Commit any other feature data sources as needed by repeating the previous steps.

Once you have committed your data sets, continue by processing Feature Data Sources to be used with your SensibleAI Forecast project.

NOTE: Feature data sources can only be committed for a full build and not for a partial build.

Process Feature Data Sources

After you commit a feature data source, you must process it. To process feature data sources:

While you complete the initial configuration steps described above, the Process Data Sources button is disabled.
After you commit one or more feature data sources, the Process Data Sources button becomes enabled.
When the button is enabled, the All Feature Data Sources grid also shows the Requires Processing field as on for each committed data source. Click Process Data Sources to start a Feature Data Load job in the Xperiflow engine.
When the Feature Data Load job completes, the Requires Processing field is set to off for all committed data sources and the Process Data Sources button is disabled.

A Feature Data Load job must be run after any change to the feature data sources. The steps above cover configuring, committing, and loading a new feature data source, but a Feature Data Load job is also required when:

A committed feature data source that was included in a Feature Data Load job is decommitted.
A committed feature data source that was included in a Feature Data Load job has changes made to its data source attributes.

NOTE: You cannot navigate to the Pipeline section of Model Build while any feature data sources require processing. You can run the Feature Data Load job as many times as needed to process all of the feature data sources.
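The Requires Processing rules in this section behave like a small state machine. The following sketch, using a hypothetical FeatureSource class and helper functions, marks a source as requiring processing when it is committed, decommitted after being loaded, or has its attributes changed after being loaded, and clears the flag when a Feature Data Load job runs. It illustrates the documented rules only; it is not the product's logic.

```python
from dataclasses import dataclass


@dataclass
class FeatureSource:
    # Hypothetical model of one row in the All Feature Data Sources grid.
    name: str
    committed: bool = False
    loaded: bool = False  # included in a completed Feature Data Load job
    requires_processing: bool = False

    def commit(self):
        self.committed = True
        self.requires_processing = True

    def decommit(self):
        self.committed = False
        # Only a source that a load job already used needs reprocessing.
        self.requires_processing = self.loaded

    def update_attributes(self):
        if self.committed and self.loaded:
            self.requires_processing = True


def process_data_sources(sources):
    """Simulate a Feature Data Load job: load committed sources, clear flags."""
    for s in sources:
        s.loaded = s.committed
        s.requires_processing = False


def can_open_pipeline(sources):
    # The Pipeline section is blocked while any source requires processing.
    return not any(s.requires_processing for s in sources)


if __name__ == "__main__":
    weather = FeatureSource("Weather")
    weather.commit()
    print(can_open_pipeline([weather]))  # False
    process_data_sources([weather])
    print(can_open_pipeline([weather]))  # True
```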