Specify Targets and Define the Data Set
After configuring your source data in OneStream and creating your Sensible Machine Learning project, use the Targets page (Data > Targets) to configure the target source data your model uses. This is a two-step process.
-
Define the database connection and data tables to use.
-
Specify target data source dimensions by selecting fields that contain the desired target dimensions, value dimension, date dimension, and location dimension (optional). You can specify multiple target data tables to be used during this step.
NOTE: A location can be selected as both a target dimension and a location dimension. Selecting a column as a location dimension ensures that, when configuring locations for your project, all the locations within that source column are pre-populated and pre-mapped to the respective target. Selecting a location as a target dimension adds further uniqueness and more granularity to the target intersection.
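The following minimal Python sketch illustrates what "pre-populated and pre-mapped" means. It is not Sensible Machine Learning's implementation. It assumes the source rows are available as dictionaries and groups each location value with the targets that appear alongside it. The sample rows and the `premap_locations` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical source rows. Column names borrow from the weekly sales
# example later in this section, and the values are made up.
SAMPLE_ROWS = [
    {"Location": "East", "Store": "S1", "Dept": "D1"},
    {"Location": "East", "Store": "S1", "Dept": "D2"},
    {"Location": "West", "Store": "S2", "Dept": "D1"},
]


def premap_locations(rows, target_dims, location_dim):
    """Group each distinct location value with the targets seen alongside it.

    A target is the tuple of values in the target-dimension columns.
    """
    mapping = defaultdict(set)
    for row in rows:
        target = tuple(row[dim] for dim in target_dims)
        mapping[row[location_dim]].add(target)
    # Sort locations and targets for stable, readable output.
    return {loc: sorted(targets) for loc, targets in sorted(mapping.items())}


# Location used only as the location dimension:
print(premap_locations(SAMPLE_ROWS, ["Store", "Dept"], "Location"))
# Location used as both a target dimension and the location dimension:
print(premap_locations(SAMPLE_ROWS, ["Location", "Store", "Dept"], "Location"))
```

In the second call, the location value becomes part of each target tuple. This is the extra granularity the note describes.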
Where possible, create a clustered columnstore index on your target data source tables to avoid possible timeout issues.
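If your target tables are stored in Microsoft SQL Server, you can create this index with the T-SQL `CREATE CLUSTERED COLUMNSTORE INDEX` statement. The Python helper below is a minimal sketch that builds that statement for review by a database administrator. It assumes a SQL Server source. The `columnstore_ddl` helper, the `CCI_` naming convention, and the `dbo.SML_Targets` table name are hypothetical.

```python
import re

# Assumption: the target tables live in Microsoft SQL Server.
# Only plain identifiers are accepted, to keep the generated SQL safe.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def columnstore_ddl(schema: str, table: str) -> str:
    """Return T-SQL that adds a clustered columnstore index to schema.table."""
    for name in (schema, table):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    index_name = f"CCI_{table}"  # hypothetical naming convention
    return (
        f"CREATE CLUSTERED COLUMNSTORE INDEX [{index_name}] "
        f"ON [{schema}].[{table}];"
    )


print(columnstore_ddl("dbo", "SML_Targets"))
```

A table can have only one clustered index. If the table already has a clustered rowstore index, drop it first or convert it with `WITH (DROP_EXISTING = ON)` under the existing index's name.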
IMPORTANT: You should have detailed knowledge of your target data sources and of how the data in each source column maps to the dimensions used to store that data. See Appendix 1: Data Quality Guide for information on data planning for your project.
Define Your Target Data Source Connection
The first part of specifying targets and defining your data set for Sensible Machine Learning is defining the source connection for your data set.
-
Click Data > Targets.
-
The first time you access this page for a project, you must configure the data in your target data source.
-
Click Configure. The Add Target Data Set Connection dialog box displays.
-
In the Source Connection field, select the connection of your Target Data Set.
NOTE: This connection is also the destination connection for any Consumption Groups created in this project.
-
In the Table Name field, select the names of the initial set of target tables you created for importing into your Sensible Machine Learning project. If you imported multiple target data sources to use for the first model prediction, select the import table name for each. Select the check box next to each target table you are using for the first model prediction.
TIP: Only the first selected table name displays in the list after selecting. You can click the field to see all the selected import data files.
NOTE: The Data Source Name is required but is set to Target Data Set by default.
-
Click Preview.
A default Source Connection Name, the first imported Table Name, and a default Data Source Name display at the top of the Add Target Data Set Connection dialog box.
The Preview pane shows data from the first imported target data set. Each row of data in the Preview pane corresponds to a unique combination of data in the user-defined dimensions in the target source data set.
Use the information in the Preview pane to verify that data from the correct target data source is being used. This includes the source data shown in the Preview table and the target data source tables shown in the upper-right list in the Preview pane.
If the data in the Preview pane does not appear to be the correct source data, or the source tables are incorrect, you can click Update to change the selected source connection or target data source tables.
Once you have confirmed that the correct source connection and data tables are being used and have verified the source data shown in the Preview pane, you can select the dimensions to use for the target data source connection.
Select Target Data Source Dimensions
Continue defining the data set by specifying the target dimensions, value dimension, date dimension, and location dimension (optional) to use in your Sensible Machine Learning model. This consists of matching the dimensions used for the target data in your Sensible Machine Learning model to the dimensions reserved when creating a cube for the target data source.
NOTE: Only the specific dimensions reserved for Sensible Machine Learning should be selected for each dimension type. If a dimension type does not correlate to data in the target data set, leave the field blank. If the location dimension is not being used, select None.
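The selection rules in the note can be modeled as a small Python record with a validation check. This is a hypothetical sketch, not OneStream's data model. The `TargetDimensionSelection` class, the dimension names `UD1` through `UD5`, and two rules flagged as assumptions in the comments are illustrative: that at least one target dimension is required, and that the target, value, and date roles never share a dimension.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class TargetDimensionSelection:
    """Hypothetical record of the dimension selections on the Targets page."""

    target_dims: List[str]
    value_dim: str
    date_dim: str
    location_dim: Optional[str] = None  # None mirrors selecting "None"

    def validate(self, reserved: Set[str]) -> None:
        """Raise ValueError if the selection breaks the rules assumed below."""
        # Assumption: at least one target dimension is always selected.
        if not self.target_dims:
            raise ValueError("select at least one target dimension")
        selected = set(self.target_dims) | {self.value_dim, self.date_dim}
        if self.location_dim is not None:
            selected.add(self.location_dim)
        # Only dimensions reserved for Sensible Machine Learning may be used.
        unreserved = sorted(selected - reserved)
        if unreserved:
            raise ValueError(f"not reserved for Sensible Machine Learning: {unreserved}")
        # Assumption: the target, value, and date roles do not share a dimension.
        # The location dimension is exempt because it may also be a target.
        roles = list(self.target_dims) + [self.value_dim, self.date_dim]
        if len(set(roles)) != len(roles):
            raise ValueError("a dimension may fill only one of the target, value, and date roles")


reserved = {"UD1", "UD2", "UD3", "UD4", "UD5"}
# UD1 is both a target dimension and the location dimension, which is allowed.
TargetDimensionSelection(["UD1", "UD2", "UD3"], "UD4", "UD5", location_dim="UD1").validate(reserved)
```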
-
In the Target Dimensions field, select the target dimensions that have been defined to store data from your target data sources.
These columns in the source data set define all the target variables that are used for predictions. The distinct combination of values across the target dimensions defines a target.
Select the check box next to each applicable dimension. For example, if the user-defined dimensions UD1, UD2, and UD3 were reserved for source data and mapped to specific data columns in the data source, select UD1, UD2, and UD3 from the list.
NOTE: Selecting more target dimensions leads to a higher number of unique intersections (or targets) for which to forecast.
-
In the Value Dimension field, select the dimension used for the value data coming from the target data source. Typically, this dimension is used to store source data values such as sales numbers.
-
In the Date Dimension field, select the dimension reserved for date data coming from the target data source. Typically, this dimension is used to store the date data from the target data source.
-
In the Location Dimension field, optionally select the dimension reserved for location data coming from the target data source. Select None if your data source does not include location information. The Location Dimension is used for mapping event and feature information to relevant targets in the Configure section.
For example, the following data table contains weekly sales dollars by location, store, store type, department, and date. The potential target dimensions include Location, Store, Store_Type, and Dept, with Dept having the highest granularity. One or more of these can be selected, depending on the desired forecast level. The value dimension, in this case, would be weekly sales dollars, and the date dimension would be Date. Location can be selected as both a target dimension and the location dimension.
-
Click Run after making your target dimension selections. This adds a job to the job queue to validate the data and add the target data set to the model.
-
When the task completes, click Refresh Current Page.
The Data Source Preview pane displays.
NOTE: Once the Data Source Preview pane displays on the Targets page, the Configure button no longer displays, and the Data Source Preview pane becomes the default view for the page.
The data in the Data Source Preview pane displays information on the dimensions used to run the preview, as well as the number of data intersections in the Sensible Machine Learning data sources. Location and frequency information also displays.
Review the information in the Data Source Preview pane to verify the data targets are correctly defined for the model.
The list on the right side of the Data Source Preview pane lists any imported data files. Click it to see the full list of all target data files selected for this Sensible Machine Learning project. This is useful for verifying that the correct files were imported.
If any data in the Data Source Preview pane is not as expected, you can click Update to open the Update Target Database Connection dialog box and reselect target data source dimensions, or click Update in the dialog box to change target data set connection information.
NOTE: The Update button is visible after the initial source connection is saved but is no longer visible after running the data set job in the Data > Dataset page.
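The number of data intersections shown in the Data Source Preview pane follows from a simple rule: each distinct combination of target-dimension values is one target. The Python sketch below uses hypothetical rows shaped like the weekly sales example in the Location Dimension step. It shows how the count changes as target dimensions are added, and how a weekly frequency could be inferred from the gaps between dates. The sample values, `count_targets`, and `infer_frequency` are illustrative assumptions, not how Sensible Machine Learning computes these figures.

```python
from collections import Counter
from datetime import date

COLUMNS = ("Location", "Store", "Store_Type", "Dept", "Date", "Weekly_Sales")
# Hypothetical rows shaped like the weekly sales example; values are made up.
SALES_ROWS = [
    ("East", "S1", "A", "D1", date(2024, 1, 5), 120.0),
    ("East", "S1", "A", "D2", date(2024, 1, 5), 80.0),
    ("East", "S1", "A", "D1", date(2024, 1, 12), 130.0),
    ("West", "S2", "B", "D1", date(2024, 1, 5), 95.0),
    ("West", "S2", "B", "D1", date(2024, 1, 12), 90.0),
]


def count_targets(rows, target_dims):
    """Count distinct combinations of values across the target dimensions."""
    positions = [COLUMNS.index(dim) for dim in target_dims]
    return len({tuple(row[i] for i in positions) for row in rows})


def infer_frequency(rows):
    """Guess the series frequency from the most common gap between distinct dates."""
    date_pos = COLUMNS.index("Date")
    dates = sorted({row[date_pos] for row in rows})
    gaps = Counter((later - earlier).days for earlier, later in zip(dates, dates[1:]))
    if not gaps:
        return "unknown"  # a single date gives no gap to measure
    step = gaps.most_common(1)[0][0]
    return {1: "daily", 7: "weekly"}.get(step, f"every {step} days")


for dims in (["Store"], ["Store", "Dept"], ["Location", "Store", "Store_Type", "Dept"]):
    print(dims, count_targets(SALES_ROWS, dims))
print(infer_frequency(SALES_ROWS))
```

Adding Dept raises the count. Adding Location and Store_Type does not, because in this sample each store belongs to exactly one location and store type.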
Once you verify the data in the Data Source Preview pane, you can continue by specifying data features. If features are not included in your data sources, continue by verifying your data sets in Sensible Machine Learning.