Run the Pipeline

When you run a pipeline, the engine brings together all specified data configurations to generate and transform the data. It then selects the most predictive features and iteratively trains and tests models against historical data. This is the longest-running step of the solution because it performs most of the data science work.

When you first navigate to the Run page in the Pipeline section, a Run Pipeline button displays in the center of the page, along with statistics and a summary of the project's currently configured settings. The presence of this button indicates that your data is ready for modeling.

Click Run Pipeline to run the pipeline job.

NOTE: The pipeline job must run to successful completion before you can access other pages in the Pipeline section.

During the pipeline run, the XperiFlow engine does the following:

Feature Generation, Transformation, and Selection: The engine takes all the data and information added to the project in the Data and Configure sections and runs feature engineering for each target. While creating numerous new features, the engine also selects the best features to keep for each target. All models that later run for a target use these selected features to increase predictive accuracy.
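This guide does not document which transformations or selection criteria the engine uses, so the following is only a minimal Python sketch of the general idea under assumed choices: it generates lagged values and a trailing rolling mean for one target series, then keeps the features most correlated with the target. The function names generate_features, pearson, and select_features, the sample data, and the correlation-based ranking are hypothetical illustrations, not the XperiFlow implementation.

```python
from math import sqrt

def generate_features(series, lags=(1, 2, 3), windows=(3,)):
    """Build hypothetical candidate features for one target series.

    Returns a dict mapping feature name -> list aligned with `series`,
    with None where there is not enough history yet.
    """
    n = len(series)
    features = {}
    for lag in lags:
        # Value observed `lag` periods earlier.
        features[f"lag_{lag}"] = [series[t - lag] if t >= lag else None for t in range(n)]
    for w in windows:
        # Mean of the w values strictly before t, so the target value itself never leaks in.
        features[f"roll_mean_{w}"] = [sum(series[t - w:t]) / w if t >= w else None for t in range(n)]
    return features

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def select_features(series, features, k=2):
    """Keep the k feature names with the largest absolute correlation to the target."""
    scores = {}
    for name, values in features.items():
        # Score only the rows where the feature is defined.
        pairs = [(v, y) for v, y in zip(values, series) if v is not None]
        if len(pairs) < 3:
            continue
        xs, ys = zip(*pairs)
        scores[name] = abs(pearson(xs, ys))
    return sorted(scores, key=scores.get, reverse=True)[:k]

if __name__ == "__main__":
    # Hypothetical data: one target with ten historical periods.
    targets = {"target_a": [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]}
    for name, series in targets.items():
        candidates = generate_features(series)
        print(name, select_features(series, candidates, k=2))
```

The rolling mean deliberately excludes the current period so that a feature never contains the value it is meant to predict; a real engine would apply the same principle across many more feature types.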

Hyperparameter Tuning, Model Training, and Model Selection: Using all the configurations and the newly identified important features, the engine runs multiple models for each configuration to find the best ones. This process tunes the hyperparameters of each model on multiple splits of the data and then saves the accuracy metrics for each model.
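Tuning each model on multiple splits of the data is commonly done with rolling-origin (expanding-window) backtests. The minimal Python sketch below assumes two simple hypothetical forecasting models (a moving average and simple exponential smoothing), a small hyperparameter grid for each, three time-ordered splits, and mean absolute error (MAE) as the accuracy metric. It saves one metric record per model and hyperparameter setting, then picks the best setting for each model. None of these names or choices come from XperiFlow; they only illustrate the process.

```python
from itertools import product

def moving_average_forecast(train, horizon, window):
    """Forecast every future period as the mean of the last `window` observations."""
    recent = train[-window:]
    return [sum(recent) / len(recent)] * horizon

def exp_smoothing_forecast(train, horizon, alpha):
    """Simple exponential smoothing: a flat forecast at the final smoothed level."""
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

# Hypothetical model catalog: forecasting function plus its hyperparameter grid.
MODELS = {
    "moving_average": (moving_average_forecast, {"window": [2, 3, 4]}),
    "exp_smoothing": (exp_smoothing_forecast, {"alpha": [0.2, 0.5, 0.8]}),
}

def rolling_splits(n, n_splits=3, horizon=2):
    """Yield (train_end, test_end) pairs for expanding-window backtests, oldest first."""
    for i in range(n_splits, 0, -1):
        test_end = n - (i - 1) * horizon
        yield test_end - horizon, test_end

def mae(actual, forecast):
    """Mean absolute error between actual values and a forecast."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def tune(series, n_splits=3, horizon=2):
    """Score every hyperparameter setting on every split; return all records and the best per model."""
    results = []
    for name, (fn, grid) in MODELS.items():
        keys = list(grid)
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            errors = []
            for train_end, test_end in rolling_splits(len(series), n_splits, horizon):
                # Train only on data before the test window.
                train, test = series[:train_end], series[train_end:test_end]
                errors.append(mae(test, fn(train, horizon, **params)))
            # Save the accuracy metric for this model and setting.
            results.append({"model": name, "params": params, "mean_mae": sum(errors) / len(errors)})
    best = {}
    for rec in results:
        if rec["model"] not in best or rec["mean_mae"] < best[rec["model"]]["mean_mae"]:
            best[rec["model"]] = rec
    return results, best

if __name__ == "__main__":
    history = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]
    _, best = tune(history)
    for name, rec in best.items():
        print(name, rec["params"], round(rec["mean_mae"], 3))
```

Because each split trains only on data that precedes its test window, the sketch never scores a model on history it was fit to, which is what testing against historical data requires.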

When the pipeline run completes, click Refresh to view a summary page that displays pipeline job results. Run statistics for the most recently completed pipeline job display, as shown in the following graphic:

After the pipeline job completes, the top half of the Run page shows:

Features Generated: Total number of features generated by the pipeline job.

Total Experiments: Total number of groups plus single targets included in the model build. The AI Services engine runs an experiment for each of these targets or groups to find the best possible model.

Progress: The current completion percentage of the pipeline job.

Models Iterated: Number of times models were iterated with different hyperparameter settings during the pipeline job.

Models Trained: Number of unique models that were trained.

Status: The completion status of the most recently started pipeline job.

Start Time, End Time: Start and end time of the most recently completed pipeline job.

Last Refresh, Queued Time: Date and time the Run page was last refreshed, and date and time the pipeline job was queued.

The bottom half of the Run page shows:

Pipeline Job Recent Tasks: Table that displays details of the most recent tasks run in the pipeline job. This table shows running tasks while the pipeline job is running and completed tasks after the pipeline job successfully completes.