Collection Lag

Collection lag is the time between the most recent date in the data set (across all targets) and the latest received data for a single target.

The following image shows a daily-level data set containing a date column and two target columns named Apples and Bananas. The most recent date in the data set is 10/17/2021 (provided by Bananas). In this example, Apples has a collection lag of two days, which means that data for Apples is received two days after the most recent date in the data set.

A two-day collection lag is considered normal.

Collection lag - daily level dataset

Target Collection Lag: The time between the most recent date in the data set and the date of the latest data received for a specific target variable. In the following image, the target collection lags are: Apples – 2 days, Bananas – 0 days, Carrots – 4 days.

Target collection lag in a daily level dataset
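The definition above can be sketched in a few lines of Python. This is only an illustration: the `observations` dictionary and the `target_collection_lags` function are hypothetical names, not part of Sensible Machine Learning, and the values are made up to match the lags listed for Apples, Bananas, and Carrots.

```python
from datetime import date

# Hypothetical daily data set: each target maps dates to received values.
# A date with no entry means no data has been received for that day yet.
observations = {
    "Apples": {date(2021, 10, 14): 12, date(2021, 10, 15): 9},
    "Bananas": {
        date(2021, 10, 14): 30,
        date(2021, 10, 15): 28,
        date(2021, 10, 16): 31,
        date(2021, 10, 17): 27,
    },
    "Carrots": {date(2021, 10, 12): 4, date(2021, 10, 13): 5},
}


def target_collection_lags(observations):
    """Return {target: lag in days} measured from the most recent date
    found across all targets."""
    latest_per_target = {name: max(series) for name, series in observations.items()}
    latest_overall = max(latest_per_target.values())
    return {
        name: (latest_overall - latest).days
        for name, latest in latest_per_target.items()
    }


lags = target_collection_lags(observations)
for name, lag in lags.items():
    print(f"{name}: {lag} days")
```

Bananas supplies the most recent date, so its lag is zero and every other target is measured against it.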

Leverage the Configured Collection Lag and Configured Forecast Range

Note: This entire section is removed for 2.0, but can return in a later release when the setting is available.

Sensible Machine Learning uses the configured collection lag to determine the true forecast range. The true forecast range is the configured collection lag plus the configured forecast range. The configured forecast range is the number of days for which you want predictions.

For example, if the configured collection lag is set to two days and the configured forecast range is set to seven days, Sensible Machine Learning forecasts nine days forward to account for the two missing days.

Collection lag and forecast for a Sensible Machine Learning forecast
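The arithmetic in this example can be written out as a short Python sketch. The function names are hypothetical, and the sketch assumes that the last usable data point sits the configured collection lag before the most recent date in the data set, so the first forecast days cover the gap and the remaining days are the requested forecast range.

```python
from datetime import date, timedelta


def true_forecast_range(configured_collection_lag, configured_forecast_range):
    """True forecast range in days: collection lag plus forecast range."""
    return configured_collection_lag + configured_forecast_range


def forecast_dates(most_recent_date, configured_collection_lag, configured_forecast_range):
    """List every date that must be forecast.

    Assumption: usable data ends configured_collection_lag days before
    the most recent date in the data set.
    """
    last_usable = most_recent_date - timedelta(days=configured_collection_lag)
    horizon = true_forecast_range(configured_collection_lag, configured_forecast_range)
    return [last_usable + timedelta(days=step) for step in range(1, horizon + 1)]


# Two-day configured collection lag, seven-day configured forecast range.
dates = forecast_dates(date(2021, 10, 17), 2, 7)
gap_days = dates[:2]        # days hidden by the collection lag
requested_days = dates[2:]  # the seven days the user asked for

print(len(dates), "days forecast in total")
print("gap:", gap_days[0], "to", gap_days[-1])
print("requested:", requested_days[0], "to", requested_days[-1])
```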

Before a forecast can be produced, Sensible Machine Learning pre-processes all targets to have the same temporary target collection lag. Sensible Machine Learning autonomously handles these cases on a target-by-target basis.

Target Collection Lag is Greater than Configured Collection Lag

For Carrots in the previous image, the target collection lag is greater than the configured collection lag. The two missing days are filled in using interpolation techniques, so Sensible Machine Learning makes a best guess at those values.

Target Collection Lag is Less than Configured Collection Lag

For Bananas in the previous image, the target collection lag is less than the configured collection lag. The two extra days ahead of the configured collection lag are removed. To keep all targets consistent, Sensible Machine Learning does not use these values.

Target Collection Lag and Configured Collection Lag are Equal

For Apples in the previous image, the target collection lag matches the configured collection lag. No values need to be filled in or removed.

Target Collection Lag and Configured Collection lag are equal
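The three cases above can be illustrated with one small Python function. This is a sketch under stated assumptions, not the product's implementation: `align_to_configured_lag` is a hypothetical name, and the fill step simply carries the last received value forward as a stand-in for whatever interpolation technique Sensible Machine Learning actually applies.

```python
from datetime import date, timedelta


def align_to_configured_lag(series, most_recent_date, configured_lag):
    """Return a copy of `series` (date -> value) that ends exactly
    configured_lag days before most_recent_date."""
    cutoff = most_recent_date - timedelta(days=configured_lag)

    # Target lag smaller than configured lag: drop values after the cutoff.
    aligned = {day: value for day, value in series.items() if day <= cutoff}
    if not aligned:
        raise ValueError("no data on or before the cutoff date")

    # Target lag greater than configured lag: fill the missing days.
    # Assumption: carry the last value forward (placeholder for interpolation).
    last_seen = max(aligned)
    fill_value = aligned[last_seen]
    day = last_seen + timedelta(days=1)
    while day <= cutoff:
        aligned[day] = fill_value
        day += timedelta(days=1)
    return aligned


most_recent = date(2021, 10, 17)
apples = {date(2021, 10, 14): 12, date(2021, 10, 15): 9}
bananas = {date(2021, 10, d): v for d, v in [(14, 30), (15, 28), (16, 31), (17, 27)]}
carrots = {date(2021, 10, 12): 4, date(2021, 10, 13): 5}

aligned_apples = align_to_configured_lag(apples, most_recent, 2)    # unchanged
aligned_bananas = align_to_configured_lag(bananas, most_recent, 2)  # two days removed
aligned_carrots = align_to_configured_lag(carrots, most_recent, 2)  # two days filled
```

After alignment, all three targets end on the same date, which is the shared temporary collection lag described earlier.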

Note: The entire following section is removed for 2.0, but can return in a later release when the setting is available.

Considerations for Setting the Configured Collection Lag

Setting the Configured Collection Lag Too Large

A configured collection lag that is larger than most of the target collection lags in the data set negatively impacts model accuracy, for two reasons. First, Sensible Machine Learning needs to remove the most recent values for any target whose collection lag is smaller than the configured collection lag. This removes the most relevant information that models can learn from. Second, the larger the true forecast range (configured collection lag plus configured forecast range), the more likely it is that model accuracy decreases. This is because the features associated with the targets and models created in feature engineering must be lagged further into the future. In general, the larger the lag, the less impactful the features.

Setting the Configured Collection Lag Too Small

Setting a configured collection lag that is smaller than the majority of the target collection lags in the data set results in many targets having their most recent dates constantly filled with estimated values. Continually filling the most recent values with estimated data compromises each affected model's understanding of its target's data patterns and decreases accuracy.
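One way to reason about this trade-off is to count, for each candidate setting, how many days would be removed and how many would be filled with estimates. The following Python sketch is purely illustrative: `impact_of_configured_lag` is a hypothetical helper, and the lags are the example values for Apples, Bananas, and Carrots.

```python
def impact_of_configured_lag(target_lags, configured_lag):
    """Split targets into those that lose recent days and those that
    need estimated days for a given configured collection lag.

    Returns (removed, filled), each mapping target name -> number of days.
    """
    removed = {
        name: configured_lag - lag
        for name, lag in target_lags.items()
        if lag < configured_lag
    }
    filled = {
        name: lag - configured_lag
        for name, lag in target_lags.items()
        if lag > configured_lag
    }
    return removed, filled


target_lags = {"Apples": 2, "Bananas": 0, "Carrots": 4}

for candidate in (0, 2, 4):
    removed, filled = impact_of_configured_lag(target_lags, candidate)
    print(f"configured lag {candidate}: removed={removed} filled={filled}")
```

In this example, a configured lag of four days removes recent values from both Apples and Bananas, while a configured lag of zero days forces estimated values for both Apples and Carrots.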