Assessing the Accuracy of the Models

Overview

When you run a pipeline, the modeling node assesses project data to determine the list of candidate models that are most appropriate for each time series (an aggregation of transactional data into specified time intervals, sorted according to unique combinations of the default attributes, or BY variables). For example, time series that have trends should be forecast with models that have a trend component. (A forecast is a numerical prediction of a future value for a specified time period for each unique combination of BY variable values.) Time series with seasonality (a regular change in time series data values that occurs at the same point in each time cycle) should be forecast with models that have a seasonal component. The modeling node generates a model selection list from the candidate models for each time series.

After the model selection list is determined for each time series, by default, the modeling node makes one-step-ahead forecasts on the in-sample data to select the best model and evaluate its performance. The model is selected based on the statistic of fit (a statistical value that evaluates how well a forecasting model fits the historical series by comparing the actual data to the predicted values) specified for the modeling node. This statistic is also referred to as the model selection criterion (the statistic of fit that is used for forecast model selection). The model selection criterion is computed using the predictions from the selected model to evaluate how well it fits or forecasts the series. When the full range of data is used to select and evaluate the model, the process is referred to as in-sample analysis.

Instead of using the default in-sample analysis for model selection and performance evaluation, consider specifying a holdout sample (the number of periods of the most recent data that are excluded from parameter estimation; the holdout sample can be used to evaluate the forecasting performance of a candidate model), an out-of-sample region (the number of time periods before the end of the data that are removed when fitting models; after model selection, forecasts are generated in the out-of-sample region and compared to the actual data to determine accuracy), or both. Specifying a holdout sample reserves the most recent data for model selection, resulting in models that are better fitted to the data. After the best model is selected, the holdout region is merged back into the initial fit region.

Similarly, specifying an out-of-sample region reserves the most recent portion of data to see how well the selected model generalizes to new data before it is deployed. Statistics of fit are calculated separately for this region. This allows evaluation of the forecast accuracy on new data that was not used to create the model.

Specifying a holdout sample and an out-of-sample region often provides a better way to perform model selection and evaluate model performance.

Example Data

To demonstrate the effect that designating holdout and out-of-sample regions can have on your forecasts, examine the data set in the following example. The observations in this data set begin on January 1, 2011, and end on December 31, 2016. The time interval is set to Month, so all of the observations are accumulated into months. This yields 72 time periods for each time series.

Historical record

By default, the start and end dates for each project are determined by the earliest and latest observations in the source data for the project.

Using a Holdout Sample

The holdout sample is a subset of actual time periods that is excluded from the initial model fit. Initial forecasts are made within the holdout sample time range. For each candidate model, the statistic of fit is computed for the time range in the holdout sample. The candidate model that performs best in the holdout sample, based on this statistic, is selected to forecast the actual time series. The parameters for the selected model are then re-estimated over the full range of in-sample data before the final forecasts are computed.
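The selection process described above can be sketched in a few lines of Python. This is an illustrative outline only, not the product's implementation; the `fit`, `forecast`, and `stat` callables and the toy "mean" and "naive" candidates are hypothetical stand-ins.

```python
def select_model(candidates, series, holdout, fit, forecast, stat):
    """Fit each candidate on the initial region, score its forecasts in the
    holdout region, pick the best, then re-estimate the winner over the
    full in-sample range. (Sketch; assumes lower stat values are better.)"""
    initial, heldout = series[:-holdout], series[-holdout:]
    best = min(
        candidates,
        key=lambda m: stat(heldout, forecast(fit(m, initial), len(heldout))),
    )
    return fit(best, series)  # re-estimate over the full in-sample range

# Toy candidates for illustration: a constant-mean model and a naive
# last-value model, scored by total absolute error in the holdout.
fit = lambda name, data: (name, sum(data) / len(data) if name == "mean" else data[-1])
forecast = lambda fitted, h: [fitted[1]] * h
stat = lambda actual, pred: sum(abs(a - p) for a, p in zip(actual, pred))

# On a trending series, the naive model tracks the holdout more closely.
chosen = select_model(["mean", "naive"], list(range(1, 11)), 3, fit, forecast, stat)
```

Note that the returned model is refit on the entire in-sample series, mirroring how the holdout region is merged back before final forecasting.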

In the following figure, a holdout region of 12 is designated. The model selection list is generated based on the time series in the first five years of the data. Models are selected for each time series based on how closely the predictions match the actual data in 2016, using the statistic of fit chosen by the forecaster.

Time Range with Holdout Region

In addition to specifying an integer for the holdout region, you can also specify a holdout percentage. The holdout percentage specifies the size of the holdout sample as a percentage of the length of the dependent series. The holdout percentage is used to determine the holdout region only if the resulting sample size is smaller than the integer value specified for the holdout sample.
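This rule can be made concrete with a small sketch. The helper below is hypothetical (not part of the product); it simply encodes the stated precedence between the integer and the percentage.

```python
def effective_holdout(series_length: int, holdout_n: int, holdout_pct: float) -> int:
    # Sample size implied by the percentage of the dependent series length.
    pct_size = int(series_length * holdout_pct / 100)
    # The percentage determines the holdout region only when it yields a
    # smaller sample than the specified integer value.
    return pct_size if pct_size < holdout_n else holdout_n

# With 72 monthly periods, a holdout of 12, and a percentage of 10:
# 10% of 72 rounds down to 7 periods, which is smaller than 12, so 7 is used.
size = effective_holdout(72, 12, 10)
```

With a larger percentage, say 25% of 72 periods (18), the integer value of 12 would win instead, since 18 is not smaller than 12.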

By specifying the holdout region, the models selected can be chosen based on comparison with the more recent data. The predictions for each model are compared to actual data in the holdout region to determine the best fitting model.

In forecasting strategies based on neural networks, the holdout region is analogous to the validation region common in machine learning. This region is used to periodically evaluate the training process and prevent over-fitting the model.

Although using a holdout sample is preferred, it is not always feasible. In some cases, a time series might be too short to enable the effective use of a holdout sample.

You can set the holdout region for each modeling node in the Options panel. For example, for Auto-forecasting, under Model Selection, use the Number of data points used in the holdout sample setting. If you enter an integer for the holdout sample, you can also add a holdout percentage.

Using an Out-of-Sample Region

The out-of-sample region is the number of time periods before the end of the data where the multistep forecasts are to begin. These time periods are removed from the diagnosis, the model selection step, and from the final model estimation step. Removing the most recent time periods provides a better assessment of model performance by comparing forecasts to new data that was not used to create the model. Since both a holdout sample and an out-of-sample region are designated for this example, the holdout region is used for model selection.

The following diagram shows an out-of-sample region of 12. The last 12 months in this data set are set aside for forecasts. Since an out-of-sample region is designated for this example, it uses the region preceding the out-of-sample periods for model selection.

Time Range with Out-of-Sample and Holdout Regions
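The partitioning shown in the figure can be sketched as a hypothetical helper (not part of the product) that carves a series into the initial fit, holdout, and out-of-sample regions:

```python
def split_regions(series, out_of_sample, holdout):
    # The out-of-sample periods are removed from the end of the data first;
    # the holdout is then the most recent slice of the remaining in-sample data.
    n = len(series)
    in_sample = series[: n - out_of_sample] if out_of_sample else series[:]
    initial_fit = in_sample[: len(in_sample) - holdout]
    holdout_region = in_sample[len(in_sample) - holdout:]
    out_region = series[n - out_of_sample:] if out_of_sample else []
    return initial_fit, holdout_region, out_region

# 72 monthly periods with a 12-month out-of-sample region and a
# 12-month holdout leave 48 months for the initial model fit.
fit_region, hold_region, out_region = split_regions(list(range(72)), 12, 12)
```

The three regions are contiguous and together cover the full series, matching the example above.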

It is important to remember that the forecast horizon (the number of intervals into the future, beyond a base date, for which analyses and predictions are made) specified for a project starts the forecasts at the beginning of the out-of-sample region. The number of time periods in the out-of-sample region is included in the total number of forecast observations. Therefore, the number of forecasts beyond the historical data is the total number of forecasts reduced by the number of observations in the out-of-sample region.

In this example, if the forecast horizon is set to 12 months and the out-of-sample region is set to 6, then only 6 months after the end of the historical data are forecast. If the out-of-sample region is equal to the forecast horizon, then no months after the end of the historical data are forecast. If the out-of-sample region is larger than the forecast horizon, then the forecasts end before the historical data does. For best results, consider setting the out-of-sample region for model comparison. When you have determined the model with the best performance, you can remove the out-of-sample region and obtain your final forecasts from that model.
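The arithmetic in this paragraph reduces to a one-line calculation. The helper name below is hypothetical, used only to make the relationship explicit:

```python
def future_forecasts(horizon: int, out_of_sample: int) -> int:
    # Forecasts start at the beginning of the out-of-sample region, so the
    # out-of-sample periods count against the total forecast horizon. The
    # remainder, if any, extends beyond the end of the historical data.
    return max(horizon - out_of_sample, 0)

# Horizon of 12 with an out-of-sample region of 6: 6 future months.
# Horizon equal to the out-of-sample region: no future months.
# Horizon of 18 with an out-of-sample region of 12: 6 future months.
```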

In forecasting strategies based on neural networks, the out-of-sample region is sometimes referred to as the test region. However, the concept is the same. The out-of-sample region is used to evaluate the forecast accuracy on new data that was not used to create the model.

You can change the default out-of-sample region by specifying a positive integer in Project Settings using the Number of periods to exclude from modeling setting.

The following figure shows that, for the final forecasts, the holdout region is merged back with the initial fit region. In this example, the forecast horizon is set to 18, which allows 6 months of future forecasts after the out-of-sample forecasts are complete.

Fit and Forecast Regions

Selecting Statistics of Fit

The statistic of fit (or goodness of fit) measures how well the predictions match the actual time series data.

You should change the selection criterion after forecasts are generated only to investigate the robustness of the model selection. If the same model is the best performing when the selection criterion is MAPE or when the selection criterion is RMSE, then you can say that the model selection is robust with respect to a selection criterion of MAPE or RMSE. Choosing a selection criterion after the forecasts are generated can result in forecast bias.
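To make the robustness check above concrete, here is a minimal sketch of the two criteria mentioned, MAPE and RMSE, written as plain Python functions (illustrative formulas only, not the product's implementation):

```python
import math

def mape(actual, predicted):
    # Mean absolute percentage error; assumes all actual values are nonzero.
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Root mean squared error.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

If the same candidate model minimizes both statistics on the same evaluation region, the selection is robust with respect to those two criteria; if the two statistics disagree, the choice of criterion is influencing which model is selected.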

For more information about the statistics of fit, see Model Selection Criteria.

Last updated: March 16, 2026