X11 Procedure

Details of Model Selection

If an ARIMA statement is present but no MODEL= is given, PROC X11 estimates and forecasts five predefined models and selects the best. This section describes the details of the selection criteria and the selection process.

The five predefined models used by PROC X11 are the same as those used by X11ARIMA/88 from Statistics Canada. These particular models, shown in Table 2, were chosen on the basis of testing a large number of economic series (Dagum 1988) and should provide reasonable forecasts for most economic series.

Table 2: Five Predefined Models

Model #   Specification      Multiplicative   Additive
1         (0,1,1)(0,1,1)s    Log transform    No transform
2         (0,1,2)(0,1,1)s    Log transform    No transform
3         (2,1,0)(0,1,1)s    Log transform    No transform
4         (0,2,2)(0,1,1)s    Log transform    No transform
5         (2,1,2)(0,1,1)s    No transform     No transform


The selection process proceeds as follows. The five models are estimated and one-step-ahead forecasts are produced in the order shown in Table 2. As each model is estimated, the following three criteria are checked:

  • The mean absolute percent error (MAPE) for the last three years of the series must be less than 15%.

  • The significance probability for the Box-Ljung chi-square for up to lag 24 for monthly series (lag 8 for quarterly series) must be greater than 0.05.

  • The over-differencing criterion must not exceed 0.9.

The descriptions of these three criteria are given in the section Criteria Details. The default values for these criteria are those used by X11ARIMA/88 from Statistics Canada; these defaults can be changed by the MAPECR=, CHICR=, and OVDIFCR= options.

A model that fails any one of these three criteria is excluded from further consideration. In addition, if the ARIMA estimation fails for a given model, a warning is issued, and the model is excluded. The final set of all models considered consists of those that pass all three criteria and are estimated successfully. From this set, the model with the smallest MAPE for the last three years is chosen.

If all five models fail, ARIMA processing is skipped for the variable being processed, and the standard X-11 seasonal adjustment is performed. A note is written to the log with this information.
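The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not PROC X11's implementation: the `ModelDiagnostics` container and the `select_model` function are hypothetical names, and the diagnostics are assumed to have been computed already. The keyword arguments mirror the MAPECR=, CHICR=, and OVDIFCR= options.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ModelDiagnostics:
    """Hypothetical container for the diagnostics of one candidate model."""
    name: str
    converged: bool    # False if the ARIMA estimation failed
    mape: float        # MAPE over the last three years, in percent
    chisq_prob: float  # significance probability of the Box-Ljung chi-square
    overdiff: float    # largest over-differencing quantity for the model


def select_model(
    candidates: Sequence[ModelDiagnostics],
    mapecr: float = 15.0,
    chicr: float = 0.05,
    ovdifcr: float = 0.9,
) -> Optional[ModelDiagnostics]:
    """Return the passing model with the smallest MAPE, or None if all fail."""
    passed = [
        m for m in candidates
        if m.converged              # estimation succeeded
        and m.mape < mapecr         # MAPE below the limit
        and m.chisq_prob > chicr    # no significant lack of fit
        and m.overdiff <= ovdifcr   # not over-differenced
    ]
    if not passed:
        return None  # caller falls back to standard X-11 adjustment
    return min(passed, key=lambda m: m.mape)
```

A return value of `None` corresponds to the case in which ARIMA processing is skipped and the standard X-11 seasonal adjustment is performed.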

The chosen model is then used to forecast the series for one or more years (determined by the FORECAST= option in the ARIMA statement). These forecasts are appended to the original data (or the prior and calendar-adjusted data).

If a BACKCAST= option is specified, the chosen model form is used, but the parameters are reestimated using the reversed series. Using these parameters, the reversed series is forecast for the number of years specified by the BACKCAST= option. These forecasts are then reversed and appended to the beginning of the original series, or the prior and calendar-adjusted series, to produce the backcasts.

Note that the final selection rule (the smallest MAPE using the last three years) emphasizes the quality of the forecasts at the end of the series. This is consistent with the purpose of the X-11-ARIMA methodology, which is to improve the estimates of seasonal factors and thus minimize revisions to recent past data as new data become available.

Criteria Details

Mean Absolute Percent Error (MAPE)

For the MAPE criterion, only the last three years of the original series (or prior and calendar-adjusted series) are used in computing the MAPE.

Let $y_t$, $t = 1, \ldots, n$, be the last three years of the series, and denote its one-step-ahead forecast by $\hat{y}_t$, where $n = 36$ for a monthly series and $n = 12$ for a quarterly series.

With this notation, the MAPE criterion is computed as

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{|y_t|}$$
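As a minimal sketch of the MAPE formula, the following Python function takes the observed values and their one-step-ahead forecasts for the last three years. The function name `mape` and its argument names are illustrative assumptions, not part of PROC X11.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percent error, in percent, over paired observations."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be nonempty and equal in length")
    n = len(actual)
    # Sum of |y_t - yhat_t| / |y_t|; a zero observation raises ZeroDivisionError.
    total = sum(abs(y - f) / abs(y) for y, f in zip(actual, forecast))
    return 100.0 / n * total
```

For a monthly series the inputs would each hold 36 values, and for a quarterly series 12 values.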
Box-Ljung Chi-Square

The Box-Ljung chi-square is a lack-of-fit test based on the model residuals. This test statistic is computed using the Ljung-Box formula

$$\chi_m^2 = n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k}$$

where $n$ is the number of residuals that can be computed for the time series, and

$$r_k = \frac{\sum_{t=1}^{n-k} a_t a_{t+k}}{\sum_{t=1}^{n} a_t^2}$$

where the $a_t$'s are the residual sequence. This formula has been suggested by Ljung and Box (1978) as yielding a better fit to the asymptotic chi-square distribution. Some simulation studies of the finite sample properties of this statistic are given by Davies, Triggs, and Newbold (1977) and by Ljung and Box (1978).

For monthly series, $m = 24$, while for quarterly series, $m = 8$.
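The statistic can be computed directly from the two formulas above. The sketch below is illustrative only: the function name `ljung_box` is an assumption, and the code returns the chi-square statistic without its significance probability, because the degrees of freedom used for the probability are not stated in this section.

```python
from typing import Sequence


def ljung_box(residuals: Sequence[float], m: int) -> float:
    """Ljung-Box chi-square statistic for lags 1..m of a residual sequence."""
    n = len(residuals)
    if not 0 < m < n:
        raise ValueError("m must satisfy 0 < m < number of residuals")
    denom = sum(a * a for a in residuals)  # sum of squared residuals
    total = 0.0
    for k in range(1, m + 1):
        # Lag-k residual autocorrelation r_k (no mean correction, as in the text).
        num = sum(residuals[t] * residuals[t + k] for t in range(n - k))
        r_k = num / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total
```

Under the defaults described above, `m` would be 24 for a monthly series and 8 for a quarterly series.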

Over-differencing Test

From Table 2 you can see that all models have a single seasonal MA factor and at most two nonseasonal MA factors. Also, all models have seasonal and nonseasonal differencing. Consider model 2 applied to a monthly series $y_t$ with $E(y_t) = \mu$:

$$(1 - B)(1 - B^{12})(y_t - \mu) = (1 - \theta_1 B - \theta_2 B^2)(1 - \theta_3 B^{12}) a_t$$

If $\theta_3 = 1.0$, then the factors $(1 - \theta_3 B^{12})$ and $(1 - B^{12})$ will cancel, resulting in a lower-order model.

Similarly, if $\theta_1 + \theta_2 = 1.0$,

$$(1 - \theta_1 B - \theta_2 B^2) = (1 - B)(1 - \alpha B)$$

for some $\alpha \neq 0.0$. Again, this results in cancellation and a lower-order model.
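As a quick check of this factorization (a worked step added for illustration), expanding the right-hand side gives

$$(1 - B)(1 - \alpha B) = 1 - (1 + \alpha) B + \alpha B^2$$

Matching coefficients with $1 - \theta_1 B - \theta_2 B^2$ yields $\theta_1 = 1 + \alpha$ and $\theta_2 = -\alpha$, so that $\theta_1 + \theta_2 = 1$. The factor $(1 - B)$ then cancels against the nonseasonal difference on the left-hand side of the model equation.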

Since the parameters are not exact, it is not reasonable to require that

$$\theta_3 < 1.0 \quad \text{and} \quad \theta_1 + \theta_2 < 1.0$$

Instead, an approximate test is performed by requiring that

$$\theta_3 \le 0.9 \quad \text{and} \quad \theta_1 + \theta_2 \le 0.9$$

The default value of 0.9 can be changed by the OVDIFCR= option. Similar reasoning applies to the other models.
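The approximate test for model 2 can be sketched as a simple predicate. The function name `passes_overdifferencing` and its arguments are hypothetical. Applying the same check to the sum of whatever nonseasonal MA coefficients a model has is an assumption based on the remark that similar reasoning applies to the other models.

```python
from typing import Sequence


def passes_overdifferencing(
    nonseasonal_ma: Sequence[float],
    seasonal_ma: float,
    ovdifcr: float = 0.9,
) -> bool:
    """Approximate over-differencing check on estimated MA coefficients.

    nonseasonal_ma holds theta_1 (and theta_2, if present); seasonal_ma is
    the seasonal MA coefficient (theta_3 in the model 2 example).
    """
    # Assumption: a model with no nonseasonal MA terms passes that part
    # of the check, because the empty sum is zero.
    return seasonal_ma <= ovdifcr and sum(nonseasonal_ma) <= ovdifcr
```

For example, estimates of 0.4, 0.3, and 0.6 for the three coefficients pass the check, whereas a seasonal coefficient of 0.95 fails it.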

ARIMA Statement Options for the Five Predefined Models

Table 3 lists the five predefined models and gives the equivalent MODEL= parameters in a PROC X11 ARIMA statement.

In all models except the fifth, a log transformation is performed before the ARIMA estimation for the multiplicative case; no transformation is performed for the additive case. For the fifth model, no transformation is done for either case.

The multiplicative case is assumed in Table 3. The indicated seasonality s in the specification is either 12 (monthly) or 4 (quarterly). The MODEL statement assumes a monthly series.

Table 3: ARIMA Statement Options for Predefined Models

Model              ARIMA Statement Options
(0,1,1)(0,1,1)s    MODEL=( Q=1 SQ=1 DIF=1 SDIF=1 ) TRANSFORM=LOG
(0,1,2)(0,1,1)s    MODEL=( Q=2 SQ=1 DIF=1 SDIF=1 ) TRANSFORM=LOG
(2,1,0)(0,1,1)s    MODEL=( P=2 SQ=1 DIF=1 SDIF=1 ) TRANSFORM=LOG
(0,2,2)(0,1,1)s    MODEL=( Q=2 SQ=1 DIF=2 SDIF=1 ) TRANSFORM=LOG
(2,1,2)(0,1,1)s    MODEL=( P=2 Q=2 SQ=1 DIF=1 SDIF=1 )


Last updated: June 19, 2025