AUTOREG Procedure

Predicted Values

The AUTOREG procedure can produce two kinds of predicted values for the response series, together with the corresponding residuals and confidence limits. In both cases the residuals are computed as the actual value minus the predicted value. In addition, when GARCH models are estimated, the AUTOREG procedure can output predictions of the conditional error variance.

Predicting the Unconditional Mean

The first type of predicted value is obtained from only the structural part of the model, $\mathbf{x}_t'\mathbf{b}$. These predictions are useful for predicting values of new response time series that are assumed to be described by the same model as the current response time series. The predicted values, residuals, standard errors, and upper and lower confidence limits for the structural predictions are requested by specifying the PREDICTEDM=, RESIDUALM=, STDERRM=, UCLM=, or LCLM= option in the OUTPUT statement. The ALPHACLM= option controls the confidence level for UCLM= and LCLM=. These confidence limits are for estimation of the mean of the dependent variable, $\mathbf{x}_t'\mathbf{b}$, where $\mathbf{x}_t$ is the column vector of independent variables at observation $t$.

The predicted values are computed as

$$\hat{y}_t = \mathbf{x}_t'\mathbf{b}$$

and the upper and lower confidence limits as

$$\hat{u}_t = \hat{y}_t + t_{\alpha/2}\, v$$
$$\hat{l}_t = \hat{y}_t - t_{\alpha/2}\, v$$

where $v^2$ is an estimate of the variance of $\hat{y}_t$ and $t_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the $t$ distribution:

$$\Pr(T > t_{\alpha/2}) = \alpha/2$$

where $T$ is an observation from a $t$ distribution with $q$ degrees of freedom. The value of $\alpha$ can be set with the ALPHACLM= option. The degrees of freedom parameter, $q$, is taken to be the number of observations minus the number of free parameters in the final model. For the YW estimation method, the value of $v$ is calculated as

$$v = \sqrt{s^2\, \mathbf{x}_t'(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{x}_t}$$

where $s^2$ is the error sum of squares divided by $q$. For the ULS and ML methods, $v$ is calculated as

$$v = \sqrt{s^2\, \mathbf{x}_t'\mathbf{W}\mathbf{x}_t}$$

where $\mathbf{W}$ is the $k \times k$ submatrix of $(\mathbf{J}'\mathbf{J})^{-1}$ that corresponds to the regression parameters. For more information, see the section Computational Methods.
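As a sketch of the structural prediction formulas above, the following Python functions compute $\hat{y}_t$, $v$, and the confidence limits for a single observation. This is an illustration, not the AUTOREG implementation. The names `structural_prediction` and `residual_df` are hypothetical. The sketch also assumes that the caller has already computed the matrix $\mathbf{W}$ (or $(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}$ for YW) and the critical value $t_{\alpha/2}$, which can be obtained, for example, from `scipy.stats.t.ppf(1 - alpha/2, q)`.

```python
import math
from typing import Sequence, Tuple


def residual_df(n_obs: int, n_free_params: int) -> int:
    """Degrees of freedom q: observations minus free parameters in the final model."""
    return n_obs - n_free_params


def structural_prediction(
    x_t: Sequence[float],
    b: Sequence[float],
    W: Sequence[Sequence[float]],
    s2: float,
    t_crit: float,
) -> Tuple[float, float, float, float]:
    """Return (yhat, v, lower, upper) for the structural prediction x_t'b.

    W is the k x k matrix in v = sqrt(s2 * x_t' W x_t): the regression block of
    (J'J)^{-1} for ULS/ML, or (X'V^{-1}X)^{-1} for YW (assumed precomputed).
    t_crit is the upper alpha/2 point of the t distribution with q degrees of
    freedom, supplied by the caller.
    """
    k = len(x_t)
    yhat = sum(x_t[i] * b[i] for i in range(k))
    quad = sum(x_t[i] * W[i][j] * x_t[j] for i in range(k) for j in range(k))
    v = math.sqrt(s2 * quad)
    return yhat, v, yhat - t_crit * v, yhat + t_crit * v


# Hypothetical inputs: two regressors, identity W, s^2 = 4, t_crit = 2.
yhat, v, lcl, ucl = structural_prediction(
    [1.0, 2.0], [0.5, 0.25], [[1.0, 0.0], [0.0, 1.0]], s2=4.0, t_crit=2.0
)
```

For the YW method, the same function applies if $(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}$ is passed as `W`, because the two formulas for $v$ share the quadratic-form structure.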

Predicting Future Series Realizations

The other predicted values use both the structural part of the model and the predicted values of the error process. These conditional mean values are useful in predicting future values of the current response time series. The predicted values, residuals, standard errors, and upper and lower confidence limits for future observations conditional on past values are requested by the PREDICTED=, RESIDUAL=, STDERR=, UCL=, or LCL= option in the OUTPUT statement. The ALPHACLI= option controls the confidence level for UCL= and LCL=. These confidence limits are for the predicted value,

$$\tilde{y}_t = \mathbf{x}_t'\mathbf{b} + \nu_{t|t-1}$$

where $\mathbf{x}_t$ is the vector of independent variables if all independent variables at time $t$ are nonmissing, and $\nu_{t|t-1}$ is the minimum variance linear predictor of the error term. Given the autoregressive model of order $m$, AR($m$), for $\nu_t$, these predictors $\nu_{s|t}$ are defined recursively as follows:

$$\nu_{s|t} = \begin{cases} -\sum_{i=1}^{m}\hat{\phi}_i\,\nu_{s-i|t} & s > t \text{ or observation } s \text{ is missing} \\ y_s - \mathbf{x}_s'\mathbf{b} & 0 < s \le t \text{ and observation } s \text{ is nonmissing} \\ 0 & s \le 0 \end{cases}$$

where $\hat{\phi}_i$, $i = 1, \ldots, m$, are the estimated AR parameters. Observation $s$ is considered to be missing if the dependent variable or at least one independent variable is missing. If some of the independent variables at time $t$ are missing, the predicted value $\tilde{y}_t$ is also missing. With the same definition of $\nu_{s|t}$, the prediction method can be easily extended to the multistep forecast of $\tilde{y}_{t+d}$, $d > 0$:

$$\tilde{y}_{t+d} = \mathbf{x}_{t+d}'\mathbf{b} + \nu_{t+d|t-1}$$

The prediction method is implemented through the Kalman filter.
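The recursion for $\nu_{s|t}$ and the forecast $\tilde{y}_{t+d}$ can be illustrated with a direct Python sketch. This code substitutes a plain recursion that follows the three cases of the definition for the Kalman filter that AUTOREG uses, so it is a teaching aid and does not reproduce the procedure's internals. The function and argument names are hypothetical. `xb[s-1]` holds $\mathbf{x}_s'\mathbf{b}$, and `None` marks a missing value.

```python
from typing import List, Optional, Sequence


def error_predictions(
    y: Sequence[Optional[float]],
    xb: Sequence[Optional[float]],
    phi: Sequence[float],
    t: int,
    last: int,
) -> List[float]:
    """Return [nu_{1|t}, ..., nu_{last|t}] using 1-based time indices s.

    y[s-1] is the response at time s and xb[s-1] is x_s'b; either is None
    when observation s is missing. phi holds phi_1..phi_m.
    """
    m = len(phi)
    nu: List[float] = []
    for s in range(1, last + 1):
        observed = (
            s <= t
            and s <= len(y)
            and y[s - 1] is not None
            and xb[s - 1] is not None
        )
        if observed:
            # 0 < s <= t and observation s is nonmissing: the residual itself
            nu.append(y[s - 1] - xb[s - 1])
        else:
            # s > t or observation s is missing: AR recursion on earlier
            # predictions; terms with s - i <= 0 are zero and are skipped
            nu.append(-sum(phi[i - 1] * nu[s - i - 1]
                           for i in range(1, m + 1) if s - i >= 1))
    return nu


def conditional_forecast(
    y: Sequence[Optional[float]],
    xb: Sequence[Optional[float]],
    phi: Sequence[float],
    t: int,
    d: int = 0,
) -> Optional[float]:
    """ytilde_{t+d} = x_{t+d}'b + nu_{t+d|t-1}; None if x_{t+d}'b is missing."""
    s = t + d
    if xb[s - 1] is None:
        return None
    nu = error_predictions(y, xb, phi, t - 1, s)
    return xb[s - 1] + nu[s - 1]


# Hypothetical data: two observed points, regression part known for s = 1..4.
y = [3.0, 5.0]
xb = [2.0, 3.0, 4.0, 5.0]
phi = [-0.5]
print(conditional_forecast(y, xb, phi, t=3, d=0))  # prints 5.0
```

With $\hat{\phi}_1 = -0.5$, the residual $2.0$ at $s = 2$ is carried forward as $0.5 \times 2.0 = 1.0$ at $s = 3$. Each further step halves the predicted noise again, so multistep forecasts revert toward the structural part $\mathbf{x}_{t+d}'\mathbf{b}$.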

If $\tilde{y}_t$ is not missing, the upper and lower confidence limits are computed as

$$\tilde{u}_t = \tilde{y}_t + t_{\alpha/2}\, v$$
$$\tilde{l}_t = \tilde{y}_t - t_{\alpha/2}\, v$$

where $v$, in this case, is computed as

$$v = \sqrt{\mathbf{z}_t'\mathbf{V}_\beta\mathbf{z}_t + s^2 r}$$

where $\mathbf{V}_\beta$ is the variance-covariance matrix of the estimate of the regression parameter vector $\beta$, and $\mathbf{z}_t$ is defined as

$$\mathbf{z}_t = \mathbf{x}_t + \sum_{i=1}^{m}\hat{\phi}_i\,\mathbf{x}_{t-i|t-1}$$

and $\mathbf{x}_{s|t}$ is defined in the same way as $\nu_{s|t}$:

$$\mathbf{x}_{s|t} = \begin{cases} -\sum_{i=1}^{m}\hat{\phi}_i\,\mathbf{x}_{s-i|t} & s > t \text{ or observation } s \text{ is missing} \\ \mathbf{x}_s & 0 < s \le t \text{ and observation } s \text{ is nonmissing} \\ 0 & s \le 0 \end{cases}$$

The formula for computing the prediction variance $v$ is derived from Baillie (1979).
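The following Python sketch computes $\mathbf{z}_t$ and $v$ from the two definitions above. It is illustrative only, and the function names are hypothetical. $r$ is taken as an input because its computation is deferred to the section Predicting the Conditional Variance. A history row of `None` marks an observation in which the response or any regressor is missing.

```python
import math
from typing import List, Optional, Sequence, Tuple


def regressor_predictions(
    X_hist: Sequence[Optional[Sequence[float]]],
    phi: Sequence[float],
    k: int,
) -> List[List[float]]:
    """Return x_{s|t-1} for s = 1..t-1, where t - 1 = len(X_hist).

    X_hist[s-1] is x_s, or None when observation s is missing (response or
    any regressor). Missing rows are filled by the AR recursion.
    """
    out: List[List[float]] = []
    for s in range(1, len(X_hist) + 1):
        row = X_hist[s - 1]
        if row is not None:
            out.append([float(value) for value in row])
        else:
            vec = [0.0] * k
            for i in range(1, len(phi) + 1):
                if s - i >= 1:  # x_{s-i|t-1} = 0 when s - i <= 0
                    prev = out[s - i - 1]
                    for j in range(k):
                        vec[j] -= phi[i - 1] * prev[j]
            out.append(vec)
    return out


def conditional_prediction_std(
    x_t: Sequence[float],
    X_hist: Sequence[Optional[Sequence[float]]],
    phi: Sequence[float],
    V_beta: Sequence[Sequence[float]],
    s2: float,
    r: float,
) -> Tuple[List[float], float]:
    """Return (z_t, v) with z_t = x_t + sum_i phi_i x_{t-i|t-1}
    and v = sqrt(z_t' V_beta z_t + s2 * r)."""
    k = len(x_t)
    t = len(X_hist) + 1
    xs = regressor_predictions(X_hist, phi, k)
    z = [float(value) for value in x_t]
    for i in range(1, len(phi) + 1):
        if t - i >= 1:
            for j in range(k):
                z[j] += phi[i - 1] * xs[t - i - 1][j]
    quad = sum(z[a] * V_beta[a][c] * z[c] for a in range(k) for c in range(k))
    return z, math.sqrt(quad + s2 * r)
```

The quadratic form reflects uncertainty in the regression estimates, and $s^2 r$ reflects the error-process uncertainty. As the text notes, neither term accounts for the estimation of the AR parameters themselves.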

The value $s^2 r$ is the estimate of the conditional prediction error variance. At the start of the series, and after missing values, $r$ is usually greater than 1. For the computational details of $r$, see the section Predicting the Conditional Variance. The plot of residuals and confidence limits in Example 8.4 illustrates this behavior.

Except to adjust the degrees of freedom for the error sum of squares, the preceding formulas do not account for the fact that the autoregressive parameters are estimated. In particular, the confidence limits are likely to be somewhat too narrow. In large samples, this is probably not an important effect, but it might be appreciable in small samples. For some discussion of this problem for AR(1) models, see Harvey (1981).

At the beginning of the series (the first $m$ observations, where $m$ is the value of the NLAG= option) and after missing values, these residuals do not match the residuals obtained by using OLS on the transformed variables. This is because, in these cases, the predicted noise values must be based on an incomplete set of past noise values and thus have larger variance. The GLS transformation for these observations includes a scale factor in addition to a linear combination of past values. Put another way, the $\mathbf{L}^{-1}$ matrix defined in the section Computational Methods has the value 1 along the diagonal, except for the first $m$ observations and after missing values.

Predicting the Conditional Variance

The GARCH process can be written as

$$\epsilon_t^2 = \omega + \sum_{i=1}^{n}(\alpha_i + \gamma_i)\,\epsilon_{t-i}^2 - \sum_{j=1}^{p}\gamma_j\,\eta_{t-j} + \eta_t$$

where $\eta_t = \epsilon_t^2 - h_t$ and $n = \max(p, q)$. This representation shows that the squared residual $\epsilon_t^2$ follows an ARMA($n$, $p$) process. Then for any $d > 0$, the conditional expectations are as follows:

$$\mathbf{E}(\epsilon_{t+d}^2 \mid \Psi_t) = \omega + \sum_{i=1}^{n}(\alpha_i + \gamma_i)\,\mathbf{E}(\epsilon_{t+d-i}^2 \mid \Psi_t) - \sum_{j=1}^{p}\gamma_j\,\mathbf{E}(\eta_{t+d-j} \mid \Psi_t)$$
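For the standard GARCH($p$, $q$) model, the conditional-expectation recursion above can be sketched in Python. This sketch is not AUTOREG code, and `garch_variance_forecast` is a hypothetical name. The inputs are assumed to be aligned sequences of squared residuals $\epsilon^2$ and conditional variances $h$ ending at time $t$. Quantities at or before $t$ are plugged in as observed. Future $\eta$ terms are set to zero, because $h_s = \mathbf{E}(\epsilon_s^2 \mid \Psi_{s-1})$ implies that $\eta_s$ has conditional mean zero.

```python
from typing import List, Sequence


def garch_variance_forecast(
    omega: float,
    alpha: Sequence[float],
    gamma: Sequence[float],
    eps2: Sequence[float],
    h: Sequence[float],
    horizon: int,
) -> List[float]:
    """Return [E(eps^2_{t+1}|Psi_t), ..., E(eps^2_{t+horizon}|Psi_t)].

    alpha = (alpha_1..alpha_q), gamma = (gamma_1..gamma_p). eps2 and h are the
    squared residuals and conditional variances up to time t (last element is
    time t), aligned, with at least n = max(p, q) elements.
    """
    q, p = len(alpha), len(gamma)
    n = max(p, q)
    # ARMA(n, p) coefficients alpha_i + gamma_i (absent terms are zero)
    ar = [(alpha[i] if i < q else 0.0) + (gamma[i] if i < p else 0.0) for i in range(n)]
    last = len(eps2) - 1  # position of time t
    forecasts: List[float] = []

    def expected_eps2(k: int) -> float:
        # E(eps^2_{t+k} | Psi_t): observed for k <= 0, earlier forecast otherwise
        return eps2[last + k] if k <= 0 else forecasts[k - 1]

    def expected_eta(k: int) -> float:
        # E(eta_{t+k} | Psi_t): eta = eps^2 - h is known for k <= 0, zero ahead
        return eps2[last + k] - h[last + k] if k <= 0 else 0.0

    for d in range(1, horizon + 1):
        value = omega
        value += sum(ar[i - 1] * expected_eps2(d - i) for i in range(1, n + 1))
        value -= sum(gamma[j - 1] * expected_eta(d - j) for j in range(1, p + 1))
        forecasts.append(value)
    return forecasts
```

For $d = 1$ the $\gamma_j \epsilon^2$ terms cancel, and the recursion reduces to $\omega + \sum_i \alpha_i \epsilon_{t+1-i}^2 + \sum_j \gamma_j h_{t+1-j}$, which is the usual GARCH expression for $h_{t+1}$.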

The $d$-step-ahead prediction error, $\xi_{t+d} = y_{t+d} - y_{t+d|t}$, has the conditional variance

$$\mathbf{V}(\xi_{t+d} \mid \Psi_t) = \sum_{j=0}^{d-1} g_j^2\,\sigma^2_{t+d-j|t}$$

where

$$\sigma^2_{t+d-j|t} = \mathbf{E}(\epsilon^2_{t+d-j} \mid \Psi_t)$$

Coefficients in the conditional d-step prediction error variance are calculated recursively using the formula

$$g_j = -\phi_1 g_{j-1} - \cdots - \phi_m g_{j-m}$$

where $g_0 = 1$ and $g_j = 0$ if $j < 0$, and $\phi_1, \ldots, \phi_m$ are the autoregressive parameters. Because the parameters are not known, the conditional variance is computed using the estimated autoregressive parameters. When there are no autoregressive terms, the $d$-step-ahead prediction error variance simplifies to

$$\mathbf{V}(\xi_{t+d} \mid \Psi_t) = \sigma^2_{t+d|t}$$

Therefore, the one-step-ahead prediction error variance is equivalent to the conditional error variance defined in the GARCH process:

$$h_t = \mathbf{E}(\epsilon_t^2 \mid \Psi_{t-1}) = \sigma^2_{t|t-1}$$
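A minimal Python sketch of the weights $g_j$ and the conditional $d$-step prediction error variance follows. It assumes that the path $\sigma^2_{t+k|t}$, $k = 1, \ldots, d$, has already been computed, for example from a GARCH variance forecast. The names `ar_weights` and `prediction_error_variance` are hypothetical.

```python
from typing import List, Sequence


def ar_weights(phi: Sequence[float], d: int) -> List[float]:
    """g_0..g_{d-1} with g_0 = 1 and g_j = -sum_i phi_i g_{j-i} (g_j = 0 for j < 0)."""
    g: List[float] = []
    for j in range(d):
        if j == 0:
            g.append(1.0)
        else:
            g.append(-sum(phi[i - 1] * g[j - i]
                          for i in range(1, len(phi) + 1) if j - i >= 0))
    return g


def prediction_error_variance(phi: Sequence[float], sigma2_path: Sequence[float]) -> float:
    """V(xi_{t+d} | Psi_t) = sum_{j=0}^{d-1} g_j^2 sigma^2_{t+d-j|t}.

    sigma2_path[k-1] holds sigma^2_{t+k|t} for k = 1..d, so d = len(sigma2_path).
    """
    d = len(sigma2_path)
    g = ar_weights(phi, d)
    return sum(g[j] ** 2 * sigma2_path[d - 1 - j] for j in range(d))
```

With no autoregressive terms (`phi = []`), every $g_j$ with $j > 0$ is zero, and the result reduces to the last element of the path, $\sigma^2_{t+d|t}$, as in the simplified formula above.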

The multistep forecast of conditional error variance of the EGARCH, QGARCH, TGARCH, PGARCH, and GARCH-M models cannot be calculated using the preceding formula for the GARCH model. The following formulas are recursively implemented to obtain the multistep forecast of conditional error variance of these models:

  • for the EGARCH(p, q) model:

    $$\ln(\sigma^2_{t+d|t}) = \omega + \sum_{i=d}^{q}\alpha_i\, g(z_{t+d-i}) + \sum_{j=1}^{d-1}\gamma_j \ln(\sigma^2_{t+d-j|t}) + \sum_{j=d}^{p}\gamma_j \ln(h_{t+d-j})$$

    where

    $$g(z_t) = \theta z_t + |z_t| - E|z_t|$$
    $$z_t = \epsilon_t / \sqrt{h_t}$$
  • for the QGARCH(p, q) model:

    $$\begin{aligned} \sigma^2_{t+d|t} = \omega &+ \sum_{i=1}^{d-1}\alpha_i\left(\sigma^2_{t+d-i|t} + \psi_i^2\right) + \sum_{i=d}^{q}\alpha_i\left(\epsilon_{t+d-i} - \psi_i\right)^2 \\
    &+ \sum_{j=1}^{d-1}\gamma_j\,\sigma^2_{t+d-j|t} + \sum_{j=d}^{p}\gamma_j\, h_{t+d-j} \end{aligned}$$
  • for the TGARCH(p, q) model:

    $$\begin{aligned} \sigma^2_{t+d|t} = \omega &+ \sum_{i=1}^{d-1}\left(\alpha_i + \psi_i/2\right)\sigma^2_{t+d-i|t} + \sum_{i=d}^{q}\left(\alpha_i + 1_{\{\epsilon_{t+d-i} < 0\}}\,\psi_i\right)\epsilon^2_{t+d-i} \\
    &+ \sum_{j=1}^{d-1}\gamma_j\,\sigma^2_{t+d-j|t} + \sum_{j=d}^{p}\gamma_j\, h_{t+d-j} \end{aligned}$$
  • for the PGARCH(p, q) model:

    $$\begin{aligned} \left(\sigma^2_{t+d|t}\right)^{\lambda} = \omega &+ \sum_{i=1}^{d-1}\alpha_i\left((1+\psi_i)^{2\lambda} + (1-\psi_i)^{2\lambda}\right)\left(\sigma^2_{t+d-i|t}\right)^{\lambda/2} \\
    &+ \sum_{i=d}^{q}\alpha_i\left(|\epsilon_{t+d-i}| - \psi_i\,\epsilon_{t+d-i}\right)^{2\lambda} \\
    &+ \sum_{j=1}^{d-1}\gamma_j\left(\sigma^2_{t+d-j|t}\right)^{\lambda} + \sum_{j=d}^{p}\gamma_j\, h^{\lambda}_{t+d-j} \end{aligned}$$
  • for the GARCH-M model: the mean effect is ignored, and the formula for the corresponding GARCH model is used directly.
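As one concrete instance of the recursions in the preceding list, the following Python sketch evaluates the TGARCH($p$, $q$) multistep formula term by term. It is an illustration under stated assumptions, not AUTOREG code. `tgarch_variance_forecast` is a hypothetical name, and the residuals $\epsilon$ and conditional variances $h$ up to time $t$ are assumed to be supplied as aligned sequences.

```python
from typing import List, Sequence


def tgarch_variance_forecast(
    omega: float,
    alpha: Sequence[float],
    psi: Sequence[float],
    gamma: Sequence[float],
    eps: Sequence[float],
    h: Sequence[float],
    horizon: int,
) -> List[float]:
    """Return [sigma^2_{t+1|t}, ..., sigma^2_{t+horizon|t}] for TGARCH(p, q).

    alpha and psi have length q, gamma has length p. eps and h are residuals
    and conditional variances up to time t (last element is time t).
    """
    q, p = len(alpha), len(gamma)
    last = len(eps) - 1  # position of time t
    sig2: List[float] = []  # sig2[k-1] holds sigma^2_{t+k|t}
    for d in range(1, horizon + 1):
        value = omega
        for i in range(1, q + 1):
            if i < d:
                # future shock: (alpha_i + psi_i / 2) * sigma^2_{t+d-i|t}
                value += (alpha[i - 1] + psi[i - 1] / 2) * sig2[d - i - 1]
            else:
                # observed shock: the psi_i term applies only if eps < 0
                e = eps[last + d - i]
                value += (alpha[i - 1] + (psi[i - 1] if e < 0 else 0.0)) * e * e
        for j in range(1, p + 1):
            if j < d:
                value += gamma[j - 1] * sig2[d - j - 1]
            else:
                value += gamma[j - 1] * h[last + d - j]
        sig2.append(value)
    return sig2
```

The split at `i < d` mirrors the two ranges of the summations. Lags that reach beyond time $t$ use earlier forecasts, and lags at or before $t$ use observed residuals and variances.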

If the conditional error variance is homoscedastic, the conditional prediction error variance is identical to the unconditional prediction error variance:

$$\mathbf{V}(\xi_{t+d} \mid \Psi_t) = \mathbf{V}(\xi_{t+d}) = \sigma^2\sum_{j=0}^{d-1} g_j^2$$

since $\sigma^2_{t+d-j|t} = \sigma^2$. You can compute $s^2 r$ (the second term of the variance for the predicted value $\tilde{y}_t$ explained in the section Predicting Future Series Realizations) by using the formula $\sigma^2\sum_{j=0}^{d-1} g_j^2$, and $r$ is estimated from $\sum_{j=0}^{d-1} g_j^2$ by using the estimated autoregressive parameters.

Consider the following conditional prediction error variance:

$$\mathbf{V}(\xi_{t+d} \mid \Psi_t) = \sigma^2\sum_{j=0}^{d-1} g_j^2 + \sum_{j=0}^{d-1} g_j^2\left(\sigma^2_{t+d-j|t} - \sigma^2\right)$$

The second term in the preceding equation can be interpreted as the noise that results from using the homoscedastic conditional variance when the errors follow the GARCH process. However, if the GARCH process is covariance stationary, the difference between the conditional prediction error variance and the unconditional prediction error variance is expected to disappear as the forecast horizon $d$ increases.
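To make the decomposition concrete, the following Python sketch returns $r = \sum_{j=0}^{d-1} g_j^2$, the homoscedastic term $\sigma^2 r$, and the second (excess) term of the preceding equation. The names are hypothetical. $\sigma^2$ and the path $\sigma^2_{t+k|t}$, $k = 1, \ldots, d$, are assumed to be supplied by the caller.

```python
from typing import List, Sequence, Tuple


def ar_weights(phi: Sequence[float], d: int) -> List[float]:
    """g_0..g_{d-1} with g_0 = 1 and g_j = -sum_i phi_i g_{j-i} (g_j = 0 for j < 0)."""
    g: List[float] = []
    for j in range(d):
        if j == 0:
            g.append(1.0)
        else:
            g.append(-sum(phi[i - 1] * g[j - i]
                          for i in range(1, len(phi) + 1) if j - i >= 0))
    return g


def variance_decomposition(
    phi: Sequence[float], sigma2: float, sigma2_path: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (r, sigma2 * r, excess) for horizon d = len(sigma2_path).

    sigma2_path[k-1] holds sigma^2_{t+k|t}; excess is
    sum_j g_j^2 (sigma^2_{t+d-j|t} - sigma2).
    """
    d = len(sigma2_path)
    g = ar_weights(phi, d)
    r = sum(w * w for w in g)
    excess = sum(g[j] ** 2 * (sigma2_path[d - 1 - j] - sigma2) for j in range(d))
    return r, sigma2 * r, excess
```

When every element of the path equals $\sigma^2$, the excess term is exactly zero, which is the homoscedastic case. The sum of the last two returned values always reproduces $\sum_j g_j^2 \sigma^2_{t+d-j|t}$.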

Last updated: June 19, 2025