AUTOREG Procedure

Autoregressive Error Model

The regression model with autocorrelated disturbances is as follows:

$$y_t = \mathbf{x}_t' \boldsymbol{\beta} + \nu_t$$
$$\nu_t = \epsilon_t - \phi_1 \nu_{t-1} - \cdots - \phi_m \nu_{t-m}$$
$$\epsilon_t \sim \mathrm{N}(0, \sigma^2)$$

In these equations, $y_t$ are the dependent values, $\mathbf{x}_t$ is a column vector of regressor variables, $\boldsymbol{\beta}$ is a column vector of structural parameters, and $\epsilon_t$ is normally and independently distributed with a mean of 0 and a variance of $\sigma^2$. Note that in this parameterization, the signs of the autoregressive parameters are reversed from the parameterization documented in most of the literature.
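As a concrete illustration of this parameterization, the following sketch simulates the model with AR(1) errors. It is illustrative code, not PROC AUTOREG syntax; all names (`N`, `beta`, `phi`, `sigma`) are assumptions of the example.

```python
import numpy as np

# Minimal simulation sketch of the autoregressive error model, assuming
# AR(1) errors and the reversed sign convention nu_t = eps_t - phi_1 nu_{t-1}.
rng = np.random.default_rng(0)
N = 200
beta = np.array([1.0, 0.5])     # structural parameters
phi = np.array([-0.7])          # AR parameter in the reversed-sign form
sigma = 1.0

X = np.column_stack([np.ones(N), np.arange(N, dtype=float)])
eps = rng.normal(0.0, sigma, N)

nu = np.zeros(N)
for t in range(N):
    for i, p in enumerate(phi, start=1):
        if t - i >= 0:
            nu[t] -= p * nu[t - i]   # minus sign matches nu_t = eps_t - phi_i nu_{t-i}
    nu[t] += eps[t]
y = X @ beta + nu
```

With $\phi_1 = -0.7$, the reversed sign makes $\nu_t = \epsilon_t + 0.7\,\nu_{t-1}$, a conventional AR(1) process with coefficient 0.7.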

PROC AUTOREG offers four estimation methods for the autoregressive error model. The default method, Yule-Walker (YW) estimation, is the fastest computationally. The Yule-Walker method used by PROC AUTOREG is described in Gallant and Goebel (1976). Harvey (1981) calls this method the two-step full transform method. The other methods are iterated YW, unconditional least squares (ULS), and maximum likelihood (ML). The ULS method is also referred to as nonlinear least squares (NLS) or exact least squares (ELS).

You can use all of the methods with data containing missing values, but you should use ML estimation if the missing values are plentiful. For further discussion of the advantages of different methods, see the section Alternative Autocorrelation Correction Methods, later in this chapter.

The Yule-Walker Method

Let $\boldsymbol{\phi}$ represent the vector of autoregressive parameters,

$$\boldsymbol{\phi} = (\phi_1, \phi_2, \ldots, \phi_m)'$$

and let the variance matrix of the error vector $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_N)'$ be $\boldsymbol{\Sigma}$,

$$\mathrm{E}(\boldsymbol{\nu}\boldsymbol{\nu}') = \boldsymbol{\Sigma} = \sigma^2 \mathbf{V}$$

If the vector of autoregressive parameters $\boldsymbol{\phi}$ is known, the matrix $\mathbf{V}$ can be computed from the autoregressive parameters, and $\boldsymbol{\Sigma}$ is then $\sigma^2\mathbf{V}$. Given $\boldsymbol{\Sigma}$, the efficient estimates of the regression parameters $\boldsymbol{\beta}$ can be computed using generalized least squares (GLS). The GLS estimates then yield the unbiased estimate of the variance $\sigma^2$.

The Yule-Walker method alternates estimation of $\boldsymbol{\beta}$ using generalized least squares with estimation of $\boldsymbol{\phi}$ using the Yule-Walker equations applied to the sample autocorrelation function. The YW method starts by forming the OLS estimate of $\boldsymbol{\beta}$. Next, $\boldsymbol{\phi}$ is estimated from the sample autocorrelation function of the OLS residuals by using the Yule-Walker equations. Then $\mathbf{V}$ is estimated from the estimate of $\boldsymbol{\phi}$, and $\boldsymbol{\Sigma}$ is estimated from $\mathbf{V}$ and the OLS estimate of $\sigma^2$. The autocorrelation-corrected estimates of the regression parameters $\boldsymbol{\beta}$ are then computed by GLS, using the estimated $\boldsymbol{\Sigma}$ matrix. These are the Yule-Walker estimates.
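The two-step procedure can be sketched as follows for the AR(1) case. This is a hedged illustration of the idea, not PROC AUTOREG internals: the function name and the toy data are assumptions, and $\mathbf{V}$ is formed explicitly here, whereas the procedure itself avoids that (see the Kalman filter discussion below in this section).

```python
import numpy as np

# Sketch of two-step Yule-Walker estimation with an AR(1) error model:
# OLS first, then phi from the residual autocorrelation, then GLS.
def yule_walker_gls(y, X):
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # step 1: OLS estimate of beta
    n = y - X @ b_ols                               # OLS residuals
    r1 = (n[1:] @ n[:-1]) / (n @ n)                 # lag-1 sample autocorrelation
    phi_hat = -r1          # AR(1) Yule-Walker solution under the reversed sign
    N = len(y)
    rho = -phi_hat
    # V for stationary AR(1) errors: correlations rho^|i-j| (scale is
    # irrelevant to the GLS estimate)
    V = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    Vinv = np.linalg.inv(V)
    b_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)  # step 2: GLS
    return b_gls, phi_hat

rng = np.random.default_rng(1)
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([2.0, 1.0]) + rng.normal(size=N)
b_gls, phi_hat = yule_walker_gls(y, X)
```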

If the ITER option is specified, the Yule-Walker residuals are used to form a new sample autocorrelation function, the new autocorrelation function is used to form a new estimate of $\boldsymbol{\phi}$ and $\mathbf{V}$, and the GLS estimates are recomputed using the new variance matrix. This alternation of estimates continues until either the maximum change in the $\hat{\phi}$ estimate between iterations is less than the value specified by the CONVERGE= option or the maximum number of allowed iterations is reached. This produces the iterated Yule-Walker estimates. Iteration of the estimates may not yield much improvement.

The Yule-Walker equations, solved to obtain $\hat{\boldsymbol{\phi}}$ and a preliminary estimate of $\sigma^2$, are

$$\mathbf{R}\hat{\boldsymbol{\phi}} = -\mathbf{r}$$

Here $\mathbf{r} = (r_1, \ldots, r_m)'$, where $r_i$ is the lag $i$ sample autocorrelation. The matrix $\mathbf{R}$ is the Toeplitz matrix whose $i{,}j$th element is $r_{|i-j|}$. If you specify a subset model, then only the rows and columns of $\mathbf{R}$ and $\mathbf{r}$ that correspond to the specified subset of lags are used.
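Solving this system is a small linear solve against a symmetric Toeplitz matrix. The sketch below uses illustrative autocorrelation values for an $m = 3$ model; a real implementation could also exploit the Toeplitz structure (for example with a Levinson-Durbin recursion) rather than a dense solve.

```python
import numpy as np

# Sketch of solving the Yule-Walker system R * phi_hat = -r.
r = np.array([0.6, 0.3, 0.1])            # illustrative r_1, ..., r_m
m = len(r)
acf = np.concatenate(([1.0], r))         # r_0 = 1
# R is Toeplitz with (i,j) element r_|i-j|
R = acf[np.abs(np.subtract.outer(np.arange(m), np.arange(m)))]
phi_hat = np.linalg.solve(R, -r)
```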

If the BACKSTEP option is specified, then for purposes of significance testing, the matrix $[\mathbf{R}\ \mathbf{r}]$ is treated as a sum-of-squares-and-crossproducts matrix arising from a simple regression with $N-k$ observations, where $k$ is the number of estimated parameters.

The Unconditional Least Squares and Maximum Likelihood Methods

Define the transformed error, $\mathbf{e} = (\epsilon_1, \ldots, \epsilon_N)'$, as

$$\mathbf{e} = \mathbf{L}^{-1}\mathbf{n}$$

where $\mathbf{n} = \mathbf{y} - \mathbf{X}\boldsymbol{\beta}$ and $\mathbf{L}$ is the Cholesky root of $\mathbf{V}$; that is, $\mathbf{V} = \mathbf{L}\mathbf{L}'$ with $\mathbf{L}$ lower triangular.

The unconditional sum of squares for the model, $S$, is

$$S = \mathbf{n}'\mathbf{V}^{-1}\mathbf{n} = \mathbf{e}'\mathbf{e}$$

The ULS estimates are computed by minimizing $S$ with respect to the parameters $\boldsymbol{\beta}$ and $\phi_i$.

The full log-likelihood function for the autoregressive error model is

$$l = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln(\sigma^2) - \frac{1}{2}\ln(|\mathbf{V}|) - \frac{S}{2\sigma^2}$$

where $|\mathbf{V}|$ denotes the determinant of $\mathbf{V}$. For the ML method, the likelihood function is maximized by minimizing an equivalent sum-of-squares function.

Maximizing $l$ with respect to $\sigma^2$ (and concentrating $\sigma^2$ out of the likelihood) and dropping the constant term $-\frac{N}{2}\left(\ln(2\pi) + 1 - \ln(N)\right)$ produces the concentrated log-likelihood function

$$l_c = -\frac{N}{2}\ln\left(S\,|\mathbf{V}|^{1/N}\right)$$

Rewriting the variable term within the logarithm gives

$$S_{ml} = |\mathbf{L}|^{1/N}\,\mathbf{e}'\mathbf{e}\,|\mathbf{L}|^{1/N}$$

PROC AUTOREG computes the ML estimates by minimizing the objective function $S_{ml} = |\mathbf{L}|^{1/N}\,\mathbf{e}'\mathbf{e}\,|\mathbf{L}|^{1/N}$.
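To make the objective concrete, the sketch below evaluates $S_{ml}$ for AR(1) errors by forming $\mathbf{V}$ explicitly and using $|\mathbf{V}| = |\mathbf{L}|^2$. This is an illustration only; PROC AUTOREG uses a Kalman filter rather than building $\mathbf{V}$, and the function name is an assumption of the example.

```python
import numpy as np

# Sketch: evaluate S_ml = |L|^{1/N} e'e |L|^{1/N} = |V|^{1/N} e'e
# for an AR(1) error model with the reversed-sign parameter phi.
def s_ml(phi, n):
    N = len(n)
    rho = -phi                                   # reversed-sign convention
    V = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    V /= 1.0 - rho**2                            # stationary AR(1) variance scaling
    L = np.linalg.cholesky(V)                    # V = L L'
    e = np.linalg.solve(L, n)                    # e = L^{-1} n
    _, logdetV = np.linalg.slogdet(V)            # |V| = |L|^2
    return np.exp(logdetV / N) * (e @ e)         # |V|^{1/N} * e'e
```

Minimizing this function over $\phi$ and $\boldsymbol{\beta}$ (through $\mathbf{n} = \mathbf{y} - \mathbf{X}\boldsymbol{\beta}$) yields the ML estimates.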

The maximum likelihood estimates may not exist for some data sets (Anderson and Mentz 1980). This is the case for very regular data sets, such as an exact linear trend.

Computational Methods

Sample Autocorrelation Function

The sample autocorrelation function is computed from the structural residuals or noise $n_t = y_t - \mathbf{x}_t'\mathbf{b}$, where $\mathbf{b}$ is the current estimate of $\boldsymbol{\beta}$. The sample autocorrelation function is the sum of all available lagged products of $n_t$ of order $j$ divided by $\ell + j$, where $\ell$ is the number of such products.

If there are no missing values, then $\ell + j = N$, the number of observations. In this case, the Toeplitz matrix of autocorrelations, $\mathbf{R}$, is at least positive semidefinite. If there are missing values, these autocorrelation estimates can yield an $\mathbf{R}$ matrix that is not positive semidefinite. If such estimates occur, a warning message is printed, and the estimates are tapered by exponentially declining weights until $\mathbf{R}$ is positive definite.
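The estimator just described can be sketched as follows, with NaN marking a missing residual. The function names are assumptions of this illustration, as is the use of NaN for missing values; the point is the $\ell + j$ divisor, which reduces to $N$ when the series is complete.

```python
import numpy as np

# Sketch of the sample autocorrelation: the sum of available lagged
# products of order j divided by (l + j), where l counts the products.
def lagged_product_mean(n, j):
    prod = n[j:] * n[:-j] if j > 0 else n * n
    l = int(np.sum(~np.isnan(prod)))         # number of available products
    return np.nansum(prod) / (l + j)         # divisor is N when complete

def sample_autocorr(n, j):
    return lagged_product_mean(n, j) / lagged_product_mean(n, 0)
```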

Data Transformation and the Kalman Filter

The calculation of $\mathbf{V}$ from $\boldsymbol{\phi}$ for the general AR($m$) model is complicated, and the size of $\mathbf{V}$ depends on the number of observations. Instead of actually calculating $\mathbf{V}$ and performing GLS in the usual way, in practice a Kalman filter algorithm is used to transform the data and compute the GLS results through a recursive process.

In all of the estimation methods, the original data are transformed by the inverse of the Cholesky root of $\mathbf{V}$. Let $\mathbf{L}$ denote the Cholesky root of $\mathbf{V}$; that is, $\mathbf{V} = \mathbf{L}\mathbf{L}'$ with $\mathbf{L}$ lower triangular. For an AR($m$) model, $\mathbf{L}^{-1}$ is a band diagonal matrix with $m$ anomalous rows at the beginning and the autoregressive parameters along the remaining rows. Thus, if there are no missing values, after the first $m-1$ observations the data are transformed as

$$z_t = x_t + \hat{\phi}_1 x_{t-1} + \cdots + \hat{\phi}_m x_{t-m}$$

The transformation is carried out using a Kalman filter, and the lower triangular matrix $\mathbf{L}$ is never directly computed. The Kalman filter algorithm, as it applies here, is described in Harvey and Phillips (1979) and Jones (1980). Although $\mathbf{L}$ is not computed explicitly, for ease of presentation the remaining discussion is in terms of $\mathbf{L}$. If there are missing values, then the submatrix of $\mathbf{L}$ consisting of the rows and columns with nonmissing values is used to generate the transformations.
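The post-burn-in part of the transformation is simple to sketch directly (the first $m$ anomalous rows of $\mathbf{L}^{-1}$ are omitted here, and the function name is an assumption of this illustration):

```python
import numpy as np

# Sketch of the transformation z_t = x_t + phi_1 x_{t-1} + ... + phi_m x_{t-m}
# for t past the first m observations, given estimated AR parameters.
def ar_transform(x, phi_hat):
    m = len(phi_hat)
    z = x[m:].astype(float).copy()
    for i, p in enumerate(phi_hat, start=1):
        z += p * x[m - i:-i]      # adds phi_i * x_{t-i} for each lag i
    return z

z = ar_transform(np.arange(5.0), [0.5])   # AR(1) example
```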

Gauss-Newton Algorithms

The ULS and ML estimates employ a Gauss-Newton algorithm to minimize the sum of squares and maximize the log likelihood, respectively. The relevant optimization is performed simultaneously for both the regression and AR parameters. The OLS estimates of $\boldsymbol{\beta}$ and the Yule-Walker estimates of $\boldsymbol{\phi}$ are used as starting values for these methods.

The Gauss-Newton algorithm requires the derivatives of $\mathbf{e}$ or $|\mathbf{L}|^{1/N}\mathbf{e}$ with respect to the parameters. The derivatives with respect to the parameter vector $\boldsymbol{\beta}$ are

$$\frac{\partial \mathbf{e}}{\partial \boldsymbol{\beta}'} = -\mathbf{L}^{-1}\mathbf{X}$$
$$\frac{\partial\, |\mathbf{L}|^{1/N}\mathbf{e}}{\partial \boldsymbol{\beta}'} = -|\mathbf{L}|^{1/N}\mathbf{L}^{-1}\mathbf{X}$$

These derivatives are computed by the transformation described previously. The derivatives with respect to $\boldsymbol{\phi}$ are computed by differentiating the Kalman filter recurrences and the equations for the initial conditions.
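A single Gauss-Newton step has a simple generic form, sketched below under the assumption that the residual vector and its Jacobian are available as functions (the names here are illustrative, not PROC AUTOREG internals):

```python
import numpy as np

# Hedged sketch of one Gauss-Newton step on a sum of squares S = e'e:
# given e(theta) and the Jacobian J = d e / d theta', the update solves
# (J'J) step = J'e and subtracts it from theta.
def gauss_newton_step(theta, residual_fn, jacobian_fn):
    e = residual_fn(theta)
    J = jacobian_fn(theta)
    step = np.linalg.solve(J.T @ J, J.T @ e)
    return theta - step
```

When the residual is linear in the parameters, as with the $\boldsymbol{\beta}$ derivatives above for fixed $\boldsymbol{\phi}$, a single step reaches the least squares solution.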

Variance Estimates and Standard Errors

For the Yule-Walker method, the estimate of the error variance, $s^2$, is the error sum of squares from the last application of GLS, divided by the error degrees of freedom (the number of observations $N$ minus the number of free parameters).

The variance-covariance matrix for the components of $\mathbf{b}$ is taken as $s^2(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}$ for the Yule-Walker method. For the ULS and ML methods, the variance-covariance matrix of the parameter estimates is computed as $s^2(\mathbf{J}'\mathbf{J})^{-1}$. For the ULS method, $\mathbf{J}$ is the matrix of derivatives of $\mathbf{e}$ with respect to the parameters. For the ML method, $\mathbf{J}$ is the matrix of derivatives of $|\mathbf{L}|^{1/N}\mathbf{e}$ divided by $|\mathbf{L}|^{1/N}$. The estimate of the variance-covariance matrix of $\mathbf{b}$, assuming that $\boldsymbol{\phi}$ is known, is $s^2(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}$. For the OLS model, the estimate of the variance-covariance matrix is $s^2(\mathbf{X}'\mathbf{X})^{-1}$.

Park and Mitchell (1980) investigated the small sample performance of the standard error estimates obtained from some of these methods. In particular, simulating an AR(1) model for the noise term, they found that the standard errors calculated using GLS with an estimated autoregressive parameter underestimated the true standard errors. These estimates of standard errors are the ones calculated by PROC AUTOREG with the Yule-Walker method.

The estimates of the standard errors calculated with the ULS or ML method take into account the joint estimation of the AR and the regression parameters and may give more accurate standard-error values than the YW method. At the same values of the autoregressive parameters, the ULS and ML standard errors will always be larger than those computed from Yule-Walker. However, simulations of the models used by Park and Mitchell (1980) suggest that the ULS and ML standard error estimates can also be underestimates. Caution is advised, especially when the estimated autocorrelation is high and the sample size is small.

High autocorrelation in the residuals is a symptom of lack of fit. An autoregressive error model should not be used as a nostrum for models that simply do not fit. It is often the case that time series variables tend to move as a random walk. This means that an AR(1) process with a parameter near one absorbs a great deal of the variation. See Example 8.3, which fits a linear trend to a sine wave.

For ULS or ML estimation, the joint variance-covariance matrix of all the regression and autoregression parameters is computed. For the Yule-Walker method, the variance-covariance matrix is computed only for the regression parameters.

Lagged Dependent Variables

The Yule-Walker estimation method is not directly appropriate for estimating models that include lagged dependent variables among the regressors. Therefore, the maximum likelihood method is the default when the LAGDEP or LAGDEP= option is specified in the MODEL statement. However, when lagged dependent variables are used, the maximum likelihood estimator is not exact maximum likelihood but is conditional on the first few values of the dependent variable.

Last updated: June 19, 2025