AUTOREG Procedure

Heteroscedasticity- and Autocorrelation-Consistent Covariance Matrix Estimator

The heteroscedasticity-consistent covariance matrix estimator (HCCME), also known as the sandwich (or robust or empirical) covariance matrix estimator, has become popular in recent years because it consistently estimates the covariance matrix of the parameter estimates even when the heteroscedasticity structure is unknown or misspecified. White (1980) proposes the original HCCME, known as HC0. However, the small-sample performance of HC0 is not good in some cases. Davidson and MacKinnon (1993) introduce improvements to HC0, namely HC1, HC2, and HC3, which apply a degrees-of-freedom or leverage adjustment. Cribari-Neto (2004) proposes HC4 for cases that have points of high leverage.

HCCME can be expressed in the following general "sandwich" form,

$$\Sigma = B^{-1} M B^{-1}$$

where B, which stands for "bread," is the Hessian matrix and M, which stands for "meat," is the outer product of gradient (OPG) with or without adjustment. For HC0, M is the OPG without adjustment; that is,

$$M_{\mathrm{HC0}} = \sum_{t=1}^{T} g_t g_t'$$

where $T$ is the sample size and $g_t$ is the gradient vector of the $t$th observation. For HC1, M is the OPG with the degrees-of-freedom correction; that is,

$$M_{\mathrm{HC1}} = \frac{T}{T-k} \sum_{t=1}^{T} g_t g_t'$$

where k is the number of parameters. For HC2, HC3, and HC4, the adjustment is related to leverage, namely,

$$\begin{aligned}
M_{\mathrm{HC2}} &= \sum_{t=1}^{T} \frac{g_t g_t'}{1 - h_{tt}} \\
M_{\mathrm{HC3}} &= \sum_{t=1}^{T} \frac{g_t g_t'}{(1 - h_{tt})^{2}} \\
M_{\mathrm{HC4}} &= \sum_{t=1}^{T} \frac{g_t g_t'}{(1 - h_{tt})^{\min(4,\, T h_{tt}/k)}}
\end{aligned}$$

The leverage $h_{tt}$ is defined as $h_{tt} \equiv j_t' \left( \sum_{t=1}^{T} j_t j_t' \right)^{-1} j_t$, where $j_t$ is defined as follows:

For an OLS model, $j_t$ is the $t$th observed regressors in column vector form.
For an AR error model, $j_t$ is the derivative vector of the $t$th residual with respect to the parameters.
For a GARCH or heteroscedasticity model, $j_t$ is the gradient of the $t$th observation (that is, $g_t$).
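As a rough illustration of how the HC0 through HC4 "meat" matrices differ only in the per-observation weight, the following Python sketch accumulates the weighted outer products. It is not the procedure's implementation: the function names `outer` and `hc_meat` are hypothetical, and the leverages are assumed to be supplied by the caller rather than computed here.

```python
def outer(u, v):
    """Outer product u v' of two vectors given as plain lists."""
    return [[ui * vj for vj in v] for ui in u]


def hc_meat(grads, leverages, kind="HC0"):
    """Return the meat matrix M for HC0, HC1, HC2, HC3, or HC4.

    grads     : list of T gradient vectors g_t, each a list of length k
    leverages : list of T leverage values h_tt (unused for HC0 and HC1);
                assumed to be precomputed by the caller
    """
    T, k = len(grads), len(grads[0])
    M = [[0.0] * k for _ in range(k)]
    for g, h in zip(grads, leverages):
        if kind in ("HC0", "HC1"):
            w = 1.0                                   # no leverage adjustment
        elif kind == "HC2":
            w = 1.0 / (1.0 - h)
        elif kind == "HC3":
            w = 1.0 / (1.0 - h) ** 2
        elif kind == "HC4":
            w = 1.0 / (1.0 - h) ** min(4.0, T * h / k)
        else:
            raise ValueError("unknown estimator: " + kind)
        gg = outer(g, g)
        for r in range(k):
            for c in range(k):
                M[r][c] += w * gg[r][c]
    if kind == "HC1":
        scale = T / (T - k)                           # degrees-of-freedom correction
        M = [[scale * m for m in row] for row in M]
    return M
```

Because every variant reduces to a weight on $g_t g_t'$, observations with leverage close to 1 dominate HC3 and HC4 far more than HC2.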

The heteroscedasticity- and autocorrelation-consistent (HAC) covariance matrix estimator can also be expressed in "sandwich" form,

$$\Sigma = B^{-1} M B^{-1}$$

where B is still the Hessian matrix, but M is the kernel estimator in the following form:

$$M_{\mathrm{HAC}} = a \left( \sum_{t=1}^{T} g_t g_t' + \sum_{j=1}^{T-1} k\!\left(\frac{j}{b}\right) \sum_{t=1}^{T-j} \left( g_t g_{t+j}' + g_{t+j} g_t' \right) \right)$$

where $T$ is the sample size, $g_t$ is the gradient vector of the $t$th observation, $k(\cdot)$ is the real-valued kernel function, $b$ is the bandwidth parameter, and $a$ is the adjustment factor of small-sample degrees of freedom (that is, $a = 1$ if the ADJUSTDF option is not specified and otherwise $a = T/(T-k)$, where $k$ is the number of parameters). The types of kernel functions are listed in Table 2.
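The kernel-weighted sum can be sketched in a few lines of Python. This is an illustrative sketch only: `hac_meat` and `bartlett` are hypothetical names, the Bartlett kernel is used as the default purely for the example, and the adjustment factor is assumed to be $T/(T-k)$ when the degrees-of-freedom adjustment is requested and 1 otherwise.

```python
def bartlett(x):
    """Bartlett kernel: 1 - |x| on [-1, 1], zero elsewhere."""
    return max(0.0, 1.0 - abs(x))


def hac_meat(grads, bandwidth, kernel=bartlett, adjust_df=False):
    """Kernel estimator of the HAC meat matrix M.

    grads     : list of T gradient vectors g_t, each a list of length k
    bandwidth : bandwidth parameter b (> 0)
    adjust_df : if True, scale by T / (T - k) (assumed form of the factor a)
    """
    T, k = len(grads), len(grads[0])
    a = T / (T - k) if adjust_df else 1.0
    M = [[0.0] * k for _ in range(k)]
    # Lag-0 term: sum of g_t g_t'.
    for g in grads:
        for r in range(k):
            for c in range(k):
                M[r][c] += g[r] * g[c]
    # Lagged terms, each weighted by k(j / b) and symmetrized.
    for j in range(1, T):
        wj = kernel(j / bandwidth)
        if wj == 0.0:
            continue
        for t in range(T - j):
            g0, g1 = grads[t], grads[t + j]
            for r in range(k):
                for c in range(k):
                    M[r][c] += wj * (g0[r] * g1[c] + g1[r] * g0[c])
    return [[a * m for m in row] for row in M]
```

For a kernel with bounded support, such as Bartlett, only lags with $j < b$ contribute, so the bandwidth directly controls how much autocorrelation is absorbed into M.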

Table 2: Kernel Functions

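Kernel Name	Equation
Bartlett	$k(x) = \begin{cases} 1 - |x| & |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$
Parzen	$k(x) = \begin{cases} 1 - 6x^2 + 6|x|^3 & 0 \le |x| \le 1/2 \\ 2(1 - |x|)^3 & 1/2 \le |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$
Quadratic spectral	$k(x) = \frac{25}{12\pi^2 x^2} \left( \frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5) \right)$
Truncated	$k(x) = \begin{cases} 1 & |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$
Tukey-Hanning	$k(x) = \begin{cases} (1 + \cos(\pi x))/2 & |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$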
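
For readers who want to experiment with the five kernels, the following Python sketch codes them using their standard textbook definitions. The function names are hypothetical, and the value of the quadratic spectral kernel at zero is set to its limit of 1.

```python
import math


def bartlett(x):
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0


def parzen(x):
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax ** 2 + 6.0 * ax ** 3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0


def quadratic_spectral(x):
    if x == 0.0:
        return 1.0                      # limiting value as x -> 0
    z = 6.0 * math.pi * x / 5.0
    return 25.0 / (12.0 * math.pi ** 2 * x ** 2) * (math.sin(z) / z - math.cos(z))


def truncated(x):
    return 1.0 if abs(x) <= 1.0 else 0.0


def tukey_hanning(x):
    return (1.0 + math.cos(math.pi * x)) / 2.0 if abs(x) <= 1.0 else 0.0
```

All five kernels equal 1 at zero; every kernel except the quadratic spectral vanishes outside $[-1, 1]$, so the quadratic spectral kernel is the only one that gives nonzero weight to every lag.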

When you specify BANDWIDTH=ANDREWS91, according to Andrews (1991) the bandwidth parameter is estimated as shown in Table 3.

Table 3: Bandwidth Parameter Estimation

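Kernel Name	Bandwidth Parameter
Bartlett	$b = 1.1447 \, (\alpha(1) T)^{1/3}$
Parzen	$b = 2.6614 \, (\alpha(2) T)^{1/5}$
Quadratic spectral	$b = 1.3221 \, (\alpha(2) T)^{1/5}$
Truncated	$b = 0.6611 \, (\alpha(2) T)^{1/5}$
Tukey-Hanning	$b = 1.7462 \, (\alpha(2) T)^{1/5}$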

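Let $\{g_{at}\}$ denote each series in $\{g_t\}$, and let $(\rho_a, \sigma_a^2)$ denote the corresponding estimates of the autoregressive and innovation variance parameters of the AR(1) model on $\{g_{at}\}$, $a = 1, \ldots, k$, where the AR(1) model is parameterized as $g_{at} = \rho_a g_{a,t-1} + \epsilon_{at}$ with $\mathrm{Var}(\epsilon_{at}) = \sigma_a^2$. The factors $\alpha(1)$ and $\alpha(2)$ are estimated with the formulas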

$$\begin{aligned}
\alpha(1) &= \frac{\displaystyle \sum_{a=1}^{k} \frac{4 \rho_a^{2} \sigma_a^{4}}{(1 - \rho_a)^{6} (1 + \rho_a)^{2}}}{\displaystyle \sum_{a=1}^{k} \frac{\sigma_a^{4}}{(1 - \rho_a)^{4}}} \\
\alpha(2) &= \frac{\displaystyle \sum_{a=1}^{k} \frac{4 \rho_a^{2} \sigma_a^{4}}{(1 - \rho_a)^{8}}}{\displaystyle \sum_{a=1}^{k} \frac{\sigma_a^{4}}{(1 - \rho_a)^{4}}}
\end{aligned}$$
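
The two ratios can be evaluated directly once the per-series AR(1) estimates are available. The following Python sketch assumes those estimates are already in hand as (autoregressive coefficient, innovation variance) pairs; the function name `andrews_alphas` is hypothetical, and how the AR(1) models are fit is outside the sketch.

```python
def andrews_alphas(ar1_fits):
    """Return (alpha(1), alpha(2)) from per-series AR(1) estimates.

    ar1_fits : list of (rho_a, sigma2_a) pairs, one per gradient series,
               where sigma2_a is the innovation variance, so that
               sigma_a^4 is sigma2_a squared
    """
    denom = sum(s2 ** 2 / (1.0 - rho) ** 4 for rho, s2 in ar1_fits)
    num1 = sum(
        4.0 * rho ** 2 * s2 ** 2 / ((1.0 - rho) ** 6 * (1.0 + rho) ** 2)
        for rho, s2 in ar1_fits
    )
    num2 = sum(
        4.0 * rho ** 2 * s2 ** 2 / (1.0 - rho) ** 8
        for rho, s2 in ar1_fits
    )
    return num1 / denom, num2 / denom
```

For a single series with an autoregressive coefficient of 0.5 and unit innovation variance, the ratios simplify to $4\rho^2 / ((1-\rho)^2 (1+\rho)^2) = 16/9$ and $4\rho^2 / (1-\rho)^4 = 16$. Both factors are zero when every series is serially uncorrelated, and they grow quickly as any coefficient approaches 1.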

When you specify BANDWIDTH=NEWEYWEST94, according to Newey and West (1994) the bandwidth parameter is estimated as shown in Table 4.

Table 4: Bandwidth Parameter Estimation

Kernel Name	Bandwidth Parameter
Bartlett
Parzen
Quadratic spectral
Truncated
Tukey-Hanning

The factors $s_1$ and $s_0$ are estimated with the following formulas:

$$\begin{aligned}
s_1 &= 2 \sum_{j=1}^{n} j \, \sigma_j \\
s_0 &= \sigma_0 + 2 \sum_{j=1}^{n} \sigma_j
\end{aligned}$$

where $n$ is the lag selection parameter, which depends on the kernel as listed in Table 5.

Table 5: Lag Selection Parameter Estimation

Kernel Name	Lag Selection Parameter
Bartlett
Parzen
Quadratic spectral
Truncated
Tukey-Hanning

The factor c in Table 5 is specified by the C= option. By default, it is 12.

The factor $\sigma_j$ is estimated with the equation

$$\sigma_j = T^{-1} \sum_{t=j+1}^{T} \left( \sum_{a=i}^{k} g_{at} \sum_{a=i}^{k} g_{a,t-j} \right), \quad j = 0, \ldots, n$$

where $i$ is 1 if the NOINT option in the MODEL statement is specified (otherwise, it is 2), and $g_{at}$ is the same as in the Andrews method.
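
The following Python sketch evaluates the lagged products and the two sums built from them for a given lag selection parameter. It is a sketch under stated assumptions: the function name `newey_west_factors` is hypothetical, and the intercept is assumed to be the first element of each gradient vector, so that it is skipped unless the model has no intercept.

```python
def newey_west_factors(grads, n, noint=False):
    """Return (sigma, s0, s1) for lag selection parameter n.

    grads : list of T gradient vectors g_t, each a list of length k
    n     : lag selection parameter (0 <= n < T)
    noint : True if the model has no intercept; otherwise the first
            element of each g_t is assumed to be the intercept and is
            excluded from the inner sums
    """
    T = len(grads)
    start = 0 if noint else 1           # i = 1 with NOINT, otherwise i = 2
    # u_t is the sum over a of g_at for observation t.
    u = [sum(g[start:]) for g in grads]
    # sigma_j = (1/T) * sum over t = j+1..T of u_t * u_{t-j} (1-based t).
    sigma = [
        sum(u[t] * u[t - j] for t in range(j, T)) / T
        for j in range(n + 1)
    ]
    s1 = 2.0 * sum(j * sigma[j] for j in range(1, n + 1))
    s0 = sigma[0] + 2.0 * sum(sigma[1:])
    return sigma, s0, s1
```

Collapsing each gradient vector to a scalar before forming the lagged products keeps the cost linear in the number of parameters, at the price of treating all non-intercept series with equal weight.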

If you specify BANDWIDTH=SAMPLESIZE, the bandwidth parameter is estimated with the equation

$$b = \begin{cases} \lfloor \gamma T^{r} + c \rfloor & \text{if the BANDWIDTH=SAMPLESIZE(INT) option is specified} \\ \gamma T^{r} + c & \text{otherwise} \end{cases}$$

where $T$ is the sample size; $\lfloor x \rfloor$ is the largest integer less than or equal to $x$; and $\gamma$, $r$, and $c$ are values specified by the BANDWIDTH=SAMPLESIZE(GAMMA=, RATE=, CONSTANT=) options, respectively.
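
As a quick arithmetic check of the sample-size rule, the following Python sketch evaluates it with and without the integer truncation. The function name `samplesize_bandwidth` and the numeric values in the usage lines are illustrative assumptions, not the procedure's defaults.

```python
import math


def samplesize_bandwidth(T, gamma, rate, constant, use_int=False):
    """Bandwidth b = gamma * T**rate + constant, optionally floored."""
    b = gamma * T ** rate + constant
    return float(math.floor(b)) if use_int else b


# Illustrative values only: T = 100, gamma = 0.75, rate = 0.5, constant = 0.9
b_raw = samplesize_bandwidth(100, 0.75, 0.5, 0.9)
b_int = samplesize_bandwidth(100, 0.75, 0.5, 0.9, use_int=True)
```

With these inputs the unrounded bandwidth is $0.75 \cdot 10 + 0.9 = 8.4$, and the floored version is 8.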

If you specify the PREWHITENING option, $g_t$ is prewhitened by the VAR(1) model,

$$g_t = A g_{t-1} + w_t$$

Then M is calculated by

$$M_{\mathrm{HAC}} = a \, (I - A)^{-1} \left( \sum_{t=1}^{T} w_t w_t' + \sum_{j=1}^{T-1} k\!\left(\frac{j}{b}\right) \sum_{t=1}^{T-j} \left( w_t w_{t+j}' + w_{t+j} w_t' \right) \right) \left( (I - A)^{-1} \right)'$$

The bandwidth calculation is also based on the prewhitened series $w_t$.
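
To make the prewhitening and recoloring steps concrete, the following Python sketch works through the one-parameter case, where the VAR(1) matrix reduces to a scalar. It is a simplified sketch: the function names are hypothetical, the coefficient is fit by least squares without an intercept (an assumption, because the estimation method is not stated here), and the kernel sum over the prewhitened series is assumed to be computed separately.

```python
def prewhiten_scalar(g):
    """Fit g_t = A * g_{t-1} + w_t by least squares; return (A, w).

    g : list of T scalar gradient values (one-parameter model)
    The residual series w has length T - 1 because the first
    observation has no predecessor.
    """
    num = sum(g[t] * g[t - 1] for t in range(1, len(g)))
    den = sum(g[t - 1] ** 2 for t in range(1, len(g)))
    A = num / den
    w = [g[t] - A * g[t - 1] for t in range(1, len(g))]
    return A, w


def recolor_scalar(meat_w, A):
    """Scalar form of (I - A)^{-1} M_w ((I - A)^{-1})'."""
    return meat_w / (1.0 - A) ** 2
```

In the scalar case the recoloring step multiplies the kernel estimate by $1/(1-A)^2$, which shows why a fitted coefficient close to 1 inflates the final estimate sharply.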

Last updated: June 19, 2025