AUTOREG Procedure

Heteroscedasticity- and Autocorrelation-Consistent Covariance Matrix Estimator

The heteroscedasticity-consistent covariance matrix estimator (HCCME), also known as the sandwich (or robust or empirical) covariance matrix estimator, has become popular because it consistently estimates the covariance matrix of the parameter estimates even when the form of the heteroscedasticity is unknown or misspecified. White (1980) proposes the original HCCME, known as HC0. However, HC0 can perform poorly in small samples. Davidson and MacKinnon (1993) introduce refinements of HC0, namely HC1, HC2, and HC3, which apply a degrees-of-freedom or leverage adjustment. Cribari-Neto (2004) proposes HC4 for data that contain high-leverage points.

HCCME can be expressed in the following general "sandwich" form,

\[ \Sigma = B^{-1} M B^{-1} \]

where B, which stands for "bread," is the Hessian matrix and M, which stands for "meat," is the outer product of the gradient (OPG), with or without adjustment. For HC0, M is the OPG without adjustment; that is,

\[ M_{\mathrm{HC0}} = \sum_{t=1}^{T} g_t g_t' \]

where $T$ is the sample size and $g_t$ is the gradient vector of the $t$th observation. For HC1, M is the OPG with the degrees-of-freedom correction; that is,

\[ M_{\mathrm{HC1}} = \frac{T}{T-k} \sum_{t=1}^{T} g_t g_t' \]

where k is the number of parameters. For HC2, HC3, and HC4, the adjustment is related to leverage, namely,

\[
\begin{aligned}
M_{\mathrm{HC2}} &= \sum_{t=1}^{T} \frac{g_t g_t'}{1 - h_{tt}} \\
M_{\mathrm{HC3}} &= \sum_{t=1}^{T} \frac{g_t g_t'}{(1 - h_{tt})^{2}} \\
M_{\mathrm{HC4}} &= \sum_{t=1}^{T} \frac{g_t g_t'}{(1 - h_{tt})^{\min(4,\; T h_{tt}/k)}}
\end{aligned}
\]

The leverage $h_{tt}$ is defined as $h_{tt} \equiv j_t' \left( \sum_{t=1}^{T} j_t j_t' \right)^{-1} j_t$, where $j_t$ is defined as follows:

  • For an OLS model, $j_t$ is the $t$th observation of the regressors, in column vector form.

  • For an AR error model, $j_t$ is the derivative vector of the $t$th residual with respect to the parameters.

  • For a GARCH or heteroscedasticity model, $j_t$ is the gradient of the $t$th observation (that is, $g_t$).
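
The five HCCME variants differ only in the scalar factor that multiplies each outer product of the gradient in M. The following Python sketch computes that factor for one observation. It is an illustration rather than the procedure's implementation; the function name hc_weight and its argument names are hypothetical, and the leverage value is assumed to have been computed already.

```python
def hc_weight(kind, h_tt, T, k):
    """Factor applied to g_t g_t' in the meat matrix M.

    kind : "HC0" through "HC4"
    h_tt : leverage of observation t (assumed to be precomputed)
    T    : sample size
    k    : number of parameters
    """
    if kind == "HC0":
        return 1.0                       # no adjustment
    if kind == "HC1":
        return T / (T - k)               # degrees-of-freedom correction
    if kind == "HC2":
        return 1.0 / (1.0 - h_tt)
    if kind == "HC3":
        return 1.0 / (1.0 - h_tt) ** 2
    if kind == "HC4":
        # The exponent grows with leverage but is capped at 4.
        return 1.0 / (1.0 - h_tt) ** min(4.0, T * h_tt / k)
    raise ValueError("unknown HCCME type: " + kind)
```

For HC1 the factor is the same for every observation, so it can be taken outside the sum, which matches the HC1 formula above.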

The heteroscedasticity- and autocorrelation-consistent (HAC) covariance matrix estimator can also be expressed in "sandwich" form,

\[ \Sigma = B^{-1} M B^{-1} \]

where B is still the Hessian matrix, but M is the kernel estimator in the following form:

\[ M_{\mathrm{HAC}} = a \left( \sum_{t=1}^{T} g_t g_t' + \sum_{j=1}^{T-1} k\!\left( \frac{j}{b} \right) \sum_{t=1}^{T-j} \left( g_t g_{t+j}' + g_{t+j} g_t' \right) \right) \]

where $T$ is the sample size, $g_t$ is the gradient vector of the $t$th observation, $k(\cdot)$ is the real-valued kernel function, $b$ is the bandwidth parameter, and $a$ is the small-sample degrees-of-freedom adjustment factor (that is, $a = 1$ if the ADJUSTDF option is not specified; otherwise $a = T/(T-k)$, where $k$ is the number of parameters). The types of kernel functions are listed in Table 2.
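
The kernel estimator can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the procedure's code: the names hac_meat, outer, and add_into are hypothetical, the gradients are passed as a list of T lists of length k, and the Bartlett kernel is used as the default only to keep the example self-contained.

```python
def outer(u, v):
    """Outer product u v' of two vectors given as lists."""
    return [[ui * vj for vj in v] for ui in u]

def add_into(acc, mat, scale=1.0):
    """Add scale * mat to acc in place."""
    for r in range(len(acc)):
        for c in range(len(acc[r])):
            acc[r][c] += scale * mat[r][c]

def bartlett(x):
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0

def hac_meat(g, b, kernel=bartlett, adjust_df=False):
    """Kernel estimate of M from gradients g (T rows, k columns)."""
    T, k = len(g), len(g[0])
    M = [[0.0] * k for _ in range(k)]
    # Lag-0 term: plain outer product of the gradient.
    for t in range(T):
        add_into(M, outer(g[t], g[t]))
    # Lagged cross terms, weighted by the kernel at j / b.
    for j in range(1, T):
        weight = kernel(j / b)
        if weight == 0.0:
            continue
        for t in range(T - j):
            add_into(M, outer(g[t], g[t + j]), weight)
            add_into(M, outer(g[t + j], g[t]), weight)
    # a = T / (T - k) mirrors the ADJUSTDF adjustment; otherwise a = 1.
    a = T / (T - k) if adjust_df else 1.0
    return [[a * m for m in row] for row in M]
```

Adding each lagged term together with its transpose keeps M symmetric. With a kernel that vanishes beyond 1, lags larger than the bandwidth contribute nothing.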

Table 2: Kernel Functions

Kernel Name          Equation
Bartlett             $k(x) = \begin{cases} 1 - |x| & |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$
Parzen               $k(x) = \begin{cases} 1 - 6x^2 + 6|x|^3 & 0 \le |x| \le 1/2 \\ 2(1 - |x|)^3 & 1/2 \le |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$
Quadratic spectral   $k(x) = \frac{25}{12 \pi^2 x^2} \left( \frac{\sin(6 \pi x / 5)}{6 \pi x / 5} - \cos(6 \pi x / 5) \right)$
Truncated            $k(x) = \begin{cases} 1 & |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$
Tukey-Hanning        $k(x) = \begin{cases} (1 + \cos(\pi x))/2 & |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$
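
The kernels in Table 2 translate directly into code. The following Python sketch is illustrative only, and the function names are hypothetical. The quadratic spectral formula is undefined at x = 0, so the sketch returns its limiting value of 1 there; that special case is an addition for numerical convenience and is not stated in the table.

```python
import math

def bartlett(x):
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0

def parzen(x):
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax ** 2 + 6.0 * ax ** 3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0

def quadratic_spectral(x):
    if x == 0.0:
        return 1.0  # limiting value as x approaches 0 (assumption, see lead-in)
    z = 6.0 * math.pi * x / 5.0
    return 25.0 / (12.0 * math.pi ** 2 * x ** 2) * (math.sin(z) / z - math.cos(z))

def truncated(x):
    return 1.0 if abs(x) <= 1.0 else 0.0

def tukey_hanning(x):
    return (1.0 + math.cos(math.pi * x)) / 2.0 if abs(x) <= 1.0 else 0.0
```

The two Parzen branches agree at |x| = 1/2, where both give 0.25. Unlike the other four kernels, the quadratic spectral kernel is nonzero for |x| > 1, so every lag receives some weight.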


When you specify BANDWIDTH=ANDREWS91, the bandwidth parameter is estimated according to Andrews (1991), as shown in Table 3.

Table 3: Bandwidth Parameter Estimation

Kernel Name          Bandwidth Parameter
Bartlett             $b = 1.1447 \, (\alpha(1) T)^{1/3}$
Parzen               $b = 2.6614 \, (\alpha(2) T)^{1/5}$
Quadratic spectral   $b = 1.3221 \, (\alpha(2) T)^{1/5}$
Truncated            $b = 0.6611 \, (\alpha(2) T)^{1/5}$
Tukey-Hanning        $b = 1.7462 \, (\alpha(2) T)^{1/5}$


Let $\{g_{at}\}$ denote each series in $\{g_t\}$, and let $(\rho_a, \sigma_a^2)$ denote the corresponding estimates of the autoregressive and innovation variance parameters of the AR(1) model on $\{g_{at}\}$, $a = 1, \ldots, k$, where the AR(1) model is parameterized as $g_{at} = \rho_a g_{a,t-1} + \epsilon_{at}$ with $\mathrm{Var}(\epsilon_{at}) = \sigma_a^2$. The factors $\alpha(1)$ and $\alpha(2)$ are estimated with the formulas

\[
\begin{aligned}
\alpha(1) &= \frac{\displaystyle \sum_{a=1}^{k} \frac{4 \rho_a^2 \sigma_a^4}{(1 - \rho_a)^6 (1 + \rho_a)^2}}{\displaystyle \sum_{a=1}^{k} \frac{\sigma_a^4}{(1 - \rho_a)^4}} \\
\alpha(2) &= \frac{\displaystyle \sum_{a=1}^{k} \frac{4 \rho_a^2 \sigma_a^4}{(1 - \rho_a)^8}}{\displaystyle \sum_{a=1}^{k} \frac{\sigma_a^4}{(1 - \rho_a)^4}}
\end{aligned}
\]
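
As a rough illustration of the Andrews calculation, the following Python sketch computes the two factors and the resulting bandwidth from AR(1) estimates that are assumed to be available already. The names andrews_alphas, andrews_bandwidth, and ANDREWS_RULES are hypothetical; how the AR(1) parameters are estimated is outside the sketch.

```python
# kernel -> (constant, which alpha factor is used, exponent), as in Table 3
ANDREWS_RULES = {
    "bartlett":           (1.1447, 1, 1 / 3),
    "parzen":             (2.6614, 2, 1 / 5),
    "quadratic_spectral": (1.3221, 2, 1 / 5),
    "truncated":          (0.6611, 2, 1 / 5),
    "tukey_hanning":      (1.7462, 2, 1 / 5),
}

def andrews_alphas(rho, sigma2):
    """Return (alpha(1), alpha(2)) from per-series AR(1) estimates.

    rho    : list of autoregressive estimates, one per series
    sigma2 : list of innovation variance estimates, one per series
    """
    pairs = list(zip(rho, sigma2))
    denom = sum(s2 ** 2 / (1 - r) ** 4 for r, s2 in pairs)
    num1 = sum(4 * r ** 2 * s2 ** 2 / ((1 - r) ** 6 * (1 + r) ** 2)
               for r, s2 in pairs)
    num2 = sum(4 * r ** 2 * s2 ** 2 / (1 - r) ** 8 for r, s2 in pairs)
    return num1 / denom, num2 / denom

def andrews_bandwidth(rho, sigma2, T, kernel="bartlett"):
    const, which, power = ANDREWS_RULES[kernel]
    alpha = andrews_alphas(rho, sigma2)[which - 1]
    return const * (alpha * T) ** power
```

With a single series the variance cancels, and the factors reduce to 4ρ²/((1-ρ)²(1+ρ)²) and 4ρ²/(1-ρ)⁴. A series with no autocorrelation (ρ = 0) therefore yields a bandwidth of 0.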

When you specify BANDWIDTH=NEWEYWEST94, the bandwidth parameter is estimated according to Newey and West (1994), as shown in Table 4.

Table 4: Bandwidth Parameter Estimation

Kernel Name          Bandwidth Parameter
Bartlett             $b = 1.1447 \, (\{s_1/s_0\}^2 T)^{1/3}$
Parzen               $b = 2.6614 \, (\{s_1/s_0\}^2 T)^{1/5}$
Quadratic spectral   $b = 1.3221 \, (\{s_1/s_0\}^2 T)^{1/5}$
Truncated            $b = 0.6611 \, (\{s_1/s_0\}^2 T)^{1/5}$
Tukey-Hanning        $b = 1.7462 \, (\{s_1/s_0\}^2 T)^{1/5}$


The factors $s_1$ and $s_0$ are estimated with the following formulas:

\[
\begin{aligned}
s_1 &= 2 \sum_{j=1}^{n} j \sigma_j \\
s_0 &= \sigma_0 + 2 \sum_{j=1}^{n} \sigma_j
\end{aligned}
\]

where $n$ is the lag selection parameter, which depends on the kernel, as listed in Table 5.

Table 5: Lag Selection Parameter Estimation

Kernel Name          Lag Selection Parameter
Bartlett             $n = c \, (T/100)^{2/9}$
Parzen               $n = c \, (T/100)^{4/25}$
Quadratic spectral   $n = c \, (T/100)^{2/25}$
Truncated            $n = c \, (T/100)^{1/5}$
Tukey-Hanning        $n = c \, (T/100)^{1/5}$


The factor c in Table 5 is specified by the C= option. By default, it is 12.

The factor $\sigma_j$ is estimated with the equation

\[ \sigma_j = T^{-1} \sum_{t=j+1}^{T} \left( \sum_{a=i}^{k} g_{at} \sum_{a=i}^{k} g_{a,t-j} \right), \quad j = 0, \ldots, n \]

where $i$ is 1 if the NOINT option in the MODEL statement is specified (otherwise, it is 2), and $g_{at}$ is the same as in the Andrews method.
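
The Newey-West steps (lag selection, the autocovariance-like terms, the two factors, and the bandwidth) can be strung together as in the following Python sketch. It is a hedged illustration, not the procedure's implementation: the names lag_selection, newey_west_bandwidth, and NW_RULES are hypothetical, and truncating the lag selection parameter to an integer is an assumption made here so that it can serve as a summation limit.

```python
import math

# kernel -> (bandwidth constant, bandwidth exponent, lag-selection exponent)
NW_RULES = {
    "bartlett":           (1.1447, 1 / 3, 2 / 9),
    "parzen":             (2.6614, 1 / 5, 4 / 25),
    "quadratic_spectral": (1.3221, 1 / 5, 2 / 25),
    "truncated":          (0.6611, 1 / 5, 1 / 5),
    "tukey_hanning":      (1.7462, 1 / 5, 1 / 5),
}

def lag_selection(kernel, T, c=12.0):
    # Assumption: n is truncated to an integer.
    return math.floor(c * (T / 100.0) ** NW_RULES[kernel][2])

def newey_west_bandwidth(g, kernel="bartlett", c=12.0, noint=False):
    """Bandwidth from gradients g (T rows, k columns)."""
    T = len(g)
    # Sum components i..k of each gradient; the first component is
    # skipped unless noint is True (i = 2 versus i = 1 in the text).
    first = 0 if noint else 1
    w = [sum(row[first:]) for row in g]
    n = lag_selection(kernel, T, c)

    def sigma(j):
        # t runs over j+1..T in one-based terms, j..T-1 in zero-based terms.
        return sum(w[t] * w[t - j] for t in range(j, T)) / T

    s1 = 2.0 * sum(j * sigma(j) for j in range(1, n + 1))
    s0 = sigma(0) + 2.0 * sum(sigma(j) for j in range(1, n + 1))
    const, power, _ = NW_RULES[kernel]
    return const * ((s1 / s0) ** 2 * T) ** power
```

Because the components of each gradient are summed before the lagged products are formed, the calculation works on a single scalar series regardless of the number of parameters.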

If you specify BANDWIDTH=SAMPLESIZE, the bandwidth parameter is estimated with the equation

\[
b = \begin{cases}
\lfloor \gamma T^{r} + c \rfloor & \text{if the BANDWIDTH=SAMPLESIZE(INT) option is specified} \\
\gamma T^{r} + c & \text{otherwise}
\end{cases}
\]

where $T$ is the sample size; $\lfloor x \rfloor$ is the largest integer less than or equal to $x$; and $\gamma$, $r$, and $c$ are values specified by the BANDWIDTH=SAMPLESIZE(GAMMA=, RATE=, CONSTANT=) options, respectively.
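
A short numeric sketch of the sample-size rule follows. The function name samplesize_bandwidth and its keyword names are hypothetical, and the values used in the usage note are arbitrary illustrations rather than defaults of the procedure.

```python
import math

def samplesize_bandwidth(T, gamma, rate, constant, integer=False):
    """Bandwidth b = gamma * T**rate + constant, optionally floored."""
    b = gamma * T ** rate + constant
    # integer=True mirrors the INT suboption: keep the largest integer <= b.
    return math.floor(b) if integer else b
```

For example, with T = 100, gamma = 0.75, rate = 0.5, and constant = 0.25, the unrounded bandwidth is 0.75 × 10 + 0.25 = 7.75, and the integer version is 7.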

If you specify the PREWHITENING option, $g_t$ is prewhitened by the VAR(1) model,

\[ g_t = A g_{t-1} + w_t \]

Then M is calculated by

\[ M_{\mathrm{HAC}} = a \, (I - A)^{-1} \left( \sum_{t=1}^{T} w_t w_t' + \sum_{j=1}^{T-1} k\!\left( \frac{j}{b} \right) \sum_{t=1}^{T-j} \left( w_t w_{t+j}' + w_{t+j} w_t' \right) \right) \left( (I - A)^{-1} \right)' \]

The bandwidth calculation is also based on the prewhitened series $w_t$.
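
To make the prewhitening and recoloring steps concrete, the following Python sketch treats the single-parameter case, in which the VAR(1) coefficient is a scalar and the inverse of (I - A) is simply 1/(1 - A). Several details are assumptions made for illustration and are not taken from the text: the names fit_ar1 and prewhitened_meat are hypothetical, the coefficient is fit by least squares without an intercept, residuals are formed only for observations that have a predecessor, and the adjustment factor a is passed in directly.

```python
def bartlett(x):
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0

def fit_ar1(g):
    """Least-squares AR(1) coefficient without intercept (assumption)."""
    num = sum(g[t] * g[t - 1] for t in range(1, len(g)))
    den = sum(g[t - 1] ** 2 for t in range(1, len(g)))
    return num / den

def prewhitened_meat(g, A, b, kernel=bartlett, a=1.0):
    """Scalar version of the prewhitened kernel estimate of M."""
    # Prewhiten: w_t = g_t - A * g_{t-1}.
    w = [g[t] - A * g[t - 1] for t in range(1, len(g))]
    n = len(w)
    # Kernel estimate computed on the prewhitened series.
    inner = sum(x * x for x in w)
    for j in range(1, n):
        weight = kernel(j / b)
        inner += weight * 2.0 * sum(w[t] * w[t + j] for t in range(n - j))
    # Recolor: pre- and post-multiply by (1 - A)^{-1}.
    recolor = 1.0 / (1.0 - A)
    return a * recolor * inner * recolor
```

In the scalar case the recoloring step multiplies the kernel estimate by 1/(1 - A)², so a coefficient close to 1 inflates the result sharply.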

Last updated: June 19, 2025