PANEL Procedure

Dynamic Panel Estimation (DYNDIFF and DYNSYS Options)

You perform dynamic panel estimation that uses first differences by specifying the DYNDIFF option in the MODEL statement. For dynamic panel estimation that uses a full system of difference and level equations, specify the DYNSYS option. For an example of dynamic panel estimation, see Example 25.5.

Dynamic panel models are regression models that include lagged versions of the dependent variable as covariates. Consider the following panel regression, which includes L lags of the dependent variable:

$$y_{it} = \sum_{j=1}^{L} \phi_j\, y_{i,t-j} + \sum_{k=1}^{K} x_{itk}\,\beta_k + \nu_i + \epsilon_{it}$$

Because the effect $\nu_i$ is common to all observations for that individual, it is correlated with any lagged y, because it played a role in that lagged value's realization. As such, lagged dependent variables are endogenous regressors and require special consideration.

First Differencing

For ease of notation, consider the special case $L = K = 1$. A first attempt to remove the source of the correlation would be to take first differences, which removes $\nu_i$. That is,

$$\Delta y_{it} = \phi\, \Delta y_{i,t-1} + \Delta x_{it}\,\beta + \eta_{it}$$

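where $\Delta y_{it} = y_{it} - y_{i,t-1}$, $\Delta x_{it} = x_{it} - x_{i,t-1}$, and $\eta_{it} = \epsilon_{it} - \epsilon_{i,t-1}$. Even though the individual effects are removed, the problem of endogeneity persists because $\Delta y_{i,t-1}$ is correlated with the differenced error term $\eta_{it}$. That is because $\epsilon_{i,t-1}$ is a component of both $\eta_{it}$ and $y_{i,t-1}$, and hence of $\Delta y_{i,t-1}$ (Nickell 1981).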
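The following Python sketch (NumPy only; the function `differenced_correlation` and its settings are hypothetical and are not part of PROC PANEL) simulates the model with $L = 1$ and no x variables, assuming standard normal $\nu_i$ and $\epsilon_{it}$ and a burn-in period to approximate a stationary start. It shows that differencing removes $\nu_i$ but leaves $\Delta y_{i,t-1}$ correlated with $\eta_{it}$. For a stationary AR(1), the population correlation is $-\sqrt{1+\phi}/2$: about $-0.61$ for $\phi = 0.5$ and exactly $-0.5$ for $\phi = 0$.

```python
import numpy as np

def differenced_correlation(phi, n_individuals=20000, burn_in=50, seed=0):
    """Simulate y_it = phi*y_{i,t-1} + nu_i + eps_it and return the sample
    correlation between Delta y_{i,t-1} and eta_it = eps_it - eps_{i,t-1}."""
    rng = np.random.default_rng(seed)
    nu = rng.normal(size=n_individuals)                  # individual effects nu_i
    n_periods = burn_in + 3                              # need y_{t-2}, y_{t-1}, y_t
    eps = rng.normal(size=(n_individuals, n_periods))    # idiosyncratic errors
    y = np.zeros((n_individuals, n_periods))
    for t in range(1, n_periods):
        y[:, t] = phi * y[:, t - 1] + nu + eps[:, t]
    t = n_periods - 1                                    # last period plays the role of t
    dy_lag = y[:, t - 1] - y[:, t - 2]                    # Delta y_{i,t-1}; nu_i cancels
    eta = eps[:, t] - eps[:, t - 1]                       # differenced error eta_it
    return np.corrcoef(dy_lag, eta)[0, 1]

for phi in (0.0, 0.5, 0.9):
    print(phi, round(differenced_correlation(phi), 3))
```

The correlation stays clearly negative for every $\phi$, which is why OLS on the differenced equation is inconsistent and instruments are needed.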

Arellano and Bond (1991) show that you can use the generalized method of moments (GMM) to obtain a consistent estimator. In GMM parlance, the moment condition $E(\Delta y_{i,t-1}\,\eta_{it}) = 0$ is violated. Estimation therefore requires a set of instrumental variables that do satisfy their moment conditions and that can adequately predict $\Delta y_{i,t-1}$. A natural set of instruments is $y_{i,t-2}$ and all earlier realizations of y. These lags of y are not correlated with $\eta_{it}$ because they occurred before time $t-1$. Given the autoregressive nature of the model, $y_{i,t-1}$ (and hence $\Delta y_{i,t-1}$) is well predicted by its previous values.

Begin with $t = 3$, the first time period for which the differenced model holds. The dynamic regression model for individual i can be expressed as

$$\mathbf{y}_i^d = \mathbf{X}_i^d \boldsymbol{\gamma} + \boldsymbol{\eta}_i^d$$

where

$$\mathbf{y}_i^d = \begin{pmatrix} \Delta y_{i3} \\ \Delta y_{i4} \\ \vdots \\ \Delta y_{iT} \end{pmatrix}, \quad
\mathbf{X}_i^d = \begin{pmatrix} \Delta y_{i2} & \Delta x_{i3} \\ \Delta y_{i3} & \Delta x_{i4} \\ \vdots & \vdots \\ \Delta y_{i,T-1} & \Delta x_{iT} \end{pmatrix}, \quad
\boldsymbol{\gamma} = \begin{pmatrix} \phi \\ \beta \end{pmatrix}, \quad
\boldsymbol{\eta}_i^d = \begin{pmatrix} \eta_{i3} \\ \eta_{i4} \\ \vdots \\ \eta_{iT} \end{pmatrix}$$

Proceeding with the idea that you can use $y_{i1}, \ldots, y_{i,t-2}$ as instruments for the endogenous covariate $\Delta y_{i,t-1}$ for $t = 3, \ldots, T$, the instrument matrix for the lagged dependent variables is

$$\mathbf{Z}_i^d = \begin{pmatrix}
y_{i1} & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & y_{i1} & y_{i2} & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & 0 & 0 & y_{i1} & y_{i2} & y_{i3} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & y_{i1} & \cdots & y_{i,T-2}
\end{pmatrix}$$

This extends naturally to $L > 1$ and $K > 1$; simply add columns to $\mathbf{X}_i^d$ and elements to $\boldsymbol{\gamma}$ as appropriate. When an observation is either missing or lost because of missing lags, delete the corresponding rows of $\mathbf{y}_i^d$, $\mathbf{X}_i^d$, $\boldsymbol{\eta}_i^d$, and $\mathbf{Z}_i^d$. Even if an observation is not missing with respect to the regression model, some of the lagged instruments might not be available because previous observations are missing. When that occurs, replace any missing instrument with 0.

When you specify the DYNDIFF option in the MODEL statement, PROC PANEL by default treats x variables as exogenous and uses a projection that leaves these variables unchanged in the differenced regression. The full instrument matrix is then $[\mathbf{Z}_i^d \;\; \mathbf{D}_i]$, where

$$\mathbf{D}_i = \begin{pmatrix}
\Delta x_{i31} & \Delta x_{i32} & \cdots & \Delta x_{i3K} \\
\Delta x_{i41} & \Delta x_{i42} & \cdots & \Delta x_{i4K} \\
\vdots & \vdots & \ddots & \vdots \\
\Delta x_{iT1} & \Delta x_{iT2} & \cdots & \Delta x_{iTK}
\end{pmatrix}$$

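When $L = 1$ and no observations are missing, the default instrument matrix $[\mathbf{Z}_i^d \;\; \mathbf{D}_i]$ has $(T-2)(T-1)/2 + K$ columns. Each column $\mathbf{z}_i$ of this matrix satisfies the moment condition $E(\mathbf{z}_i' \boldsymbol{\eta}_i^d) = 0$.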
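As an illustration only, the sketch below builds $\mathbf{Z}_i^d$ and $\mathbf{D}_i$ for one individual with $L = 1$ and stacks them side by side; with no missing values the result has $(T-2)(T-1)/2 + K$ columns. The helper names are hypothetical, x is assumed to be a T x K array, and missing instruments (np.nan) are replaced with 0 as the text describes. Deleting rows for missing observations is left to the caller.

```python
import numpy as np

def diff_gmm_instruments(y):
    """Z^d_i for L = 1 (hypothetical helper).

    y holds y_i1, ..., y_iT, with np.nan for a missing value. The row for
    t = 3, ..., T holds y_i1, ..., y_{i,t-2} in its own block of columns.
    """
    y = np.nan_to_num(np.asarray(y, dtype=float), nan=0.0)   # missing instrument -> 0
    n_rows = y.size - 2
    Z = np.zeros((n_rows, n_rows * (n_rows + 1) // 2))
    col = 0
    for r in range(n_rows):          # row r corresponds to t = r + 3
        n_lags = r + 1               # t - 2 lagged levels are available
        Z[r, col:col + n_lags] = y[:n_lags]
        col += n_lags
    return Z

def exog_diff_instruments(x):
    """D_i: rows Delta x_i3, ..., Delta x_iT of the T x K regressor array x."""
    return np.diff(np.asarray(x, dtype=float), axis=0)[1:]

y = [1.0, 2.0, 4.0, 7.0, 11.0]                       # T = 5
x = np.array([[1.0], [1.0], [2.0], [3.0], [5.0]])    # K = 1
full = np.hstack([diff_gmm_instruments(y), exog_diff_instruments(x)])
print(full.shape)    # (T-2) rows, (T-2)(T-1)/2 + K columns
print(full)
```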

System GMM

Blundell and Bond (1998) proposed a system GMM estimator that uses additional moment conditions to increase efficiency. The efficiency gain can be substantial when there is strong serial correlation in the dependent variable.
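To see why strong serial correlation matters, the hypothetical simulation below (Python with NumPy, not PROC PANEL code) measures how well the lagged level $y_{i,t-2}$ predicts the differenced regressor $\Delta y_{i,t-1}$ as $\phi$ grows, assuming standard normal $\nu_i$ and $\epsilon_{it}$ and a long burn-in. Under these assumptions the population correlation is about $-0.40$ at $\phi = 0.2$, $-0.25$ at $\phi = 0.5$, $-0.10$ at $\phi = 0.8$, and about $-0.025$ at $\phi = 0.95$, so the lagged levels carry little information about the differenced regressor when $\phi$ is close to 1.

```python
import numpy as np

def instrument_strength(phi, n_individuals=20000, burn_in=200, seed=1):
    """Sample correlation between the instrument y_{i,t-2} and the
    endogenous regressor Delta y_{i,t-1} in a simulated AR(1) panel."""
    rng = np.random.default_rng(seed)
    nu = rng.normal(size=n_individuals)      # individual effects, variance 1
    y_prev = np.zeros(n_individuals)         # ends up holding y_{i,t-2}
    y_curr = np.zeros(n_individuals)         # ends up holding y_{i,t-1}
    for _ in range(burn_in):
        y_prev, y_curr = y_curr, phi * y_curr + nu + rng.normal(size=n_individuals)
    return np.corrcoef(y_prev, y_curr - y_prev)[0, 1]

for phi in (0.2, 0.5, 0.8, 0.95):
    print(phi, round(instrument_strength(phi), 3))
```

Raising the variance of $\nu_i$ in this sketch weakens the instrument further, because $\nu_i$ inflates the variance of $y_{i,t-2}$ without affecting $\Delta y_{i,t-1}$.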

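When either $\phi$ is near 1 or the ratio $\sigma_\nu^2/\sigma_\epsilon^2$ of the variance of the individual effects $\nu_i$ to that of the errors $\epsilon_{it}$ is large, the lagged levels of the dependent variable are weak instruments for the differenced variables $\Delta y_{i,t-1}$. System GMM addresses the weak-instrument problem by augmenting the difference equations described previously with a set of level equations. When $L = 1$, the level equations are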

$$\mathbf{y}_i^{\ell} = \mathbf{X}_i^{\ell} \boldsymbol{\gamma} + \boldsymbol{\epsilon}_i^{\ell}$$

where

$$\mathbf{y}_i^{\ell} = \begin{pmatrix} y_{i2} \\ y_{i3} \\ \vdots \\ y_{iT} \end{pmatrix}, \quad
\mathbf{X}_i^{\ell} = \begin{pmatrix} y_{i1} & x_{i2} \\ y_{i2} & x_{i3} \\ \vdots & \vdots \\ y_{i,T-1} & x_{iT} \end{pmatrix}, \quad
\boldsymbol{\epsilon}_i^{\ell} = \begin{pmatrix} \nu_i + \epsilon_{i2} \\ \nu_i + \epsilon_{i3} \\ \vdots \\ \nu_i + \epsilon_{iT} \end{pmatrix}$$

Blundell and Bond (1998) note that you can use lagged differences of y as instruments for the levels of y. The main instrument matrix for the level equations is then

$$\mathbf{Z}_i^{\ell} = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 \\
0 & \Delta y_{i2} & 0 & \cdots & 0 \\
0 & 0 & \Delta y_{i3} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \Delta y_{i,T-1}
\end{pmatrix}$$

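where the first row corresponds to time $t = 2$. You can extend this to $L > 1$ and $K > 1$ by adding columns to $\mathbf{X}_i^{\ell}$ and elements to $\boldsymbol{\gamma}$ as appropriate. Higher-order lags require deletion of the leading rows of $\mathbf{y}_i^{\ell}$, $\mathbf{X}_i^{\ell}$, $\boldsymbol{\epsilon}_i^{\ell}$, and $\mathbf{Z}_i^{\ell}$.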

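Regression on the full system is obtained by stacking $\mathbf{y}_i^d$ and $\mathbf{y}_i^{\ell}$ to form $\mathbf{y}_i^s$, stacking $\mathbf{X}_i^d$ and $\mathbf{X}_i^{\ell}$ to form $\mathbf{X}_i^s$, and stacking $\boldsymbol{\eta}_i^d$ and $\boldsymbol{\epsilon}_i^{\ell}$ to form $\boldsymbol{\epsilon}_i^s$.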

When you specify the DYNSYS model option, the default instrument matrix for the full system is

$$\mathbf{Z}_i = \begin{pmatrix} \mathbf{Z}_i^d & \mathbf{0} & \mathbf{D}_i \\ \mathbf{0} & \mathbf{Z}_i^{\ell} & \mathbf{0} \end{pmatrix}$$
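A minimal NumPy sketch that assembles this default DYNSYS instrument matrix for one individual, assuming $L = 1$, y of length T, and x as a T x K array. The helper names are hypothetical; missing instruments are set to 0, and rows for missing observations are not deleted here.

```python
import numpy as np

def diff_gmm_instruments(y):
    """Z^d_i (L = 1): the row for t = 3..T holds y_i1..y_{i,t-2} in its own column block."""
    y = np.nan_to_num(np.asarray(y, dtype=float), nan=0.0)
    n_rows = y.size - 2
    Z = np.zeros((n_rows, n_rows * (n_rows + 1) // 2))
    col = 0
    for r in range(n_rows):
        Z[r, col:col + r + 1] = y[:r + 1]
        col += r + 1
    return Z

def level_gmm_instruments(y):
    """Z^l_i (L = 1): rows t = 2..T, Delta y_{i,t-1} on the diagonal, first row zero."""
    dy = np.nan_to_num(np.diff(np.asarray(y, dtype=float)), nan=0.0)  # Delta y_i2..Delta y_iT
    n = dy.size
    Z = np.zeros((n, n))
    for r in range(1, n):            # row r corresponds to t = r + 2
        Z[r, r] = dy[r - 1]          # Delta y_{i,t-1}
    return Z

def system_instruments(y, x):
    """Default DYNSYS matrix [[Z^d, 0, D], [0, Z^l, 0]] for one individual."""
    Zd = diff_gmm_instruments(y)
    Zl = level_gmm_instruments(y)
    D = np.diff(np.asarray(x, dtype=float), axis=0)[1:]   # Delta x_i3..Delta x_iT
    top = np.hstack([Zd, np.zeros((Zd.shape[0], Zl.shape[1])), D])
    bottom = np.hstack([np.zeros((Zl.shape[0], Zd.shape[1])), Zl,
                        np.zeros((Zl.shape[0], D.shape[1]))])
    return np.vstack([top, bottom])

y = [1.0, 2.0, 4.0, 7.0, 11.0]
x = np.array([[1.0], [1.0], [2.0], [3.0], [5.0]])
Z = system_instruments(y, x)
print(Z.shape)   # (T-2) + (T-1) rows
```

The first T-2 rows are the difference equations and the remaining T-1 rows are the level equations, matching the stacking order of $\mathbf{y}_i^s$ and $\mathbf{X}_i^s$.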

Estimation

The estimation in this section assumes system GMM. To obtain difference GMM, restrict estimation to the rows that correspond to the difference equations.

The initial moment matrix is derived from the theoretical variance of the combined residuals and is expressed as the block-diagonal matrix $\mathbf{H}_{1i} = \operatorname{diag}(\mathbf{G}_{1i}, \mathbf{G}_{2i})$, where

$$\mathbf{G}_{1i} = \begin{pmatrix}
1 & -0.5 & 0 & \cdots & 0 & 0 & 0 \\
-0.5 & 1 & -0.5 & \cdots & 0 & 0 & 0 \\
0 & -0.5 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & -0.5 & 0 \\
0 & 0 & 0 & \cdots & -0.5 & 1 & -0.5 \\
0 & 0 & 0 & \cdots & 0 & -0.5 & 1
\end{pmatrix}$$

corresponds to the difference equations and $\mathbf{G}_{2i}$, which corresponds to the level equations, is 0.5 times the identity matrix.
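The sketch below (hypothetical helper, NumPy) builds $\mathbf{H}_{1i}$ with the tridiagonal difference block and 0.5 times the identity for the level block, arranged block-diagonally as this sketch reads the description. The final check shows where the difference block comes from: if $\mathbf{A}$ maps $(\epsilon_{i2}, \ldots, \epsilon_{iT})$ to the differenced errors, then $\mathbf{A}\mathbf{A}'$ has 2 on the diagonal and $-1$ next to it, and half of that is the tridiagonal block. Difference GMM would use only that block.

```python
import numpy as np

def initial_moment_matrix(n_diff, n_level):
    """H_1i: tridiagonal block (1 on the diagonal, -0.5 next to it) for the
    difference rows, and 0.5 * I for the level rows."""
    G1 = np.eye(n_diff) - 0.5 * (np.eye(n_diff, k=1) + np.eye(n_diff, k=-1))
    H = np.zeros((n_diff + n_level, n_diff + n_level))
    H[:n_diff, :n_diff] = G1
    H[n_diff:, n_diff:] = 0.5 * np.eye(n_level)
    return H

H = initial_moment_matrix(3, 4)          # T = 5: 3 difference rows, 4 level rows
# A maps (eps_2, eps_3, eps_4, eps_5) to (eps_3 - eps_2, eps_4 - eps_3, eps_5 - eps_4).
A = np.eye(3, 4, k=1) - np.eye(3, 4)
print(np.allclose(0.5 * A @ A.T, H[:3, :3]))
```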

Define the weighting matrix as

$$\mathbf{W}_1 = \left( \sum_{i=1}^{N} \mathbf{Z}_i' \mathbf{H}_{1i} \mathbf{Z}_i \right)^{-1}$$

and the projections as

$$\mathbf{P}_y = \sum_{i=1}^{N} \mathbf{Z}_i' \mathbf{y}_i^s; \qquad \mathbf{P}_x = \sum_{i=1}^{N} \mathbf{Z}_i' \mathbf{X}_i^s$$

The one-step GMM estimate of $\boldsymbol{\gamma}$ is the weighted OLS estimator

$$\hat{\boldsymbol{\gamma}}_1 = \left( \mathbf{P}_x' \mathbf{W}_1 \mathbf{P}_x \right)^{-1} \mathbf{P}_x' \mathbf{W}_1 \mathbf{P}_y$$

The variance of $\hat{\boldsymbol{\gamma}}_1$ is

$$\mathrm{Var}(\hat{\boldsymbol{\gamma}}_1) = \hat{\sigma}_\epsilon^2 \left( \mathbf{P}_x' \mathbf{W}_1 \mathbf{P}_x \right)^{-1}$$

where $\hat{\sigma}_\epsilon^2$ is the mean square error (MSE) derived solely from the difference equations, namely

$$\hat{\sigma}_\epsilon^2 = (M - K)^{-1} \sum_{i=1}^{N} \left( \mathbf{y}_i^d - \mathbf{X}_i^d \hat{\boldsymbol{\gamma}}_1 \right)' \left( \mathbf{y}_i^d - \mathbf{X}_i^d \hat{\boldsymbol{\gamma}}_1 \right)$$

The total number of observations, M, is equal to the number of observations for which the difference equations hold.
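A direct NumPy transcription of the one-step formulas, offered as a sketch rather than PROC PANEL's implementation: `one_step_gmm` is a hypothetical name, and the per-individual arrays are passed as Python lists so that individuals can have different numbers of rows. It returns $\hat{\boldsymbol{\gamma}}_1$, its variance, and the difference-equation MSE $\hat{\sigma}_\epsilon^2$.

```python
import numpy as np

def one_step_gmm(Z, H1, y_s, X_s, y_d, X_d):
    """One-step GMM.

    Z, H1, y_s, X_s: lists of per-individual arrays for the stacked system.
    y_d, X_d: lists holding only the difference-equation rows (for the MSE).
    """
    W1 = np.linalg.inv(sum(Zi.T @ Hi @ Zi for Zi, Hi in zip(Z, H1)))
    P_y = sum(Zi.T @ yi for Zi, yi in zip(Z, y_s))
    P_x = sum(Zi.T @ Xi for Zi, Xi in zip(Z, X_s))
    A = P_x.T @ W1 @ P_x
    gamma = np.linalg.solve(A, P_x.T @ W1 @ P_y)
    resid = [yi - Xi @ gamma for yi, Xi in zip(y_d, X_d)]
    M = sum(r.size for r in resid)            # observations in the difference equations
    K = gamma.size                            # number of parameters
    sigma2 = sum(r @ r for r in resid) / (M - K)
    return gamma, sigma2 * np.linalg.inv(A), sigma2

# Toy check with synthetic data: an exact linear relation is recovered exactly.
rng = np.random.default_rng(0)
N = 30
Z = [rng.normal(size=(6, 4)) for _ in range(N)]
X = [rng.normal(size=(6, 2)) for _ in range(N)]
true = np.array([0.5, 2.0])
y = [Xi @ true for Xi in X]
H = [np.eye(6) for _ in range(N)]
gamma1, var1, s2 = one_step_gmm(Z, H, y, X, y, X)
print(gamma1)
```

In this toy check the same rows serve as both the system and the difference equations; with real data, y_d and X_d would be the first block of rows of each $\mathbf{y}_i^s$ and $\mathbf{X}_i^s$.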

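A disadvantage of $\hat{\boldsymbol{\gamma}}_1$ is its reliance on the theoretical form of $\mathbf{H}_{1i}$. The two-step GMM estimate of $\boldsymbol{\gamma}$ replaces $\mathbf{H}_{1i}$ with a version that is obtained from the observed one-step residuals. Let $\mathbf{H}_{2i} = \hat{\mathbf{e}}_{1i}\hat{\mathbf{e}}_{1i}'$ be the outer product of the one-step residuals $\hat{\mathbf{e}}_{1i} = \mathbf{y}_i^s - \mathbf{X}_i^s \hat{\boldsymbol{\gamma}}_1$. Then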

$$\hat{\boldsymbol{\gamma}}_2 = \left( \mathbf{P}_x' \mathbf{W}_2 \mathbf{P}_x \right)^{-1} \mathbf{P}_x' \mathbf{W}_2 \mathbf{P}_y$$

where

$$\mathbf{W}_2 = \left( \sum_{i=1}^{N} \mathbf{Z}_i' \mathbf{H}_{2i} \mathbf{Z}_i \right)^{-1}$$

The variance of $\hat{\boldsymbol{\gamma}}_2$ is

$$\mathrm{Var}(\hat{\boldsymbol{\gamma}}_2) = \left( \mathbf{P}_x' \mathbf{W}_2 \mathbf{P}_x \right)^{-1}$$

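The iterated GMM estimator of $\boldsymbol{\gamma}$ continues this pattern: First, use the current estimate $\hat{\boldsymbol{\gamma}}_c$ to form the residuals that compose $\mathbf{H}_{c+1,i}$. Second, use $\mathbf{H}_{c+1,i}$ to form the weighting matrix $\mathbf{W}_{c+1}$. Third, use $\mathbf{W}_{c+1}$ to update the estimate to $\hat{\boldsymbol{\gamma}}_{c+1}$.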

There are two criteria by which convergence is achieved. The first (and default) criterion is met when the magnitude of $\hat{\boldsymbol{\gamma}}_c$ changes by a relative amount smaller than b, as specified in the BTOL= option in the MODEL statement. The second criterion is met when the magnitude of the variance matrix changes by a relative amount smaller than a, as specified in the ATOL= option in the MODEL statement.

Robust variances are calculated by the sandwich method. The robust variance of $\hat{\boldsymbol{\gamma}}_1$ is

$$\mathrm{Var}^r(\hat{\boldsymbol{\gamma}}_1) = \left( \mathbf{P}_x' \mathbf{W}_1 \mathbf{P}_x \right)^{-1} \mathbf{P}_x' \mathbf{W}_1 \mathbf{W}_2^{-1} \mathbf{W}_1 \mathbf{P}_x \left( \mathbf{P}_x' \mathbf{W}_1 \mathbf{P}_x \right)^{-1}$$

The robust variance of $\hat{\boldsymbol{\gamma}}_2$ is

$$\mathrm{Var}^r(\hat{\boldsymbol{\gamma}}_2) = \left( \mathbf{P}_x' \mathbf{W}_2 \mathbf{P}_x \right)^{-1} \mathbf{P}_x' \mathbf{W}_2 \mathbf{W}_3^{-1} \mathbf{W}_2 \mathbf{P}_x \left( \mathbf{P}_x' \mathbf{W}_2 \mathbf{P}_x \right)^{-1}$$

and so on as you iterate c.
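The loop described above can be sketched in NumPy as follows. This is a hypothetical helper, not PROC PANEL code: `btol` imitates the BTOL= criterion, but the relative-change measure (a ratio of Euclidean norms) is an assumption, and a pseudo-inverse replaces the matrix inverse in case $\sum_i \mathbf{Z}_i'\mathbf{H}_{ci}\mathbf{Z}_i$ is rank deficient. Setting `max_steps=2` gives two-step GMM; the robust variance uses the sandwich form with $\mathbf{W}_{c+1}^{-1}$ built from the final residuals.

```python
import numpy as np

def _moment_cov(Z, H):
    """sum_i Z_i' H_i Z_i, i.e. the inverse of a weighting matrix."""
    return sum(Zi.T @ Hi @ Zi for Zi, Hi in zip(Z, H))

def _residual_outer(gamma, y_s, X_s):
    """H_{c+1,i}: outer products of the residuals at the current estimate."""
    out = []
    for yi, Xi in zip(y_s, X_s):
        e = yi - Xi @ gamma
        out.append(np.outer(e, e))
    return out

def iterated_gmm(Z, H1, y_s, X_s, btol=1e-6, max_steps=100):
    """Return (gamma_c, Var(gamma_c), robust Var(gamma_c), c)."""
    if max_steps < 2:
        raise ValueError("max_steps must be at least 2")
    P_y = sum(Zi.T @ yi for Zi, yi in zip(Z, y_s))
    P_x = sum(Zi.T @ Xi for Zi, Xi in zip(Z, X_s))

    def estimate(W):
        return np.linalg.solve(P_x.T @ W @ P_x, P_x.T @ W @ P_y)

    gamma = estimate(np.linalg.pinv(_moment_cov(Z, H1)))        # one-step estimate
    for c in range(2, max_steps + 1):
        W = np.linalg.pinv(_moment_cov(Z, _residual_outer(gamma, y_s, X_s)))
        new_gamma = estimate(W)                                  # step-c estimate
        rel = np.linalg.norm(new_gamma - gamma) / max(np.linalg.norm(gamma), 1e-12)
        gamma = new_gamma
        if rel < btol:                                           # assumed BTOL-style test
            break
    bread = np.linalg.inv(P_x.T @ W @ P_x)                         # Var(gamma_c)
    S_next = _moment_cov(Z, _residual_outer(gamma, y_s, X_s))      # W_{c+1}^{-1}
    robust = bread @ P_x.T @ W @ S_next @ W @ P_x @ bread          # sandwich variance
    return gamma, bread, robust, c

# Synthetic example: exogenous regressors correlated with the instruments.
rng = np.random.default_rng(1)
N = 200
true = np.array([0.5, -1.0])
Z = [rng.normal(size=(5, 4)) for _ in range(N)]
X = [Zi[:, :2] + 0.3 * rng.normal(size=(5, 2)) for Zi in Z]
y = [Xi @ true + 0.1 * rng.normal(size=5) for Xi in X]
H1 = [np.eye(5) for _ in range(N)]
gamma_c, var_c, robust_c, steps = iterated_gmm(Z, H1, y, X)
print(gamma_c, steps)
```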

Arellano and Bond (1991), among others, note that robust two-step variance estimators are biased. Windmeijer (2005) derived a bias-corrected variance of $\hat{\boldsymbol{\gamma}}_2$, and you can obtain this correction by specifying the BIASCORRECTED option in the MODEL statement.

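Define the one-step and two-step residuals as $\hat{\mathbf{e}}_{1i} = \mathbf{y}_i^s - \mathbf{X}_i^s \hat{\boldsymbol{\gamma}}_1$ and $\hat{\mathbf{e}}_{2i} = \mathbf{y}_i^s - \mathbf{X}_i^s \hat{\boldsymbol{\gamma}}_2$. Also define the projected two-step residual as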

$$\mathbf{P}_e = \sum_{i=1}^{N} \mathbf{Z}_i' \hat{\mathbf{e}}_{2i}$$

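Form the matrix $\mathbf{D}$ such that its kth column is $\mathbf{V}_2 \mathbf{P}_x' \mathbf{W}_2 \mathbf{F}_k \mathbf{W}_2 \mathbf{P}_e$, where $\mathbf{V}_2 = \left( \mathbf{P}_x' \mathbf{W}_2 \mathbf{P}_x \right)^{-1}$. The matrix $\mathbf{F}_k$ is the quadratic form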

$$\mathbf{F}_k = \sum_{i=1}^{N} \mathbf{Z}_i' \left( \mathbf{x}_{ik} \hat{\mathbf{e}}_{1i}' + \hat{\mathbf{e}}_{1i} \mathbf{x}_{ik}' \right) \mathbf{Z}_i$$

where $\mathbf{x}_{ik}$ is the kth column of $\mathbf{X}_i^s$.

The Windmeijer (2005) bias-corrected variance is

$$\mathrm{Var}^w(\hat{\boldsymbol{\gamma}}_2) = \mathbf{V}_2 + \mathbf{D}\mathbf{V}_2 + \mathbf{V}_2\mathbf{D}' + \mathbf{D}\mathbf{V}_1^r\mathbf{D}'$$

where $\mathbf{V}_1^r$ is the robust variance estimate of $\hat{\boldsymbol{\gamma}}_1$.
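The following NumPy sketch (hypothetical helper `two_step_windmeijer`; not PROC PANEL code) computes two-step GMM together with a Windmeijer-style corrected variance. Here the kth column of $\mathbf{D}$ is taken to be $\mathbf{V}_2 \mathbf{P}_x' \mathbf{W}_2 \mathbf{F}_k \mathbf{W}_2 \mathbf{P}_e$, which is the derivative of $\hat{\boldsymbol{\gamma}}_2$ with respect to the kth element of the one-step estimate that enters $\mathbf{W}_2$ (this sketch's reading of Windmeijer 2005). $\mathbf{V}_2$ is $(\mathbf{P}_x'\mathbf{W}_2\mathbf{P}_x)^{-1}$ and $\mathbf{V}_1^r$ is the one-step sandwich variance.

```python
import numpy as np

def two_step_windmeijer(Z, H1, y_s, X_s):
    """Return (gamma2, V2, robust V2, Windmeijer-corrected V2)."""
    def moment_cov(H):                          # sum_i Z_i' H_i Z_i = W^{-1}
        return sum(Zi.T @ Hi @ Zi for Zi, Hi in zip(Z, H))

    def residuals(gamma):
        return [yi - Xi @ gamma for yi, Xi in zip(y_s, X_s)]

    P_y = sum(Zi.T @ yi for Zi, yi in zip(Z, y_s))
    P_x = sum(Zi.T @ Xi for Zi, Xi in zip(Z, X_s))

    # One-step estimate and residuals.
    W1 = np.linalg.inv(moment_cov(H1))
    A1_inv = np.linalg.inv(P_x.T @ W1 @ P_x)
    gamma1 = A1_inv @ P_x.T @ W1 @ P_y
    e1 = residuals(gamma1)

    # Two-step estimate with H_2i = e1 e1'.
    S2 = moment_cov([np.outer(e, e) for e in e1])       # W2^{-1}
    W2 = np.linalg.inv(S2)
    V2 = np.linalg.inv(P_x.T @ W2 @ P_x)
    gamma2 = V2 @ P_x.T @ W2 @ P_y
    e2 = residuals(gamma2)

    # Robust (sandwich) variances of the one-step and two-step estimates.
    V1_robust = A1_inv @ P_x.T @ W1 @ S2 @ W1 @ P_x @ A1_inv
    S3 = moment_cov([np.outer(e, e) for e in e2])       # W3^{-1}
    V2_robust = V2 @ P_x.T @ W2 @ S3 @ W2 @ P_x @ V2

    # Correction term D, one column per parameter.
    P_e = sum(Zi.T @ e for Zi, e in zip(Z, e2))
    K = gamma2.size
    D = np.empty((K, K))
    for k in range(K):
        F_k = sum(Zi.T @ (np.outer(Xi[:, k], e) + np.outer(e, Xi[:, k])) @ Zi
                  for Zi, Xi, e in zip(Z, X_s, e1))
        D[:, k] = V2 @ P_x.T @ W2 @ F_k @ W2 @ P_e
    V2_w = V2 + D @ V2 + V2 @ D.T + D @ V1_robust @ D.T
    return gamma2, V2, V2_robust, V2_w

# Synthetic example data (assumed, for illustration only).
rng = np.random.default_rng(2)
N = 150
true = np.array([0.5, -1.0])
Z = [rng.normal(size=(5, 4)) for _ in range(N)]
X = [Zi[:, :2] + 0.3 * rng.normal(size=(5, 2)) for Zi in Z]
y = [Xi @ true + 0.1 * rng.normal(size=5) for Xi in X]
H1 = [np.eye(5) for _ in range(N)]
g2, V2, V2r, V2w = two_step_windmeijer(Z, H1, y, X)
print(np.sqrt(np.diag(V2)), np.sqrt(np.diag(V2w)))
```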

Estimating the Intercept

The intercept term vanishes when you take first differences and is thus identified only in the level equations. If you specify the DYNDIFF option in the MODEL statement and your model includes an intercept, then PROC PANEL fits the model by using system GMM with the following (default) instrumentation:

$$\mathbf{Z}_i = \begin{pmatrix} \mathbf{Z}_i^d & \mathbf{D}_i & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{j}_i \end{pmatrix}$$

where $\mathbf{j}_i$ is a column of ones. Because all the level instruments are zero except the constant, parameter estimates other than the intercept are unaffected by the added level equations.

If you specify the DYNDIFF option in the MODEL statement and your model does not include an intercept, then the level equations are excluded from the estimation.

If you specify the DYNSYS option in the MODEL statement, then the intercept requires no special treatment. Under the default instrument specification, if the model includes an intercept, then the level instruments include an added column of ones. That is,

$$\mathbf{Z}_i = \begin{pmatrix} \mathbf{Z}_i^d & \mathbf{0} & \mathbf{D}_i & \mathbf{0} \\ \mathbf{0} & \mathbf{Z}_i^{\ell} & \mathbf{0} & \mathbf{j}_i \end{pmatrix}$$

Customizing Instruments

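When you specify the DYNSYS option for performing system GMM, the default instrument matrix is

$$\mathbf{Z}_i = \begin{pmatrix} \mathbf{Z}_i^d & \mathbf{0} & \mathbf{D}_i & \mathbf{0} \\ \mathbf{0} & \mathbf{Z}_i^{\ell} & \mathbf{0} & \mathbf{j}_i \end{pmatrix}$$

where $\mathbf{j}_i$ is either a column of ones or, if you specify the NOINT option, omitted.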

You can override the default set of instruments by specifying an INSTRUMENTS statement. You can choose which instrument sets to include as components of $\mathbf{Z}_i$. The INSTRUMENTS statement provides options to generate the appropriate instruments when variables are endogenous, predetermined, or exogenous.

The following discussion assumes that you are performing system GMM by using the DYNSYS option in the MODEL statement. When you specify the DYNDIFF option instead, any specification (except the constant $\mathbf{j}_i$) that pertains to the level equations is ignored.

Dependent Variable

The DEPVAR option in the INSTRUMENTS statement adds instruments for the dependent variable and its lags. Specifying DEPVAR(DIFF) includes the lagged levels of the dependent variable (the matrix $\mathbf{Z}_i^d$) in the difference equations. Specifying DEPVAR(LEVEL) includes the first differences of the dependent variable (the matrix $\mathbf{Z}_i^{\ell}$) in the level equations. Specifying DEPVAR(BOTH) (or simply DEPVAR) includes both $\mathbf{Z}_i^d$ and $\mathbf{Z}_i^{\ell}$.

You should at a minimum include instruments for the dependent variable when you perform dynamic panel estimation. For example:

proc panel data=a;
   id State Year;
   instruments depvar;
   model Sales = Price PopDensity / dynsys;
run;

Constant (or Intercept)

Specifying the keyword CONSTANT includes the constant vector $\mathbf{j}_i$ in the level equations.

Endogenous Variables

A variable $x_{it}$ is endogenous if $E(x_{it}\,\epsilon_{is}) \neq 0$ for $s \le t$ and 0 otherwise.

The DIFFEND= option specifies a list of endogenous variables that form instrument matrices for the difference equations. The instruments are "GMM-style" and mirror the form used for the dependent variable. Suppose that the model includes one lag of the dependent variable ($L = 1$). Specifying DIFFEND=(X) adds the following instruments to the difference equations:

$$\mathbf{G}_i^d = \begin{pmatrix}
x_{i1} & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & x_{i1} & x_{i2} & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & 0 & 0 & x_{i1} & x_{i2} & x_{i3} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & x_{i1} & \cdots & x_{i,T-2}
\end{pmatrix}$$

The first row corresponds to time $t = 3$. The instruments are in lagged levels.

The LEVELEND= option specifies a list of endogenous variables that form instrument matrices for the level equations. The instruments mirror the form used for the dependent variable. Suppose that the model includes one lag of the dependent variable ($L = 1$). Specifying LEVELEND=(X) adds the following instruments to the level equations:

$$\mathbf{G}_i^{\ell} = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 \\
0 & \Delta x_{i2} & 0 & \cdots & 0 \\
0 & 0 & \Delta x_{i3} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \Delta x_{i,T-1}
\end{pmatrix}$$

The first row corresponds to time $t = 2$. Because the instruments are used for the level equations, they are in lagged differences.

The following code fits a dynamic panel model by using difference equations. It includes GMM-style instruments for both the dependent variable Sales and the variable Price:

proc panel data=a;
   id State Year;
   instruments depvar diffend = (Price);
   model Sales = Price PopDensity / dyndiff;
run;

Predetermined Variables

A variable $x_{it}$ is predetermined if $E(x_{it}\,\epsilon_{is}) \neq 0$ for $s < t$ and 0 otherwise.

The DIFFPRE= option specifies a list of variables that are considered to be predetermined in the difference equations. The DIFFPRE= option works similarly to the DIFFEND= option, except that each observation contains an extra instrument that reflects orthogonality in the current time period. If $L = 1$, specifying DIFFPRE=(X) adds the following instruments to the difference equations:

$$\mathbf{P}_i^d = \begin{pmatrix}
x_{i1} & x_{i2} & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & 0 & x_{i1} & x_{i2} & x_{i3} & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & x_{i1} & \cdots & x_{i,T-1}
\end{pmatrix}$$

The first row corresponds to time $t = 3$.
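The only difference between the DIFFEND= and DIFFPRE= blocks is the last usable lag. For the difference equation at time t, $\eta_{it}$ involves $\epsilon_{i,t-1}$ and $\epsilon_{it}$, so an endogenous x can use $x_{i1}, \ldots, x_{i,t-2}$, while a predetermined x can also use $x_{i,t-1}$. The hypothetical NumPy helper below builds either block for $L = 1$, with missing values set to 0; it illustrates the layout only and is not PROC PANEL code.

```python
import numpy as np

def gmm_style_diff(x, predetermined=False):
    """GMM-style block for DIFFEND= (default) or DIFFPRE= (predetermined=True), L = 1.

    Rows correspond to t = 3, ..., T. Endogenous: x_i1..x_{i,t-2};
    predetermined: x_i1..x_{i,t-1}.
    """
    x = np.nan_to_num(np.asarray(x, dtype=float), nan=0.0)
    T = x.size
    extra = 1 if predetermined else 0
    counts = [t - 2 + extra for t in range(3, T + 1)]   # lags used in each row
    G = np.zeros((T - 2, sum(counts)))
    col = 0
    for r, n in enumerate(counts):
        G[r, col:col + n] = x[:n]
        col += n
    return G

x = [1.0, 2.0, 3.0, 4.0, 5.0]
G_end = gmm_style_diff(x)                      # layout of the DIFFEND= block
G_pre = gmm_style_diff(x, predetermined=True)  # layout of the DIFFPRE= block
print(G_end.shape, G_pre.shape)
```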

The LEVELPRE= option specifies a list of variables that are considered to be predetermined in the level equations. The LEVELPRE= option works similarly to the LEVELEND= option, except that the lag is shifted forward by one period to reflect orthogonality in the current time period. If $L = 1$, specifying LEVELPRE=(X) adds the following instruments to the level equations:

$$\mathbf{P}_i^{\ell} = \begin{pmatrix}
\Delta x_{i2} & 0 & 0 & \cdots & 0 \\
0 & \Delta x_{i3} & 0 & \cdots & 0 \\
0 & 0 & \Delta x_{i4} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \Delta x_{iT}
\end{pmatrix}$$

The first row corresponds to time $t = 2$.
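The two level-equation blocks differ in the same way. The hypothetical NumPy helper below builds the diagonal LEVELEND= block ($\Delta x_{i,t-1}$, with a zero first row because $\Delta x_{i1}$ does not exist) or the LEVELPRE= block ($\Delta x_{it}$) for $L = 1$. It is a layout sketch under those assumptions, not PROC PANEL code.

```python
import numpy as np

def gmm_style_level(x, predetermined=False):
    """Diagonal block for LEVELEND= (default) or LEVELPRE= (predetermined=True), L = 1.

    Rows correspond to t = 2, ..., T.
    """
    dx = np.nan_to_num(np.diff(np.asarray(x, dtype=float)), nan=0.0)  # Delta x_i2..Delta x_iT
    n = dx.size
    G = np.zeros((n, n))
    for r in range(n):               # row r corresponds to t = r + 2
        if predetermined:
            G[r, r] = dx[r]          # Delta x_it
        elif r >= 1:
            G[r, r] = dx[r - 1]      # Delta x_{i,t-1}
    return G

x = [1.0, 2.0, 4.0, 7.0, 11.0]
print(np.diag(gmm_style_level(x)))
print(np.diag(gmm_style_level(x, predetermined=True)))
```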

The following code fits a dynamic panel model by using difference equations. The instrument set includes GMM-style instruments for the dependent variable Sales and GMM-style instruments that correspond to the predetermined variable Price:

proc panel data=a;
   id State Year;
   instruments depvar diffpre = (Price);
   model Sales = Price PopDensity / dyndiff;
run;

Exogenous Variables

Exogenous variables are uncorrelated with both the level residuals and the differenced residuals. If a regression variable is exogenous, you might want to include that variable in the instrument set as a standard instrument. The DIFFEQ= option specifies a list of variables that compose the matrix $\mathbf{D}_i$ of standard instruments for the difference equations; for an example of how $\mathbf{D}_i$ is formed, see the section First Differencing. These variables are usually exogenous regressors that you want to preserve under the projection to the instrument space. Because these instruments belong to the difference equations, the variables are automatically differenced.

The LEVELEQ= option specifies a list of variables that form a matrix of standard instruments that is included in the level equations. You can use this option to specify external instruments that are not part of the main regression but that can be used as instruments for the regression variables in levels.

If $L = 1$, specifying LEVELEQ=(X1 X2) adds the following instruments to the level equations:

$$\mathbf{L}_i = \begin{pmatrix} x_{i21} & x_{i22} \\ x_{i31} & x_{i32} \\ \vdots & \vdots \\ x_{iT1} & x_{iT2} \end{pmatrix}$$

The first row corresponds to time $t = 2$.

The following example illustrates how you would use an INSTRUMENTS statement to obtain the default set of instruments for system GMM:

proc panel data=a;
   id State Year;
   instruments depvar(both) constant diffeq = (Price PopDensity);
   model Sales = Price PopDensity / dynsys;
run;

Limiting the Number of Instruments

Arellano and Bond’s (1991) technique of expanding instruments is a useful method of dealing with autocorrelation in the response variable. However, too many instruments can bias the estimator. The number of instruments grows quadratically with the number of time periods, making computations less feasible for larger T.

By default, PROC PANEL uses all available lags. You can limit the number of instruments by specifying the MAXBAND= option in the INSTRUMENTS statement. For example, specifying MAXBAND=5 limits the number of GMM-style instruments to five per observation, for each variable. The MAXBAND= option applies to all GMM-style instruments: those for the dependent variable, those from the DIFFEND= option, and those from the DIFFPRE= option.
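The arithmetic behind the quadratic growth, as a small Python sketch (the helper name is hypothetical): with $L = 1$, an untruncated GMM-style difference block for one variable has $1 + 2 + \cdots + (T-2) = (T-2)(T-1)/2$ columns, whereas capping each row at MAXBAND= instruments makes the count grow only linearly in T. The sketch only counts columns; it does not model which lags PROC PANEL keeps when it truncates.

```python
def gmm_instrument_count(T, maxband=None, predetermined=False):
    """Columns in one GMM-style difference-equation block when L = 1."""
    total = 0
    for t in range(3, T + 1):
        available = t - 2 + (1 if predetermined else 0)   # lags usable at time t
        total += available if maxband is None else min(available, maxband)
    return total

for T in (5, 10, 20, 40):
    print(T, gmm_instrument_count(T), gmm_instrument_count(T, maxband=5))
```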

Sargan Test of Overidentifying Restrictions

A Sargan test is a referendum on your choice of instruments in a dynamic panel model. The Sargan test statistic for one-step GMM is

$$J = \frac{1}{\hat{\sigma}_\epsilon^2} \left( \sum_{i=1}^{N} \mathbf{Z}_i' \hat{\mathbf{e}}_{1i} \right)' \mathbf{W}_1 \left( \sum_{i=1}^{N} \mathbf{Z}_i' \hat{\mathbf{e}}_{1i} \right)$$

The Sargan test statistic for two-step GMM is

$$J = \left( \sum_{i=1}^{N} \mathbf{Z}_i' \hat{\mathbf{e}}_{2i} \right)' \mathbf{W}_2 \left( \sum_{i=1}^{N} \mathbf{Z}_i' \hat{\mathbf{e}}_{2i} \right)$$

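The statistic is defined analogously for further iterations of GMM, using the residuals $\hat{\mathbf{e}}_{ci}$ and the weighting matrix $\mathbf{W}_c$ of iteration c.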

The null hypothesis of the Sargan test is that the moment conditions (as defined by the columns of $\mathbf{Z}_i$) hold, and thus that the instruments form an adequate set. Under the null, J is asymptotically distributed as $\chi^2$ with degrees of freedom equal to the rank of $\mathbf{W}_c$ minus the number of parameters K. The nominal rank of $\mathbf{W}_c$ is equal to the number of instruments. However, this number can be reduced because of collinearity and redundancy in the instrument specification. Furthermore, when $c > 1$, the maximum rank of $\mathbf{W}_c$ is N, regardless of the number of instruments.
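The following SAS/IML sketch computes a two-step J statistic, its rank-based degrees of freedom, and a $\chi^2$ p-value. It rests on several assumptions: a balanced panel with the same number of stacked equations per individual, random numbers standing in for $\mathbf{Z}_i$ and $\hat{\mathbf{e}}_{2i}$ (they are not GMM residuals), a hypothetical K, and $\mathbf{W}_2$ taken as the generalized inverse of $\sum_i (\mathbf{Z}_i'\hat{\mathbf{e}}_i)(\mathbf{Z}_i'\hat{\mathbf{e}}_i)'$, the usual two-step choice, which may differ in detail from PROC PANEL's definition.

```sas
proc iml;
   /* Hypothetical balanced panel: N individuals, Ti stacked equations each,
      q instrument columns, and K estimated parameters */
   N = 50;  Ti = 4;  q = 6;  K = 2;
   call randseed(2025);
   Z = j(N*Ti, q, .);  call randgen(Z, "Normal");   /* stand-in for stacked Z_i  */
   e = j(N*Ti, 1, .);  call randgen(e, "Normal");   /* stand-in for stacked e_2i */

   g = j(q, 1, 0);     /* accumulates sum_i Z_i' e_i              */
   S = j(q, q, 0);     /* accumulates sum_i (Z_i' e_i)(Z_i' e_i)' */
   do i = 1 to N;
      rows = ((i-1)*Ti + 1) : (i*Ti);   /* rows belonging to individual i */
      gi = t(Z[rows, ]) * e[rows];
      g = g + gi;
      S = S + gi * t(gi);
   end;

   W2     = ginv(S);                /* assumed two-step weighting matrix       */
   Jstat  = t(g) * W2 * g;          /* J = (sum Z_i'e_i)' W2 (sum Z_i'e_i)     */
   rankW  = round(trace(W2 * S));   /* numerical rank of W2 (equals rank of S) */
   df     = rankW - K;
   pValue = sdf("ChiSquare", Jstat, df);
   print Jstat rankW df pValue;
quit;
```

Because S is a sum of N rank-one matrices, setting N below q in this sketch shows the rank of $\mathbf{W}_2$, and hence the degrees of freedom, capped by N rather than by the number of instruments.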

You should treat Sargan tests with caution when robust variances are used in the estimation. Under the conditions that favor robust variances, such as heteroscedastic errors, J does not follow its theoretical $\chi^2$ distribution.

AR(m) Tests

An AR(m) test is a test for autocorrelation of order m in the model residuals. Let $\mathbf{R}_i^s$ be the working variance of the residuals from the full system. The precise definition of $\mathbf{R}_i^s$ depends on the GMM stage and on whether robust variances are specified; see Table 3.

Table 3: Definition of the Working Residual Variance

Estimator
One-step
One-step, robust
Two-step
Two-step, robust
Iteration c
Iteration c, robust

Define the residual vector

$$\hat{\mathbf{e}}_i = \begin{pmatrix} \hat{\boldsymbol{\eta}}_i^{d} \\ \mathbf{0} \end{pmatrix}$$

where $\hat{\boldsymbol{\eta}}_i^{d}$ are the residuals from the difference equations, evaluated at the final parameter estimates. The trailing zeros correspond to the level equations. Define $\hat{\boldsymbol{\omega}}_{mi}$ as a lagged version of $\hat{\mathbf{e}}_i$ such that the following are true:

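The first m elements of $\hat{\boldsymbol{\omega}}_{mi}$ are 0.
The next $p - m$ elements of $\hat{\boldsymbol{\omega}}_{mi}$ are the first $p - m$ elements of $\hat{\mathbf{e}}_i$, where p is the number of difference equations.
The trailing elements of $\hat{\boldsymbol{\omega}}_{mi}$ that correspond to the level equations are 0.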
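The three rules above can be sketched in SAS/IML for a single individual. The residual values, the number of difference equations p = 4, and the number of level equations r = 3 are hypothetical; the construction assumes $1 \le m < p$.

```sas
proc iml;
   /* Hypothetical residuals for one individual: p = 4 difference equations
      and r = 3 level equations */
   etaD = {0.4, -0.2, 0.1, 0.3};     /* difference-equation residuals */
   r    = 3;                         /* number of level equations     */
   p    = nrow(etaD);
   e    = etaD // j(r, 1, 0);        /* e_i: level-equation entries set to 0 */

   m = 1;                            /* autocorrelation order to test */
   omega = j(m, 1, 0)                /* first m elements are 0 */
           // etaD[1:(p-m)]          /* next p-m elements: first p-m of e_i */
           // j(r, 1, 0);            /* level-equation entries are 0 */
   print e omega;
quit;
```

Printing `e` and `omega` side by side shows the difference residuals shifted down by m positions, with the level block left at zero in both.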

Define the following:

$$
\begin{aligned}
\mathbf{P}_m &= \sum_{i=1}^{N} \mathbf{Z}_i' \mathbf{R}_i^{s} \hat{\boldsymbol{\omega}}_{mi} \\
\mathbf{Q}_m &= \sum_{i=1}^{N} \hat{\boldsymbol{\omega}}_{mi}' \mathbf{X}_i^{s}
\end{aligned}
$$

The AR(m) test statistic is $z_m = k_{0m} / \sqrt{k_{1m} + k_{2m} + k_{3m}}$, where

$$
\begin{aligned}
k_{0m} &= \sum_{i=1}^{N} \hat{\boldsymbol{\omega}}_{mi}' \hat{\mathbf{e}}_i \\
k_{1m} &= \sum_{i=1}^{N} \hat{\boldsymbol{\omega}}_{mi}' \mathbf{R}_i^{s} \hat{\boldsymbol{\omega}}_{mi} \\
k_{2m} &= -2\, \mathbf{Q}_m \left( \mathbf{P}_x' \mathbf{W}_c \mathbf{P}_x \right)^{-1} \mathbf{P}_x' \mathbf{W}_c \mathbf{P}_m \\
k_{3m} &= \mathbf{Q}_m \mathbf{V} \mathbf{Q}_m'
\end{aligned}
$$

The matrix $\mathbf{V}$ is the estimated variance matrix of the parameters for the specified GMM stage; it is model-based, robust, or bias-corrected.

Under the null hypothesis of no autocorrelation, $z_m$ asymptotically follows a standard normal distribution. Because the errors are differenced, well-specified models exhibit autocorrelation of order 1, but any autocorrelation at higher orders indicates a violation of assumptions.
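As a small arithmetic sketch, the DATA step below assembles $z_m$ from its four components and computes a two-sided normal p-value. The component values are made up for illustration, and the two-sided convention is an assumption about how you might report the test.

```sas
data ar_test;
   /* Hypothetical components of the AR(m) statistic, for illustration only */
   k0 = -1.8;  k1 = 2.9;  k2 = -0.4;  k3 = 0.3;
   z  = k0 / sqrt(k1 + k2 + k3);     /* z_m = k0m / sqrt(k1m + k2m + k3m) */
   p  = 2 * sdf("Normal", abs(z));   /* two-sided p-value under N(0,1)     */
run;

proc print data=ar_test noobs;
run;
```

In practice you would read a small p-value at m = 1 as expected, and a small p-value at m = 2 or higher as evidence against the model's assumptions.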

Last updated: June 19, 2025