You perform dynamic panel estimation that uses first differences by specifying the DYNDIFF option in the MODEL statement. For dynamic panel estimation that uses a full system of difference and level equations, specify the DYNSYS option. For an example of dynamic panel estimation, see Example 25.5.
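Dynamic panel models are regression models that include lagged versions of the dependent variable as covariates. Consider the following panel regression, which includes $L$ lags of the dependent variable:

$$ y_{it} = \sum_{l=1}^{L} \phi_l \, y_{i(t-l)} + x_{it}'\beta + \gamma_i + \epsilon_{it} $$

Here $x_{it}$ is the vector of covariates for individual $i$ at time $t$, $\gamma_i$ is the individual effect, and $\epsilon_{it}$ is the observation-level error. Because the effect $\gamma_i$ is common to all observations for that individual, it is correlated with every lagged value of $y$, having played a role in each realization. As such, lagged dependent variables are endogenous regressors and require special consideration.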
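For ease of notation, consider the special case $L = 1$. A first attempt to remove the source of the correlation would be to take first differences, which removes the individual effect $\gamma_i$. That is,

$$ \Delta y_{it} = \phi \, \Delta y_{i(t-1)} + \Delta x_{it}'\beta + \nu_{it} $$

where $\Delta y_{it} = y_{it} - y_{i(t-1)}$, $\Delta x_{it} = x_{it} - x_{i(t-1)}$, and $\nu_{it} = \epsilon_{it} - \epsilon_{i(t-1)}$. Even though the individual effects are removed, the problem of endogeneity persists because $\Delta y_{i(t-1)}$ is correlated with the differenced error term $\nu_{it}$. That is because $\epsilon_{i(t-1)}$ is a component of both $\Delta y_{i(t-1)}$ and $\nu_{it}$ (Nickell 1981).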
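The size of this correlation can be worked out directly. The following sketch assumes (these assumptions are added here for illustration) that the errors $\epsilon_{it}$ are serially uncorrelated with common variance $\sigma_\epsilon^2$ and are uncorrelated with the covariates and with earlier values of $y$:

```latex
% Sketch under the stated assumptions; L = 1.
\begin{align*}
\operatorname{Cov}\bigl(\Delta y_{i(t-1)},\, \nu_{it}\bigr)
  &= \operatorname{Cov}\bigl(y_{i(t-1)} - y_{i(t-2)},\; \epsilon_{it} - \epsilon_{i(t-1)}\bigr) \\
  &= -\operatorname{Cov}\bigl(y_{i(t-1)},\, \epsilon_{i(t-1)}\bigr)
     && \text{(all other cross terms vanish)} \\
  &= -\sigma_\epsilon^2 \neq 0
\end{align*}
```

The covariance does not shrink as the number of individuals grows, which is why differencing alone cannot deliver a consistent estimator.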
Arellano and Bond (1991) show that you can use the generalized method of moments (GMM) to obtain a consistent estimator. In GMM parlance, the moment condition $E(\Delta y_{i(t-1)} \, \nu_{it}) = 0$ is violated, where $\nu_{it} = \epsilon_{it} - \epsilon_{i(t-1)}$ is the differenced error. Estimation requires a set of instrumental variables that do meet their moment conditions and that can adequately predict $\Delta y_{i(t-1)}$. A natural set of instruments is $y_{i(t-2)}$ and all other previous realizations of $y$. These lags of $y$ are not correlated with $\nu_{it}$ because they occurred before time $t-1$. Given the autoregressive nature of the model, $y_{i(t-1)}$ (and hence $\Delta y_{i(t-1)}$) is well predicted by its previous values.
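Begin with $t = 3$, the first time period where the differenced model holds. The dynamic regression model for individual $i$ can be expressed as

$$ \Delta y_i = \Delta S_i \, \theta + \nu_i $$

where

$$ \Delta y_i = \begin{pmatrix} \Delta y_{i3} \\ \vdots \\ \Delta y_{iT} \end{pmatrix}, \quad \Delta S_i = \begin{pmatrix} \Delta y_{i2} & \Delta x_{i3}' \\ \vdots & \vdots \\ \Delta y_{i(T-1)} & \Delta x_{iT}' \end{pmatrix}, \quad \theta = \begin{pmatrix} \phi \\ \beta \end{pmatrix}, \quad \nu_i = \begin{pmatrix} \nu_{i3} \\ \vdots \\ \nu_{iT} \end{pmatrix} $$

Proceeding with the idea that you can use $y_{i1}, \ldots, y_{i(t-2)}$ as instruments for the endogenous covariate $\Delta y_{i(t-1)}$ for $t = 3, \ldots, T$, the instrument matrix for the lagged dependent variables is

$$ G_{di} = \begin{pmatrix} y_{i1} & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & y_{i1} & y_{i2} & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & 0 & 0 & y_{i1} & y_{i2} & y_{i3} & \cdots & 0 & \cdots & 0 \\ \vdots & & & & & & \ddots & & & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & \cdots & y_{i1} & \cdots & y_{i(T-2)} \end{pmatrix} $$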
This extends naturally to $L > 1$ and to any number of $x$ variables; simply add columns to the regressor matrix $\Delta S_i$ and elements to the parameter vector $\theta$ as appropriate. When an observation is either missing or lost because of missing lags, delete the corresponding rows of $\Delta y_i$, $\Delta S_i$, $\nu_i$, and the instrument matrix $G_{di}$. Even if an observation is not missing with respect to the regression model, some of the lagged instruments might not be available because previous observations are missing. When that occurs, replace any missing instrument with 0.
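As a concrete illustration of the block structure of the lagged-level instruments, the following sketch writes out the matrix for a hypothetical balanced panel with $T = 5$ and $L = 1$. The symbol $G_{di}$ is a label chosen here for the instrument matrix of the lagged dependent variable; the values of $T$ and $L$ are illustrative.

```latex
% Illustrative case (not from the original text): T = 5 and one lag (L = 1).
% Rows correspond to the difference equations at t = 3, 4, 5.
\[
G_{di} =
\begin{pmatrix}
y_{i1} & 0      & 0      & 0      & 0      & 0      \\
0      & y_{i1} & y_{i2} & 0      & 0      & 0      \\
0      & 0      & 0      & y_{i1} & y_{i2} & y_{i3}
\end{pmatrix}
\]
% Column count: 1 + 2 + 3 = 6 = (T-2)(T-1)/2.
```

Each row has its own block of columns, so the equation at time $t$ contributes $t - 2$ instruments and the total grows quadratically in $T$.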
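When you specify the DYNDIFF option in the MODEL statement, PROC PANEL by default treats $x$ variables as exogenous and uses a projection that leaves these variables unchanged in the differenced regression. The full instrument matrix is then $Z_{di} = (G_{di} \;\; D_i)$, where $G_{di}$ is the matrix of lagged-level instruments for the dependent variable and

$$ D_i = \begin{pmatrix} \Delta x_{i3}' \\ \vdots \\ \Delta x_{iT}' \end{pmatrix} $$

When $L = 1$, the default $Z_{di}$ has $(T-2)(T-1)/2 + q$ columns, where $q$ is the number of $x$ variables. Each column $z$ of $Z_{di}$ satisfies the moment condition $E(z'\nu_i) = 0$, where $\nu_i$ is the vector of differenced errors.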
Blundell and Bond (1998) proposed a system GMM estimator that uses additional moment conditions to increase efficiency. The efficiency gain can be substantial when there is strong serial correlation in the dependent variable.
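When either $\phi$ is near 1 or the variance ratio $\sigma_\gamma^2 / \sigma_\epsilon^2$ is large, the lagged dependent variables $y_{i(t-2)}, y_{i(t-3)}, \ldots$ are weak instruments for the differenced variables $\Delta y_{i(t-1)}$. System GMM addresses the weak instrument problem by augmenting the difference equations described previously with a set of level equations. When $L = 1$, the level equations are

$$ y_i = S_i \, \theta + \eta_i $$

where

$$ y_i = \begin{pmatrix} y_{i3} \\ \vdots \\ y_{iT} \end{pmatrix}, \quad S_i = \begin{pmatrix} y_{i2} & x_{i3}' \\ \vdots & \vdots \\ y_{i(T-1)} & x_{iT}' \end{pmatrix}, \quad \eta_i = \begin{pmatrix} \gamma_i + \epsilon_{i3} \\ \vdots \\ \gamma_i + \epsilon_{iT} \end{pmatrix} $$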
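Blundell and Bond (1998) note that you can use lagged differences of $y$ as instruments for the levels of $y$. The main instrument matrix for the level equations is then

$$ G_{li} = \begin{pmatrix} \Delta y_{i2} & 0 & \cdots & 0 \\ 0 & \Delta y_{i3} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \Delta y_{i(T-1)} \end{pmatrix} $$

where the first row corresponds to time $t = 3$. You can extend this to $L > 1$ and to any number of $x$ variables by adding columns to the level regressor matrix $S_i$ and elements to the parameter vector $\theta$ as appropriate. Higher-order lags require deletion of the leading rows of $y_i$, $S_i$, $\eta_i$, and $G_{li}$.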
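Regression on the full system is obtained by stacking the differenced response $\Delta y_i$ and the level response $y_i$ to form $y_i^*$, stacking the regressor matrices $\Delta S_i$ and $S_i$ to form $S_i^*$, and stacking the error vectors $\nu_i$ and $\eta_i$ to form $e_i$.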
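When you specify the DYNSYS model option, the default instrument matrix for the full system is block diagonal,

$$ Z_i = \begin{pmatrix} Z_{di} & 0 \\ 0 & Z_{li} \end{pmatrix} $$

where $Z_{di}$ holds the instruments for the difference equations and $Z_{li}$ holds the instruments for the level equations (the lagged differences $G_{li}$, together with a column of ones when the model includes an intercept).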
The estimation in this section assumes system GMM. To obtain difference GMM, restrict estimation to the rows that correspond to the difference equations.
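The initial moment matrix is derived from the theoretical variance of the combined residuals and is expressed as $H_{1i} = \mathrm{diag}(H_{di}, H_{li})$, where

$$ H_{di} = \begin{pmatrix} 1 & -0.5 & 0 & \cdots & 0 \\ -0.5 & 1 & -0.5 & \cdots & 0 \\ 0 & -0.5 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix} $$

and $H_{li}$ is 0.5 times the identity matrix.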
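Define the weighting matrix as

$$ A_1 = \left( \sum_{i=1}^{N} Z_i' H_{1i} Z_i \right)^{-1} $$

and the projections as

$$ S_z = \sum_{i=1}^{N} Z_i' S_i^*, \qquad y_z = \sum_{i=1}^{N} Z_i' y_i^* $$

where $Z_i$, $S_i^*$, and $y_i^*$ are the instrument matrix, regressor matrix, and response vector of the full system for individual $i$, and $H_{1i}$ is the initial moment matrix. The one-step GMM estimate of $\theta$ is the weighted OLS estimator

$$ \hat\theta_1 = (S_z' A_1 S_z)^{-1} S_z' A_1 y_z, \qquad \widehat{\mathrm{Var}}(\hat\theta_1) = \hat\sigma^2 (S_z' A_1 S_z)^{-1} $$

where $\hat\sigma^2$ is the mean square error (MSE) derived solely from the difference equations, namely

$$ \hat\sigma^2 = \frac{1}{M - K} \sum_{i=1}^{N} (\Delta y_i - \Delta S_i \hat\theta_1)' (\Delta y_i - \Delta S_i \hat\theta_1) $$

Here $K$ is the number of parameters. The total number of observations, $M$, is equal to the number of observations for which the difference equations hold.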
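A disadvantage of the one-step estimate $\hat\theta_1$ is its reliance on the theoretical basis of the initial moment matrix $H_{1i}$. The two-step GMM estimate of $\theta$ replaces $H_{1i}$ with a version that is obtained from the observed one-step residuals. Let $H_{2i} = \hat e_{1i} \hat e_{1i}'$ be the outer product of the one-step residuals $\hat e_{1i} = y_i^* - S_i^* \hat\theta_1$. Then

$$ \hat\theta_2 = (S_z' A_2 S_z)^{-1} S_z' A_2 y_z $$

where

$$ A_2 = \left( \sum_{i=1}^{N} Z_i' H_{2i} Z_i \right)^{-1} $$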
The iterated GMM estimator of $\theta$ continues this pattern: First, use the current estimate $\hat\theta_j$ to form the residuals that compose $H_{(j+1)i}$. Second, use $H_{(j+1)i}$ to form the weighting matrix $A_{j+1}$. Third, use $A_{j+1}$ to update the estimate to $\hat\theta_{j+1}$.
There are two criteria by which convergence is achieved. The first (and default) criterion is met when the magnitude of the parameter estimate $\hat\theta$ changes by a relative amount smaller than b, as specified in the BTOL= option in the MODEL statement. The second criterion is met when the magnitude of the variance matrix changes by a relative amount smaller than a, as specified in the ATOL= option in the MODEL statement.
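The following sketch shows how the parameter tolerance might be tightened for an iterated fit. It reuses the data set and variables from the examples in this section. The ITGMM option is an assumption here (the text above does not name the option that requests iterated GMM), and the tolerance value is illustrative; check the MODEL statement syntax for your release.

```sas
/* Sketch: iterated GMM with a tighter parameter tolerance.            */
/* Assumptions: ITGMM is the MODEL option that requests iterated GMM;  */
/* the BTOL= value of 1E-6 is illustrative only.                       */
proc panel data=a;
   id State Year;
   instruments depvar;
   model Sales = Price PopDensity / dyndiff itgmm btol=1e-6;
run;
```

Specifying ATOL= instead would base convergence on the variance matrix rather than on the parameter estimates.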
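Robust variances are calculated by the sandwich method. The robust variance of the stage-$j$ estimate $\hat\theta_j$ is

$$ V_r(\hat\theta_j) = (S_z' A_j S_z)^{-1} S_z' A_j A_{j+1}^{-1} A_j S_z (S_z' A_j S_z)^{-1} $$

where $A_{j+1}^{-1} = \sum_{i} Z_i' \hat e_{ji} \hat e_{ji}' Z_i$ is formed from the outer products of the stage-$j$ residuals $\hat e_{ji}$.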
Arellano and Bond (1991), among others, note that robust two-step variance estimators are biased. Windmeijer (2005) derived a bias-corrected variance of the two-step estimate $\hat\theta_2$, and you can obtain this correction by specifying the BIASCORRECTED option in the MODEL statement.
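A minimal sketch of requesting the correction follows. It reuses the data set and variables from the examples in this section. BIASCORRECTED is the option described above; GMM2 is assumed here to be the MODEL option that requests two-step estimation, so verify it against the syntax for your release.

```sas
/* Sketch: two-step system GMM with the Windmeijer variance correction. */
/* Assumption: GMM2 requests two-step GMM.                              */
proc panel data=a;
   id State Year;
   instruments depvar;
   model Sales = Price PopDensity / dynsys gmm2 biascorrected;
run;
```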
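Define the one-step and two-step residuals as $\hat e_{1i} = y_i^* - S_i^* \hat\theta_1$ and $\hat e_{2i} = y_i^* - S_i^* \hat\theta_2$. Also define the projected two-step residual as

$$ e_z = \sum_{i=1}^{N} Z_i' \hat e_{2i} $$

Formulate the matrix $D$ such that its $k$th column is $d_k = V_2 \, S_z' A_2 \, \Omega_k \, A_2 \, e_z$, where $V_2 = (S_z' A_2 S_z)^{-1}$ is the model-based two-step variance. The matrix $\Omega_k$ is the quadratic form

$$ \Omega_k = \sum_{i=1}^{N} Z_i' \left( s_{ik} \hat e_{1i}' + \hat e_{1i} s_{ik}' \right) Z_i $$

where $s_{ik}$ is the $k$th column of $S_i^*$. The Windmeijer (2005) bias-corrected variance is

$$ V_c(\hat\theta_2) = V_2 + D V_2 + V_2 D' + D \, V_r(\hat\theta_1) \, D' $$

where $V_r(\hat\theta_1)$ is the robust variance of the one-step estimate.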
The intercept term vanishes when you take first differences and is thus identified only in the level equations. If you specify the DYNDIFF option in the MODEL statement and your model includes an intercept, then PROC PANEL fits the model by using system GMM with the following (default) instrumentation:

$$ Z_i = \begin{pmatrix} Z_{di} & 0 \\ 0 & \iota \end{pmatrix} $$

where $Z_{di}$ is the instrument matrix for the difference equations and $\iota$ is a column of ones. Because all the level instruments are zero except the constant, parameter estimates other than the intercept are unaffected by the added level equations.
If you specify the DYNDIFF option in the MODEL statement and your model does not include an intercept, then the level equations are excluded from the estimation.
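For instance, the following sketch requests difference GMM without an intercept, so that (per the preceding paragraph) no level equations enter the estimation. It reuses the data set and variables from the examples in this section and combines the DYNDIFF and NOINT options.

```sas
/* Sketch: difference GMM with no intercept; the level equations */
/* are then excluded from the estimation.                        */
proc panel data=a;
   id State Year;
   instruments depvar;
   model Sales = Price PopDensity / dyndiff noint;
run;
```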
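If you specify the DYNSYS option in the MODEL statement, then there is no issue regarding the intercept. Under the default instrument specification, if the model includes an intercept, then the level instruments include an added column of ones. That is,

$$ Z_{li} = (G_{li} \;\; \iota) $$

where $G_{li}$ is the matrix of lagged differences of the dependent variable and $\iota$ is a column of ones.

When you specify the DYNSYS option for performing system GMM, the default instrument matrix is

$$ Z_i = \begin{pmatrix} G_{di} & D_i & 0 & 0 \\ 0 & 0 & G_{li} & C_i \end{pmatrix} $$

where $G_{di}$ and $D_i$ are the GMM-style and standard instruments for the difference equations, and $C_i$ is either a column of ones, or $0$ if you specify the NOINT option.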
You can override the default set of instruments by specifying an INSTRUMENTS statement. You can choose which instrument sets to include as components of the instrument matrix $Z_i$. The INSTRUMENTS statement provides options to generate the appropriate instruments when variables are endogenous, predetermined, or exogenous.
The following discussion assumes that you are performing system GMM by using the DYNSYS option in the MODEL statement. When you specify the DYNDIFF option instead, any specification (except the constant $\iota$) that pertains to the level equations is ignored.
Dependent Variable
The DEPVAR option in the INSTRUMENTS statement adds instruments for the dependent variable and its lags. Specifying DEPVAR(DIFF) includes the lagged levels of the dependent variable (the matrix $G_{di}$) in the difference equations. Specifying DEPVAR(LEVEL) includes the first differences of the dependent variable (the matrix $G_{li}$) in the level equations. Specifying DEPVAR(BOTH) (or simply DEPVAR) includes both $G_{di}$ and $G_{li}$.
You should at a minimum include instruments for the dependent variable when you perform dynamic panel estimation. For example:
proc panel data=a;
   id State Year;
   instruments depvar;
   model Sales = Price PopDensity / dynsys;
run;
Constant (or Intercept)
Specifying the keyword CONSTANT includes the constant vector $\iota$ (a column of ones) in the level equations.
Endogenous Variables
A variable $x$ is endogenous if $E(x_{it} \epsilon_{is}) \neq 0$ for $s \leq t$ and 0 otherwise.
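The DIFFEND= option specifies a list of endogenous variables that form instrument matrices for the difference equations. The instruments are "GMM-style" and mirror the form used for the dependent variable. Suppose that the model includes one lag of the dependent variable ($L = 1$). Specifying DIFFEND=(X) adds the following instruments to the difference equations:

$$ \begin{pmatrix} x_{i1} & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & x_{i1} & x_{i2} & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & 0 & 0 & x_{i1} & x_{i2} & x_{i3} & \cdots & 0 & \cdots & 0 \\ \vdots & & & & & & \ddots & & & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & \cdots & x_{i1} & \cdots & x_{i(T-2)} \end{pmatrix} $$

The first row corresponds to time $t = 3$. The instruments are in lagged levels.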
The LEVELEND= option specifies a list of endogenous variables that form instrument matrices for the level equations. The instruments mirror the form used for the dependent variable. Suppose that the model includes one lag of the dependent variable ($L = 1$). Specifying LEVELEND=(X) adds the following instruments to the level equations:

$$ \begin{pmatrix} \Delta x_{i2} & 0 & \cdots & 0 \\ 0 & \Delta x_{i3} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \Delta x_{i(T-1)} \end{pmatrix} $$

The first row corresponds to time $t = 3$. Because the instruments are used for the level equations, they are in lagged differences.
The following code fits a dynamic panel model by using difference equations. It includes GMM-style instruments for both the dependent variable Sales and the variable Price:
proc panel data=a;
   id State Year;
   instruments depvar diffend = (Price);
   model Sales = Price PopDensity / dyndiff;
run;
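For system GMM, an endogenous variable would typically be instrumented in both sets of equations. The following sketch treats Price as endogenous and PopDensity as exogenous; that classification is an assumption made for illustration, not a recommendation for these data. The sketch uses only the INSTRUMENTS options described in this section.

```sas
/* Sketch: system GMM with Price treated as endogenous.                 */
/* DIFFEND= supplies lagged levels for the difference equations;        */
/* LEVELEND= supplies lagged differences for the level equations.       */
/* PopDensity is kept as a standard (differenced) instrument.           */
proc panel data=a;
   id State Year;
   instruments depvar(both) constant
               diffeq   = (PopDensity)
               diffend  = (Price)
               levelend = (Price);
   model Sales = Price PopDensity / dynsys;
run;
```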
Predetermined Variables
A variable $x$ is predetermined if $E(x_{it} \epsilon_{is}) \neq 0$ for $s < t$ and 0 otherwise.
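The DIFFPRE= option specifies a list of variables that are considered to be predetermined in the difference equations. The DIFFPRE= option works similarly to the DIFFEND= option, except that each observation contains an extra instrument that reflects orthogonality in the current time period. If $L = 1$, specifying DIFFPRE=(X) adds the following instruments to the difference equations:

$$ \begin{pmatrix} x_{i1} & x_{i2} & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & 0 & x_{i1} & x_{i2} & x_{i3} & \cdots & 0 & \cdots & 0 \\ \vdots & & & & & \ddots & & & \vdots \\ 0 & 0 & 0 & 0 & 0 & \cdots & x_{i1} & \cdots & x_{i(T-1)} \end{pmatrix} $$

The first row corresponds to time $t = 3$.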
The LEVELPRE= option specifies a list of variables that are considered to be predetermined in the level equations. The LEVELPRE= option works similarly to the LEVELEND= option, except that the lag is shifted up to reflect orthogonality in the current time period. If $L = 1$, specifying LEVELPRE=(X) adds the following instruments to the level equations:

$$ \begin{pmatrix} \Delta x_{i3} & 0 & \cdots & 0 \\ 0 & \Delta x_{i4} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \Delta x_{iT} \end{pmatrix} $$

The first row corresponds to time $t = 3$.
The following code fits a dynamic panel model by using difference equations. The instrument set includes GMM-style instruments for the dependent variable Sales and GMM-style instruments that correspond to the predetermined variable Price:
proc panel data=a;
   id State Year;
   instruments depvar diffpre = (Price);
   model Sales = Price PopDensity / dyndiff;
run;
Exogenous Variables
Exogenous variables are uncorrelated with both the level residuals and the differenced residuals. If a regression variable is exogenous, you might want to include that variable in the instrument set as a standard instrument. The DIFFEQ= option specifies a list of variables that compose the matrix $D_i$ of standard instruments for the difference equations; for an example of how $D_i$ is formed, see the section First Differencing. These variables are usually exogenous regressors that you want to preserve under the projection to the instrument space. Because these instruments belong to the difference equations, the variables are automatically differenced.
The LEVELEQ= option specifies a list of variables that form a matrix of standard instruments that is included in the level equations. You can use this option to specify external instruments that are not part of the main regression but that can be used as instruments for the regression variables in levels.
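If $L = 1$, specifying LEVELEQ=(X1 X2) adds the following instruments to the level equations:

$$ \begin{pmatrix} x_{1,i3} & x_{2,i3} \\ x_{1,i4} & x_{2,i4} \\ \vdots & \vdots \\ x_{1,iT} & x_{2,iT} \end{pmatrix} $$

where $x_{1,it}$ and $x_{2,it}$ are the values of X1 and X2 for individual $i$ at time $t$. The first row corresponds to time $t = 3$.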
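The following sketch shows how an external instrument might be supplied to the level equations through the LEVELEQ= option. The variable TaxRate is hypothetical: it is assumed to exist in the data set but does not appear in the regression. The remaining options reproduce the default system GMM instruments.

```sas
/* Sketch: add an external standard instrument to the level equations. */
/* TaxRate is a hypothetical variable assumed to be in data set A; it  */
/* is not a regressor in the MODEL statement.                          */
proc panel data=a;
   id State Year;
   instruments depvar(both) constant
               diffeq  = (Price PopDensity)
               leveleq = (TaxRate);
   model Sales = Price PopDensity / dynsys;
run;
```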
The following example illustrates how you would use an INSTRUMENTS statement to obtain the default set of instruments for system GMM:
proc panel data=a;
   id State Year;
   instruments depvar(both) constant diffeq = (Price PopDensity);
   model Sales = Price PopDensity / dynsys;
run;
Limiting the Number of Instruments
Arellano and Bond’s (1991) technique of expanding instruments is a useful method of dealing with autocorrelation in the response variable. However, too many instruments can bias the estimator. The number of instruments grows quadratically with the number of time periods, making computations less feasible for larger T.
By default, PROC PANEL uses all available lags. You can limit the number of instruments by specifying the MAXBAND= option in the INSTRUMENTS statement. For example, specifying MAXBAND=5 limits the number of GMM-style instruments to five per observation, for each variable. The MAXBAND= option applies to all GMM-style instruments: those for the dependent variable, those from the DIFFEND= option, and those from the DIFFPRE= option.
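A minimal sketch of capping the instrument count follows. It reuses the data set and variables from the examples in this section; the band width of 5 matches the value discussed above and is illustrative only.

```sas
/* Sketch: limit GMM-style instruments to five per observation for    */
/* the dependent variable and for the endogenous variable Price.      */
proc panel data=a;
   id State Year;
   instruments depvar diffend = (Price) maxband = 5;
   model Sales = Price PopDensity / dyndiff;
run;
```

With the cap in place, the number of GMM-style instruments grows roughly linearly in the number of time periods instead of quadratically.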
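A Sargan test is a referendum on your choice of instruments in a dynamic panel model. The Sargan test statistic for one-step GMM is

$$ J = \frac{1}{\hat\sigma^2} \left( \sum_{i=1}^{N} Z_i' \hat e_{1i} \right)' A_1 \left( \sum_{i=1}^{N} Z_i' \hat e_{1i} \right) $$

The Sargan test statistic for two-step GMM is

$$ J = \left( \sum_{i=1}^{N} Z_i' \hat e_{2i} \right)' A_2 \left( \sum_{i=1}^{N} Z_i' \hat e_{2i} \right) $$

Here $\hat e_{1i}$ and $\hat e_{2i}$ are the one-step and two-step residuals, $A_1$ and $A_2$ are the corresponding weighting matrices, and $\hat\sigma^2$ is the one-step MSE. The statistic is similarly incremented for further iterations of GMM.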
The null hypothesis of the Sargan test is that the moment conditions (as defined by the columns of the instrument matrices $Z_i$) hold, and thus the $Z_i$ form an adequate set of instruments. Under the null, J is distributed as $\chi^2$ with degrees of freedom equal to the rank of the moment matrix $\sum_i Z_i' H_i Z_i$ minus the number of parameters K. The nominal rank of the moment matrix is equal to the number of instruments. However, this number can be reduced because of collinearity and redundancy in the instrument specification. Furthermore, when $H_i$ is an outer product of residuals (as in two-step and iterated GMM), the maximum rank of the moment matrix is N, regardless of the number of instruments.
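As a worked illustration of the degrees-of-freedom count, consider a hypothetical one-step difference GMM fit with $T = 6$ time periods, one lag of the dependent variable, two exogenous regressors, no intercept, and the default instruments. All of these values are assumptions chosen for the example, and the instrument set is assumed to have full rank.

```latex
% Hypothetical setup: T = 6, L = 1, two exogenous x variables, no intercept.
\begin{align*}
\text{GMM-style instruments} &= \frac{(T-2)(T-1)}{2} = \frac{4 \cdot 5}{2} = 10 \\
\text{standard instruments}  &= 2 \\
\text{parameters } K         &= 3 \quad (\phi,\ \beta_1,\ \beta_2) \\
\text{degrees of freedom}    &= (10 + 2) - 3 = 9
\end{align*}
```

Any collinearity among the instruments would lower the rank and therefore the degrees of freedom.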
You should treat Sargan tests with caution when robust variances are used in the estimation. The theoretical distribution of J does not hold under conditions that favor robust variances.
An AR(m) test is a test for autocorrelation of order m in the model residuals. Let $H_i$ be the working variance of the residuals from the full system. The precise definition of $H_i$ depends on the GMM stage and whether robust variances are specified; see Table 3.
Table 3: Definition of the Working Residual Variance
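Define the residual vector

$$ r_i = \begin{pmatrix} \hat\nu_i \\ 0 \end{pmatrix} $$

where $\hat\nu_i$ are the residuals from the difference equations, evaluated at the final estimate of $\theta$. The trailing zeros correspond to the level equations. Define $\omega_i$ as a lagged version of $r_i$ such that the following are true:

$$ \omega_{it} = \begin{cases} r_{i(t-m)} & \text{if row } t \text{ belongs to the difference equations and } r_{i(t-m)} \text{ is available} \\ 0 & \text{otherwise} \end{cases} $$

Define the following:

$$ \omega_S = \sum_{i=1}^{N} \omega_i' S_i^*, \qquad \omega_H = \sum_{i=1}^{N} Z_i' H_i \, \omega_i $$

The AR(m) test statistic is $Z_m = k_0 / \sqrt{k_1 + k_2 + k_3}$, where

$$ k_0 = \sum_{i=1}^{N} \omega_i' r_i, \quad k_1 = \sum_{i=1}^{N} \omega_i' H_i \, \omega_i, \quad k_2 = -2 \, \omega_S (S_z' A S_z)^{-1} S_z' A \, \omega_H, \quad k_3 = \omega_S \hat V \omega_S' $$

Here $A$ is the weighting matrix and $S_z$ is the projected regressor matrix from the final GMM stage. The matrix $\hat V$ is the estimated variance matrix of the parameters, corresponding to the GMM stage specified, and either model-based, robust, or bias-corrected.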
Under the null hypothesis of no autocorrelation, the AR(m) test statistic follows a standard normal distribution. Because of the differencing in the errors, well-specified models present autocorrelation of order $m = 1$, but any autocorrelation at higher orders indicates a violation of assumptions.
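The first-order pattern can be verified directly. The following sketch assumes (for illustration) that the level errors $\epsilon_{it}$ are serially uncorrelated with common variance $\sigma_\epsilon^2$, and writes $\nu_{it} = \epsilon_{it} - \epsilon_{i(t-1)}$ for the differenced error:

```latex
% Sketch under the stated assumptions.
\begin{align*}
\operatorname{Var}(\nu_{it}) &= 2\sigma_\epsilon^2 \\
\operatorname{Cov}\bigl(\nu_{it},\, \nu_{i(t-1)}\bigr)
  &= \operatorname{Cov}\bigl(\epsilon_{it} - \epsilon_{i(t-1)},\; \epsilon_{i(t-1)} - \epsilon_{i(t-2)}\bigr)
   = -\sigma_\epsilon^2 \\
\operatorname{Corr}\bigl(\nu_{it},\, \nu_{i(t-1)}\bigr) &= -\tfrac{1}{2} \\
\operatorname{Cov}\bigl(\nu_{it},\, \nu_{i(t-2)}\bigr) &= 0
\end{align*}
```

A significant AR(1) statistic is therefore expected, whereas a significant AR(2) statistic suggests serial correlation in the level errors, which would invalidate the lagged instruments.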