Consider a dynamic panel demand model for cigarette sales that illustrates the methods described in the section Dynamic Panel Estimation (DYNDIFF and DYNSYS Options). The data are from a panel of 46 American states over the period 1963–1992. The dependent variable is the logarithm of per capita cigarette sales (variable LSales). Other factors that were measured include the log of price (LPrice), the log of disposable income (LDisp), and the log of minimum price in adjoining states (LMin). For a full description of the data, see Baltagi (2013, sec. 8.9).
The following statements create the Cigar data set:
data Cigar;
   input State Year Price Pop Pop_16 Cpi Disp Sales Min;
   /* Log-transform the variables that enter the demand model */
   LSales = log(Sales);
   LPrice = log(Price);
   LDisp = log(Disp);
   LMin = log(Min);
   label
      State = 'State abbreviation'
      Year = 'Year'
      LSales = 'Log cigarette sales in packs per capita'
      LPrice = 'Log price per pack of cigarettes'
      LDisp = 'Log per capita disposable income'
      LMin = 'Log minimum price in adjoining states per pack of cigarettes';
   datalines;
1 63 28.6 3383 2236.5 30.6 1558.3045298 93.9 26.1
1 64 29.8 3431 2276.7 31.0 1684.0732025 95.4 27.5
1 65 29.8 3486 2327.5 31.5 1809.8418752 98.5 28.9
1 66 31.5 3524 2369.7 32.4 1915.1603572 96.4 29.5
1 67 31.6 3533 2393.7 33.4 2023.5463678 95.5 29.6
1 68 35.6 3522 2405.2 34.8 2202.4855362 88.4 32
1 69 36.6 3531 2411.9 36.7 2377.3346665 90.1 32.8
1 70 39.6 3444 2394.6 38.8 2591.0391591 89.8 34.3
1 71 42.7 3481 2443.5 40.5 2785.3159706 95.4 35.8
... more lines ...
You posit a panel model for cigarette sales that contains fixed effects for states. Because you believe that the data are insufficient to explain all possible shocks in yearly sales, you include lagged sales in the model as a regressor. By construction, lagged sales are an endogenous regressor, and you thus specify dynamic panel estimation by using the DYNDIFF option. The following statements fit the model:
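/* Sort by cross section (State) and time period (Year) */
proc sort data=Cigar;
   by State Year;
run;
/* DYNDIFF requests dynamic panel estimation; lagged LSales is added automatically */
proc panel data=Cigar;
   id State Year;
   model LSales = LPrice LDisp LMin / dyndiff;
run;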
The results are shown in Output 25.5.1. Note that it was not necessary to explicitly include lagged sales on the right-hand side of the model; PROC PANEL generated it for you. The coefficient on lagged sales is 0.732, indicating a high degree of autocorrelation in the dependent variable. When cigarette sales are unusually high or low because of unforeseen circumstances, the effects tend to linger for several years. The results also show that price has a statistically significant negative effect on demand: the estimated short-run price elasticity is about -0.26.
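Because the model includes lagged sales, the coefficient on LPrice measures only the immediate response to a price change. The following DATA step is a minimal sketch of the implied long-run response, assuming the usual partial-adjustment formula (the price coefficient divided by one minus the lag coefficient). The variable names are illustrative, and the calculation is not part of the PROC PANEL output.

```sas
data _null_;
   /* Estimates copied from Output 25.5.1 */
   lag_coef   = 0.732212;    /* LSales (Lag 1) */
   price_coef = -0.26328;    /* LPrice: short-run elasticity */

   /* Assumed long-run formula: beta / (1 - gamma) */
   long_run_price = price_coef / (1 - lag_coef);

   put long_run_price= 8.3;
run;
```

Under that assumption, the long-run price elasticity is about -0.98, which is much larger in magnitude than the short-run value because each year's response carries over into later years.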
Output 25.5.1: Dynamic Panel Estimation for Cigarette Sales
| Model Description | |
|---|---|
| Estimation Method | DynDiff |
| Number of Cross Sections | 46 |
| Time Series Length | 30 |
| GMM Stage | 1 |
| GMM Bandwidth | 30 |
| Number of Instruments | 410 |
| Variance Estimation | GMM |
| Fit Statistics | |||
|---|---|---|---|
| SSE | 3.1373 | DFE | 1283 |
| MSE | 0.0024 | Root MSE | 0.0494 |
| Sargan Test | ||
|---|---|---|
| DF | Statistic | Prob > ChiSq |
| 405 | 712.45 | <.0001 |
| Parameter Estimates | ||||||
|---|---|---|---|---|---|---|
| Variable | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Label |
| Intercept | 1 | 0.769092 | 0.0658 | 11.69 | <.0001 | Intercept |
| LSales (Lag 1) | 1 | 0.732212 | 0.0178 | 41.07 | <.0001 | Log cigarette sales in packs per capita, Lag 1 |
| LPrice | 1 | -0.26328 | 0.0255 | -10.31 | <.0001 | Log price per pack of cigarettes |
| LDisp | 1 | 0.166116 | 0.0105 | 15.88 | <.0001 | Log per capita disposable income |
| LMin | 1 | 0.032726 | 0.0233 | 1.40 | 0.1604 | Log minimum price in adjoining states per pack of cigarettes |
| AR(m) Test | ||
|---|---|---|
| Lag | Statistic | Pr > \|Statistic\| |
| 1 | -15.44 | <.0001 |
| 2 | 2.47 | 0.0134 |
Output 25.5.1 includes two diagnostic measures. The first, a Sargan test, is a test of the validity of the moment conditions that are implied by the GMM instruments that were used. The p-value (<.0001) rejects the null hypothesis that the moment conditions are valid, which suggests that you should look for a set of instruments other than the default set provided by PROC PANEL.
The second diagnostic test is the AR(m) test for autocorrelation in the residuals. In well-fitting dynamic panel models, you expect to see some autocorrelation of lag 1, but any autocorrelation at higher lags indicates a poor fit. The autocorrelation at lag 2 is significant, leading you to seek a better-fitting alternative.
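If you want to see how the reported p-values relate to the test statistics, the following DATA step is a rough sketch. It assumes that the Sargan statistic is referred to a chi-square distribution with the listed degrees of freedom and that the AR(m) statistics are referred to a standard normal distribution with a two-sided p-value. Both are standard for this class of tests, but they are assumptions about how the table was computed. The variable names are illustrative.

```sas
data _null_;
   /* Sargan test: upper-tail chi-square probability, DF = 405 */
   sargan_p = sdf('chisq', 712.45, 405);

   /* AR(m) tests: two-sided standard normal probabilities */
   ar1_p = 2 * sdf('normal', abs(-15.44));
   ar2_p = 2 * sdf('normal', abs(2.47));

   put sargan_p= pvalue6.4 ar1_p= pvalue6.4 ar2_p= pvalue6.4;
run;
```

Small differences from the printed p-values are to be expected, because the statistics in the output tables are rounded to two decimal places.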
One possible explanation for the poor fit is that, by default, PROC PANEL uses the one-step generalized method of moments (GMM). One-step GMM is known for being too reliant on the assumption that the residuals from the difference equations are not serially correlated. An alternative is two-step GMM, which instead uses a data-driven variance matrix for the differenced residuals.
The following statements fit the model by two-step GMM:
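proc panel data=Cigar;
   id State Year;
   /* Reproduces the default instrument set explicitly */
   instruments constant depvar diffeq=(LPrice LDisp LMin);
   /* GMM2 requests two-step GMM; BIASCORRECTED adjusts the standard errors */
   model LSales = LPrice LDisp LMin / dyndiff gmm2 biascorrected;
run;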
The code includes an INSTRUMENTS statement that, for demonstration purposes, reproduces the default instrument set. That set includes the following:
- a constant (keyword CONSTANT)
- GMM-style instruments based on the dependent variable, LSales (keyword DEPVAR)
- standard instruments for the exogenous regressors LPrice, LDisp, and LMin (DIFFEQ= option)
The code also includes the BIASCORRECTED option, which produces bias-corrected standard errors according to the method of Windmeijer (2005).
The results are shown in Output 25.5.2. The coefficients do not change much, but the standard errors are now more reliable. The model diagnostic tests indicate a better fit, although you should use caution when interpreting Sargan test results. Sargan tests lack power when the number of instruments is large, and their distributional properties come into question under conditions that favor either robust or bias-corrected standard errors.
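To see why the instrument count is so large, you can tally the instruments by hand. The following DATA step is a sketch that assumes the standard difference-GMM layout: for each differenced period t = 3, ..., 30, a GMM-style variable contributes its levels from periods 1 through t - 2, whereas each standard instrument and the constant contribute one column each. That layout is an assumption about how the default set is built, and the variable names are illustrative.

```sas
data _null_;
   n_periods = 30;

   /* GMM-style columns contributed by one variable */
   gmm_per_var = 0;
   do period = 3 to n_periods;
      gmm_per_var + (period - 2);   /* lags 1 to period-2 */
   end;

   /* Default set: constant + DEPVAR + three DIFFEQ= regressors */
   n_default = 1 + gmm_per_var + 3;

   /* One regressor moved from DIFFEQ= to GMM-style instruments */
   n_one_endog = 1 + 2 * gmm_per_var + 2;

   put gmm_per_var= n_default= n_one_endog=;
run;
```

Under these assumptions, the tally gives 406 GMM-style columns per variable and 410 instruments in total, which matches the Number of Instruments value in the Model Description table. Treating one regressor as endogenous gives 815, the count reported in Output 25.5.3.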
Output 25.5.2: Dynamic Panel Estimation by Two-Step GMM
| Model Description | |
|---|---|
| Estimation Method | DynDiff |
| Number of Cross Sections | 46 |
| Time Series Length | 30 |
| GMM Stage | 2 |
| GMM Bandwidth | 30 |
| Number of Instruments | 410 |
| Variance Estimation | Bias-corrected |
| Fit Statistics | |||
|---|---|---|---|
| SSE | 3.1348 | DFE | 1283 |
| MSE | 0.0024 | Root MSE | 0.0494 |
| Sargan Test | ||
|---|---|---|
| DF | Statistic | Prob > ChiSq |
| 41 | 45.45 | 0.2920 |
| Parameter Estimates | ||||||
|---|---|---|---|---|---|---|
| Variable | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Label |
| Intercept | 1 | 0.770726 | 0.1538 | 5.01 | <.0001 | Intercept |
| LSales (Lag 1) | 1 | 0.730839 | 0.0523 | 13.97 | <.0001 | Log cigarette sales in packs per capita, Lag 1 |
| LPrice | 1 | -0.25942 | 0.0418 | -6.21 | <.0001 | Log price per pack of cigarettes |
| LDisp | 1 | 0.166895 | 0.0266 | 6.27 | <.0001 | Log per capita disposable income |
| LMin | 1 | 0.028106 | 0.0410 | 0.69 | 0.4934 | Log minimum price in adjoining states per pack of cigarettes |
| AR(m) Test | ||
|---|---|---|
| Lag | Statistic | Pr > \|Statistic\| |
| 1 | -4.97 | <.0001 |
| 2 | 1.89 | 0.0587 |
The previous estimation treats regressors such as LPrice as exogenous. If you believe that price is endogenous, you can create GMM-style instruments for LPrice to replace the default standard instruments.
The following statements fit the model by using GMM-style instruments for LPrice:
proc panel data=Cigar;
   id State Year;
   /* LPrice moves from DIFFEQ= (standard) to DIFFEND= (GMM-style, endogenous) */
   instruments constant depvar diffeq=(LDisp LMin) diffend=(LPrice);
   model LSales = LPrice LDisp LMin / dyndiff gmm2 biascorrected;
run;
The results are shown in Output 25.5.3. Treating LPrice as endogenous nearly doubles the number of instruments, from 410 to 815. Estimation is still feasible here, but when the number of instruments is so large that it makes estimation infeasible, you can limit the number of instruments by specifying the MAXBAND= option in the INSTRUMENTS statement.
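The following statements sketch how the MAXBAND= option might be added to the previous specification. The value 5 is an arbitrary choice for illustration, not a recommendation, and no output for this specification is shown here. Check the INSTRUMENTS statement syntax for your release to confirm the placement and exact meaning of the option.

```sas
proc panel data=Cigar;
   id State Year;
   /* MAXBAND=5 is a hypothetical cap on the GMM-style instruments */
   instruments constant depvar diffeq=(LDisp LMin) diffend=(LPrice) maxband=5;
   model LSales = LPrice LDisp LMin / dyndiff gmm2 biascorrected;
run;
```

A smaller band reduces the instrument count at the cost of discarding information from more distant lags, so it is worth comparing the coefficient estimates and diagnostic tests across a few band values.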
Output 25.5.3: Dynamic Panel Estimation, Custom Instrument Set
| Model Description | |
|---|---|
| Estimation Method | DynDiff |
| Number of Cross Sections | 46 |
| Time Series Length | 30 |
| GMM Stage | 2 |
| GMM Bandwidth | 30 |
| Number of Instruments | 815 |
| Variance Estimation | Bias-corrected |
| Fit Statistics | |||
|---|---|---|---|
| SSE | 3.4193 | DFE | 1283 |
| MSE | 0.0027 | Root MSE | 0.0516 |
| Sargan Test | ||
|---|---|---|
| DF | Statistic | Prob > ChiSq |
| 41 | 45.48 | 0.2909 |
| Parameter Estimates | ||||||
|---|---|---|---|---|---|---|
| Variable | DF | Estimate | Standard Error | t Value | Pr > \|t\| | Label |
| Intercept | 1 | 0.510851 | 0.1232 | 4.15 | <.0001 | Intercept |
| LSales (Lag 1) | 1 | 0.8046 | 0.0410 | 19.63 | <.0001 | Log cigarette sales in packs per capita, Lag 1 |
| LPrice | 1 | -0.21878 | 0.0397 | -5.51 | <.0001 | Log price per pack of cigarettes |
| LDisp | 1 | 0.138748 | 0.0208 | 6.66 | <.0001 | Log per capita disposable income |
| LMin | 1 | 0.024775 | 0.0407 | 0.61 | 0.5427 | Log minimum price in adjoining states per pack of cigarettes |
| AR(m) Test | ||
|---|---|---|
| Lag | Statistic | Pr > \|Statistic\| |
| 1 | -5.04 | <.0001 |
| 2 | 1.95 | 0.0509 |