PANEL Procedure

Example 25.5 Cigarette Sales Data: Dynamic Panel Estimation

Consider a dynamic panel demand model for cigarette sales that illustrates the methods described in the section Dynamic Panel Estimation (DYNDIFF and DYNSYS Options). The data are from a panel of 46 American states over the period 1963–1992. The dependent variable is the logarithm of per capita cigarette sales (variable LSales). Other factors that were measured include the log of price (LPrice), the log of disposable income (LDisp), and the log of minimum price in adjoining states (LMin). For a full description of the data, see Baltagi (2013, sec. 8.9).

The following statements create the Cigar data set:

data Cigar;
   input State Year Price Pop Pop_16 Cpi Disp Sales Min;
   LSales = log(Sales);
   LPrice = log(Price);
   LDisp  = log(Disp);
   LMin   = log(Min);
   label
   State   = 'State abbreviation'
   Year    = 'Year'
   LSales  = 'Log cigarette sales in packs per capita'
   LPrice  = 'Log price per pack of cigarettes'
   LDisp   = 'Log per capita disposable income'
   LMin    = 'Log minimum price in adjoining states per pack of cigarettes';
datalines;
1 63 28.6 3383 2236.5 30.6 1558.3045298 93.9 26.1
1 64 29.8 3431 2276.7 31.0 1684.0732025 95.4 27.5
1 65 29.8 3486 2327.5 31.5 1809.8418752 98.5 28.9
1 66 31.5 3524 2369.7 32.4 1915.1603572 96.4 29.5
1 67 31.6 3533 2393.7 33.4 2023.5463678 95.5 29.6
1 68 35.6 3522 2405.2 34.8 2202.4855362 88.4 32
1 69 36.6 3531 2411.9 36.7 2377.3346665 90.1 32.8
1 70 39.6 3444 2394.6 38.8 2591.0391591 89.8 34.3
1 71 42.7 3481 2443.5 40.5 2785.3159706 95.4 35.8

   ... more lines ...   

You posit a panel model for cigarette sales that contains fixed effects for states. Because you believe that the data are insufficient to explain all possible shocks in yearly sales, you include lagged sales in the model as a regressor. By construction, lagged sales are an endogenous regressor, and you thus specify dynamic panel estimation by using the DYNDIFF option. The following statements fit the model:

proc sort data=Cigar;
   by State Year;
run;

proc panel data=Cigar;
   id State Year;
   model LSales = LPrice LDisp LMin / dyndiff;
run;
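For reference, the model that the preceding statements fit can be sketched as follows. The symbols are notational assumptions introduced here for illustration and do not appear in the procedure output: \eta_i is the state fixed effect, \varepsilon_{it} is the idiosyncratic error, and the lagged term is the one that the DYNDIFF option adds automatically.

```latex
\mathrm{LSales}_{it} = \alpha + \rho\,\mathrm{LSales}_{i,t-1}
  + \beta_1\,\mathrm{LPrice}_{it}
  + \beta_2\,\mathrm{LDisp}_{it}
  + \beta_3\,\mathrm{LMin}_{it}
  + \eta_i + \varepsilon_{it},
\qquad i = 1,\dots,46,\quad t = 1963,\dots,1992
```

Both \mathrm{LSales}_{i,t-1} and the first-differenced error \varepsilon_{it} - \varepsilon_{i,t-1} contain \varepsilon_{i,t-1}, which is why the lagged dependent variable is endogenous after the fixed effect is differenced out.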

The results are shown in Output 25.5.1. Note that it was not necessary to explicitly include lagged sales on the right-hand side of the model; PROC PANEL generated it for you. The coefficient on lagged sales is 0.732, indicating a high degree of autocorrelation in the dependent variable. When cigarette sales are unusually high or low because of unforeseen circumstances, the effects tend to linger for several years. The results also show that demand responds significantly to price, but that the short-run response is inelastic: the coefficient on LPrice is -0.263, so a 1% price increase is associated with a decline in sales of roughly 0.26% in the same year.
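The persistence described above can be made concrete with a short back-of-the-envelope calculation. The following sketch plugs the estimates from Output 25.5.1 into two standard partial-adjustment formulas, neither of which PROC PANEL computes: the implied long-run price elasticity, beta/(1 - rho), and the half-life of a shock, log(0.5)/log(rho). The variable names are illustrative.

```sas
data _null_;
   rho    = 0.732212;   /* coefficient on LSales (Lag 1), Output 25.5.1 */
   bPrice = -0.26328;   /* coefficient on LPrice, Output 25.5.1         */

   /* Long-run elasticity implied by the dynamic model (an assumption
      of the partial-adjustment interpretation, not a procedure result) */
   longRun  = bPrice / (1 - rho);

   /* Number of years for the effect of a one-time shock to fall by half */
   halfLife = log(0.5) / log(rho);

   put longRun= 8.4 halfLife= 8.2;
run;
```

With these inputs the long-run elasticity is about -0.98 and the half-life is roughly 2.2 years, which is consistent with shocks that linger for several years.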

Output 25.5.1: Dynamic Panel Estimation for Cigarette Sales

The PANEL Procedure
Dynamic Panel Estimation by First-Differences GMM
 
Dependent Variable: LSales (Log cigarette sales in packs per capita)

Model Description
Estimation Method DynDiff
Number of Cross Sections 46
Time Series Length 30
GMM Stage 1
GMM Bandwidth 30
Number of Instruments 410
Variance Estimation GMM

Fit Statistics
SSE 3.1373 DFE 1283
MSE 0.0024 Root MSE 0.0494

Sargan Test
DF Statistic Prob > ChiSq
405 712.45 <.0001

Parameter Estimates
Variable        DF  Estimate  Standard Error  t Value  Pr > |t|  Label
Intercept        1  0.769092          0.0658    11.69    <.0001  Intercept
LSales (Lag 1)   1  0.732212          0.0178    41.07    <.0001  Log cigarette sales in packs per capita, Lag 1
LPrice           1  -0.26328          0.0255   -10.31    <.0001  Log price per pack of cigarettes
LDisp            1  0.166116          0.0105    15.88    <.0001  Log per capita disposable income
LMin             1  0.032726          0.0233     1.40    0.1604  Log minimum price in adjoining states per pack of cigarettes

AR(m) Test
Lag Statistic Pr > |Statistic|
1 -15.44 <.0001
2 2.47 0.0134


Included in Output 25.5.1 are two diagnostic measures. The first, a Sargan test, is a test of the validity of the moment conditions that are implied by the GMM instruments that were used. The null hypothesis is that the moment conditions are valid. The small p-value (<.0001) rejects that hypothesis, which suggests that you should probably look for a set of instruments other than the default set provided by PROC PANEL.
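As a quick check on the Sargan output, the following sketch recomputes the degrees of freedom and the p-value from the printed numbers. It assumes that, for this one-step fit, the degrees of freedom equal the number of instruments minus the number of estimated parameters (410 - 5 = 405, which matches the DF column), and that the statistic is referred to a chi-square distribution, as the Prob > ChiSq heading indicates. The variable names are illustrative.

```sas
data _null_;
   nInstruments = 410;      /* Number of Instruments, Output 25.5.1          */
   nParms       = 5;        /* intercept, lagged LSales, LPrice, LDisp, LMin */
   sargan       = 712.45;   /* Sargan statistic, Output 25.5.1               */

   df     = nInstruments - nParms;      /* overidentifying restrictions      */
   pValue = 1 - probchi(sargan, df);    /* upper-tail chi-square probability */

   put df= pValue= pvalue6.4;
run;
```

A chi-square variable with 405 degrees of freedom has mean 405, so a statistic of 712.45 lies far in the upper tail.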

The second diagnostic test is the AR(m) test for autocorrelation in the differenced residuals. In well-fitting dynamic panel models, you expect to see autocorrelation at lag 1, because first-differencing induces it, but significant autocorrelation at higher lags indicates a poor fit. Here the autocorrelation at lag 2 is significant (p = 0.0134), leading you to seek a better-fitting alternative.
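The AR(m) p-values can be approximately reproduced by hand. The following sketch assumes that each statistic is compared with a standard normal distribution and that the reported probability is two-sided, which is how Arellano-Bond-type autocorrelation tests are usually evaluated; the example itself does not state the reference distribution.

```sas
data _null_;
   /* AR(m) statistics at lag 1 and lag 2, Output 25.5.1 */
   do stat = -15.44, 2.47;
      /* two-sided p-value under an assumed standard normal reference */
      pValue = 2 * (1 - probnorm(abs(stat)));
      put stat= pValue= pvalue6.4;
   end;
run;
```

Because the printed statistics are rounded to two decimal places, a recomputed p-value can differ from the printed one in the last digit.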

One possible explanation for the poor fit is that, by default, PROC PANEL uses the one-step generalized method of moments (GMM). One-step GMM relies heavily on the assumption that the residuals from the difference equations are not serially correlated. An alternative is two-step GMM, which instead uses a data-driven variance matrix for the differenced residuals.
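The distinction between the two estimators can be sketched in the notation commonly used for the Arellano and Bond estimator. The symbols are assumptions adopted for illustration: Z_i is the instrument matrix for cross section i, H is a fixed weighting pattern, and \Delta\hat{e}_i is the vector of first-step differenced residuals. The exact scaling that PROC PANEL applies internally is not shown in this example.

```latex
\begin{aligned}
A_1 &= \Bigl(\sum_{i=1}^{N} Z_i' H Z_i\Bigr)^{-1},
\qquad
H = \begin{pmatrix}
 2 & -1 &        &    \\
-1 &  2 & \ddots &    \\
   & \ddots & \ddots & -1 \\
   &    & -1     &  2
\end{pmatrix}
&& \text{(one-step weighting)}\\[4pt]
A_2 &= \Bigl(\sum_{i=1}^{N} Z_i'\,\Delta\hat{e}_i\,\Delta\hat{e}_i'\,Z_i\Bigr)^{-1}
&& \text{(two-step weighting)}
\end{aligned}
```

The matrix H is, up to scale, the covariance pattern of first-differenced errors when the level errors are homoscedastic and serially uncorrelated, so the one-step weight builds that assumption in. The two-step weight replaces it with cross products of residuals from the first step.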

The following statements fit the model by two-step GMM:

proc panel data=Cigar;
   id State Year;
   instruments constant depvar diffeq=(LPrice LDisp LMin);
   model LSales = LPrice LDisp LMin / dyndiff gmm2 biascorrected;
run;

The code includes an INSTRUMENTS statement that, for demonstration purposes, reproduces the default instrument set. That set includes the following:

  • a constant (keyword CONSTANT)

  • GMM-style instruments based on the dependent variable, LSales (keyword DEPVAR)

  • standard instruments for the exogenous regressors LPrice, LDisp, and LMin (DIFFEQ= option)
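The instrument count of 410 that the procedure reports can be reconciled with this list. The following sketch assumes the usual Arellano-Bond construction, in which the differenced equation for period t (t = 3, ..., 30) uses the dependent variable from periods 1 through t - 2 as GMM-style instruments; that construction is an assumption here, although it does reproduce the reported total. The variable names are illustrative.

```sas
data _null_;
   nPeriods = 30;                 /* Time Series Length in the output       */

   nGmm = 0;
   do t = 3 to nPeriods;          /* one differenced equation per period t  */
      nGmm = nGmm + (t - 2);      /* LSales at periods 1..t-2 is available  */
   end;

   nConstant = 1;                 /* CONSTANT keyword                       */
   nStandard = 3;                 /* DIFFEQ=(LPrice LDisp LMin)             */
   nTotal    = nGmm + nConstant + nStandard;

   put nGmm= nTotal=;
run;
```

The loop sums 1 + 2 + ... + 28 = 406 GMM-style instruments, and adding the constant and the three standard instruments gives 410.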

The code also includes the BIASCORRECTED option, which produces bias-corrected standard errors according to the method of Windmeijer (2005).

The results are shown in Output 25.5.2. The coefficients do not change much, but the standard errors are now more reliable. The model diagnostic tests indicate a better fit: the Sargan test no longer rejects the moment conditions (p = 0.2920), and the AR(m) test at lag 2 is no longer significant at the 5% level (p = 0.0587). You should nevertheless use caution when interpreting Sargan test results. Sargan tests lack power when the number of instruments is large, and their distributional properties come into question under conditions that favor either robust or bias-corrected standard errors.

Output 25.5.2: Dynamic Panel Estimation by Two-Step GMM

The PANEL Procedure
Dynamic Panel Estimation by First-Differences GMM
 
Dependent Variable: LSales (Log cigarette sales in packs per capita)

Model Description
Estimation Method DynDiff
Number of Cross Sections 46
Time Series Length 30
GMM Stage 2
GMM Bandwidth 30
Number of Instruments 410
Variance Estimation Bias-corrected

Fit Statistics
SSE 3.1348 DFE 1283
MSE 0.0024 Root MSE 0.0494

Sargan Test
DF Statistic Prob > ChiSq
41 45.45 0.2920

Parameter Estimates
Variable        DF  Estimate  Standard Error  t Value  Pr > |t|  Label
Intercept        1  0.770726          0.1538     5.01    <.0001  Intercept
LSales (Lag 1)   1  0.730839          0.0523    13.97    <.0001  Log cigarette sales in packs per capita, Lag 1
LPrice           1  -0.25942          0.0418    -6.21    <.0001  Log price per pack of cigarettes
LDisp            1  0.166895          0.0266     6.27    <.0001  Log per capita disposable income
LMin             1  0.028106          0.0410     0.69    0.4934  Log minimum price in adjoining states per pack of cigarettes

AR(m) Test
Lag Statistic Pr > |Statistic|
1 -4.97 <.0001
2 1.89 0.0587


The previous estimation treats regressors such as LPrice as exogenous. If you believe that price is endogenous, you can create GMM-style instruments for LPrice to replace the default standard instruments.

The following statements fit the model by using GMM-style instruments for LPrice:

proc panel data=Cigar;
   id State Year;
   instruments constant depvar diffeq=(LDisp LMin) diffend=(LPrice);
   model LSales = LPrice LDisp LMin / dyndiff gmm2 biascorrected;
run;

The results are shown in Output 25.5.3. Treating LPrice as endogenous nearly doubles the number of instruments, from 410 to 815. Although this is not the case here, when the number of instruments is so large that estimation becomes infeasible, you can limit the number of instruments by specifying the MAXBAND= option in the INSTRUMENTS statement.
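The following sketch does two things. The DATA step checks the reported count of 815 under the assumption that LPrice receives the same GMM-style lag structure as the dependent variable (periods 1 through t - 2 for the equation at period t). The PROC PANEL step then shows how a MAXBAND= cap might be added to the INSTRUMENTS statement; the value 5 is a hypothetical choice made only for illustration, and no output is shown for it because that model was not fit in this example.

```sas
/* Instrument count when LSales and LPrice both get GMM-style instruments */
data _null_;
   nPeriods   = 30;
   nGmmPerVar = 0;
   do t = 3 to nPeriods;
      nGmmPerVar = nGmmPerVar + (t - 2);   /* lags available at period t */
   end;
   /* two GMM-style variables + constant + standard instruments (LDisp, LMin) */
   nTotal = 2*nGmmPerVar + 1 + 2;
   put nGmmPerVar= nTotal=;
run;

/* Same model with the instrument set capped; maxband=5 is hypothetical */
proc panel data=Cigar;
   id State Year;
   instruments constant depvar diffeq=(LDisp LMin) diffend=(LPrice) maxband=5;
   model LSales = LPrice LDisp LMin / dyndiff gmm2 biascorrected;
run;
```

The DATA step gives 2 x 406 + 3 = 815, which matches Output 25.5.3. When you choose a cap, a smaller value reduces the instrument count further but also discards more distant lags as sources of identification.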

Output 25.5.3: Dynamic Panel Estimation, Custom Instrument Set

The PANEL Procedure
Dynamic Panel Estimation by First-Differences GMM
 
Dependent Variable: LSales (Log cigarette sales in packs per capita)

Model Description
Estimation Method DynDiff
Number of Cross Sections 46
Time Series Length 30
GMM Stage 2
GMM Bandwidth 30
Number of Instruments 815
Variance Estimation Bias-corrected

Fit Statistics
SSE 3.4193 DFE 1283
MSE 0.0027 Root MSE 0.0516

Sargan Test
DF Statistic Prob > ChiSq
41 45.48 0.2909

Parameter Estimates
Variable        DF  Estimate  Standard Error  t Value  Pr > |t|  Label
Intercept        1  0.510851          0.1232     4.15    <.0001  Intercept
LSales (Lag 1)   1    0.8046          0.0410    19.63    <.0001  Log cigarette sales in packs per capita, Lag 1
LPrice           1  -0.21878          0.0397    -5.51    <.0001  Log price per pack of cigarettes
LDisp            1  0.138748          0.0208     6.66    <.0001  Log per capita disposable income
LMin             1  0.024775          0.0407     0.61    0.5427  Log minimum price in adjoining states per pack of cigarettes

AR(m) Test
Lag Statistic Pr > |Statistic|
1 -5.04 <.0001
2 1.95 0.0509


Last updated: June 19, 2025