QLIM Procedure

Endogeneity and Instrumental Variables

The models that PROC QLIM estimates, such as qualitative response and limited dependent variable models, assume that the errors are independent of the explanatory variables. If this assumption fails, the distribution on which the likelihood is based is misspecified and the resulting coefficient estimates are inconsistent.

To begin, consider a linear model

$$ y_i = y_i^* = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + u_i $$

Assume that $E(u_i) = 0$, $\mathrm{Cov}(x_{ji}, u_i) = 0$ for $j = 1, \ldots, k-1$, and $\mathrm{Cov}(x_{ki}, u_i) \neq 0$. Therefore, $x_k$ is endogenous. Endogeneity can come from many sources, such as measurement error or an omitted variable that is correlated with $x_k$. If you ignore the endogeneity, you can estimate this model in PROC QLIM as follows (assuming $k = 4$):

proc qlim data=a;
   model y = x1 x2 x3 x4;
run;

However, this approach produces inconsistent maximum likelihood estimates. To obtain consistent maximum likelihood estimates, you should consider the joint density of the dependent variable and the endogenous variables. To do this in PROC QLIM, you need at least one instrument, that is, an observable variable $z_1$ that is not in the structural equation and that satisfies two conditions: $z_1$ is exogenous (that is, $\mathrm{Cov}(z_{1i}, u_i) = 0$), and $z_1$ must be correlated with the endogenous regressor $x_k$. Then, you can model $x_k$ as

$$ x_{ki} = \pi_0 + \pi_1 x_{1i} + \cdots + \pi_{k-1} x_{(k-1)i} + \theta z_{1i} + \epsilon_i $$

You can now write this reduced form equation along with the structural equation to obtain the consistent maximum likelihood estimates as follows:

proc qlim data=a;
   model y = x1 x2 x3 x4;
   model x4 = x1 x2 x3 z1;
run;

Estimating the structural model together with the reduced form models for the endogenous explanatory variables gives you the full information maximum likelihood (FIML) estimates. Because of the linearity of the structural model, you can estimate it efficiently and more simply by using the two-stage least squares estimator. However, PROC QLIM handles nonlinear models such as qualitative response and limited dependent variable models, and in their estimation it maximizes the corresponding joint likelihood function (for more information and an application, see Wooldridge 2010, Section 15.7.3). In the case of endogeneity, when the reduced form models for the endogenous explanatory variables are written along with the structural model, PROC QLIM maximizes the likelihood function that is obtained from the joint density of the response variable and the endogenous explanatory variables. For example, consider the following censored regression model in which one of the explanatory variables is a continuous endogenous variable:

$$
\begin{aligned}
y_{1i}^* &= \alpha y_{2i} + \mathbf{z}_{1i}' \boldsymbol{\beta} + u_i \\
y_{2i} &= \mathbf{z}_i' \boldsymbol{\pi} + \epsilon_i \\
y_{1i} &= \begin{cases} y_{1i}^* & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases}
\end{aligned}
$$

The exogenous explanatory variables are $\mathbf{z}_{1i}$, and the continuous endogenous explanatory variable is $y_{2i}$.

The likelihood function to maximize is

$$ L = \prod_{i \in \{y_{1i} > 0\}} f(y_{1i}, y_{2i}) \cdot \prod_{i \in \{y_{1i} = 0\}} \int_{-\infty}^{0} f(y_{1i}^*, y_{2i}) \, dy_{1i}^* $$

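where $f(y_{1i}^*, y_{2i})$ is the joint density of $y_{1i}^*$ and $y_{2i}$. Note that $y_{1i}$ is substituted for $y_{1i}^*$ when $y_{1i} > 0$. If you assume $(u_i, \epsilon_i)' \sim N(\mathbf{0}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_u^2 & \sigma_{u\epsilon} \\ \sigma_{u\epsilon} & \sigma_\epsilon^2 \end{pmatrix}$, then, by using $f(y_{1i}^*, y_{2i}) = f(y_{1i}^* \mid y_{2i}) \, f(y_{2i})$, you can write the likelihood function for each $i$ as a product of two parts. The first part is the probability density function of the normal distribution with mean $\mathbf{z}_i' \boldsymbol{\pi}$ and variance $\sigma_\epsilon^2$, and the second part follows a Tobit model that has latent mean $\alpha y_{2i} + \mathbf{z}_{1i}' \boldsymbol{\beta} + (\sigma_{u\epsilon} / \sigma_\epsilon^2)(y_{2i} - \mathbf{z}_i' \boldsymbol{\pi})$ and variance $\sigma_u^2 - \sigma_{u\epsilon}^2 / \sigma_\epsilon^2$. Then, you can obtain the log-likelihood function by taking the log of this product and summing over $i$ (for more information, see Wooldridge 2002, Section 16.6.2). This is the log-likelihood function that PROC QLIM maximizes. The parameters that are obtained from this maximization are the FIML estimators. Assuming that the latent model includes two instrumental variables and two exogenous explanatory variables, you can estimate this model in PROC QLIM as follows: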

proc qlim data=a;
   model y1 = y2 z11 z12 / censored(lb=0);
   model y2 = z11 z12 z21 z22;
run;
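The two-part factorization of the censored regression likelihood can be written out explicitly. The following is a sketch under stated assumptions, not a statement of the exact parameterization that PROC QLIM uses internally: the errors $(u_i, \epsilon_i)$ are taken to be bivariate normal with mean zero, and the symbols $\sigma_u^2$, $\sigma_\epsilon^2$, $\sigma_{u\epsilon}$ (variances and covariance) as well as the shorthand $m_i$ and $s^2$ are notation introduced here for illustration.

```latex
% Assumed notation: Var(u_i) = \sigma_u^2, Var(\epsilon_i) = \sigma_\epsilon^2,
% Cov(u_i, \epsilon_i) = \sigma_{u\epsilon}; m_i and s^2 are illustrative shorthand.
\[
\begin{aligned}
m_i &= \alpha y_{2i} + \mathbf{z}_{1i}'\boldsymbol{\beta}
       + \frac{\sigma_{u\epsilon}}{\sigma_\epsilon^2}\,
         \bigl(y_{2i} - \mathbf{z}_i'\boldsymbol{\pi}\bigr),
\qquad
s^2 = \sigma_u^2 - \frac{\sigma_{u\epsilon}^2}{\sigma_\epsilon^2} \\[6pt]
\log L &= \sum_i \Biggl\{
   \log\!\left[\frac{1}{\sigma_\epsilon}\,
      \phi\!\left(\frac{y_{2i} - \mathbf{z}_i'\boldsymbol{\pi}}{\sigma_\epsilon}\right)\right]
   + \mathbf{1}[y_{1i} > 0]\,
     \log\!\left[\frac{1}{s}\,\phi\!\left(\frac{y_{1i} - m_i}{s}\right)\right]
   + \mathbf{1}[y_{1i} = 0]\,
     \log \Phi\!\left(-\frac{m_i}{s}\right)
\Biggr\}
\end{aligned}
\]
```

Here $\phi$ and $\Phi$ denote the standard normal density and distribution function. The first term is the marginal normal density of $y_{2i}$; the remaining two terms are the usual Tobit contributions, evaluated with the conditional mean $m_i$ and conditional variance $s^2$ of $y_{1i}^*$ given $y_{2i}$. When $\sigma_{u\epsilon} = 0$, the correction term in $m_i$ vanishes and the second part reduces to an ordinary Tobit likelihood.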

For simple examples like the preceding ones, you can derive the likelihood function easily. However, as the number of endogenous explanatory variables increases, if these variables have a discontinuous nature, if simultaneity among equations exists, or if a combination of these occurs, then the derivation of the likelihood function becomes cumbersome, or, in some cases, the likelihood function does not even have a closed analytical form.

PROC QLIM can handle endogeneity regardless of the nature of the endogenous explanatory variables for a single structural model. In the case of one endogenous explanatory variable, PROC QLIM reports the FIML estimates that are calculated by using the analytical likelihood function that is obtained from the joint distribution of the dependent variable and the endogenous variable. When there is more than one endogenous explanatory variable, the analytical form of the likelihood function is usually not available; in this case PROC QLIM reports the simulated maximum likelihood estimates. For the simulated maximum likelihood estimation method, PROC QLIM uses the Geweke-Hajivassiliou-Keane (GHK) simulator (see, among others, Hajivassiliou, McFadden, and Ruud 1996) to simulate the joint distribution of the dependent variable and the endogenous variables. The simulation is facilitated by assuming that the error terms in the latent models for the dependent variable and the endogenous explanatory variables are distributed as multivariate normal.
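For orientation, the textbook form of the GHK simulator can be sketched as follows. This is a generic illustration under multivariate normality, not a description of PROC QLIM's internal implementation; the dimension $m$, the bounds $a_j$ and $b_j$, the number of draws $R$, and the symbols $\mathbf{L}$, $\eta$, and $\omega$ are notation assumed here for the example.

```latex
% Generic GHK sketch (assumed notation): approximate
%   P = Pr(a_j < v_j < b_j, j = 1, ..., m),  v ~ N(0, \Sigma).
\[
\begin{aligned}
&\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}', \quad
 \mathbf{L} = (l_{jk}) \text{ lower triangular}, \quad
 \mathbf{v} = \mathbf{L}\boldsymbol{\eta}, \quad
 \boldsymbol{\eta} \sim N(\mathbf{0}, \mathbf{I}_m) \\[4pt]
&\text{For each draw } r = 1, \ldots, R \text{ and each } j = 1, \ldots, m
 \text{ in order:} \\
&\qquad A_j^r = \frac{a_j - \sum_{k<j} l_{jk}\,\eta_k^r}{l_{jj}}, \qquad
        B_j^r = \frac{b_j - \sum_{k<j} l_{jk}\,\eta_k^r}{l_{jj}} \\
&\qquad \eta_j^r = \Phi^{-1}\!\Bigl(\Phi(A_j^r)
        + \omega_j^r \bigl[\Phi(B_j^r) - \Phi(A_j^r)\bigr]\Bigr),
        \qquad \omega_j^r \sim U(0,1) \\[4pt]
&\hat{P} = \frac{1}{R} \sum_{r=1}^{R} \prod_{j=1}^{m}
        \bigl[\Phi(B_j^r) - \Phi(A_j^r)\bigr]
\end{aligned}
\]
```

Each $\eta_j^r$ is a standard normal draw truncated to the interval implied by the earlier draws, so the product of interval probabilities is an unbiased estimate of $P$ that varies smoothly with the parameters. That smoothness is what makes the simulated likelihood suitable for gradient-based maximization.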

When you estimate a model in PROC QLIM, you can take the endogeneity into account by writing the structural model along with the reduced form models for each endogenous variable. Examples are provided in the following sections.

Probit Model with a Continuous Endogenous Explanatory Variable

Consider a probit model that contains a single endogenous explanatory variable in addition to two instruments and two exogenous explanatory variables. The model is