Consider the effect of age on an individual’s health self-assessment that is recorded on an ordered scale on which 0 indicates the poorest health. You can model the self-assessment outcome by an ordered probit or logit in PROC QLIM by using the option DISCRETE(D=NORMAL) or DISCRETE(D=LOGISTIC) in the MODEL or ENDOGENOUS statement.
One important shortcoming of this traditional way of modeling is the underlying assumption that, for all individuals, the explanatory variables have fixed constant coefficients. This assumption implies that the impact of the explanatory variables on the dependent variable is the same for all individuals. However, the assumption might not be realistic, because individuals are usually heterogeneous and hence the coefficient values are expected to vary across the individual observations. In the health self-assessment example, aging involves cognitive and physical decline, so on average the relationship between age and health is expected to be negative. However, assuming that this negative relationship is the same for every individual ignores the fact that for some individuals aging brings wiser life choices, including a healthier lifestyle and improved emotional well-being, and hence even improved health. Thus, enforcing a negative relationship can cause misleading inferences for this subgroup of individuals with a positive coefficient. Similarly, the effect might be negative for every individual, but its magnitude can vary across observations. In either case, if you are modeling such behavior, then taking into account the unobserved heterogeneity, where parameter values vary across the observations because of unobserved factors, is likely to give you more realistic results.
Random-parameters models accommodate such heterogeneity by allowing the coefficients to vary randomly across individuals based on some prespecified distribution, $\beta_i \sim N(\beta, \Sigma)$. The set of parameters $(\beta, \Sigma)$ defines the unobserved heterogeneity. Therefore, the goal is to estimate those parameters to define the individual heterogeneity.
If you have panel data, you can include random parameters by using the RANDOM statement for all the single-equation models of PROC QLIM—binary probit or logit, ordered probit or logit, Tobit (censored and truncated), stochastic frontier production and cost, and linear regression models—to generalize these models further in order to obtain more realistic results. However, the observations do not have to be collected in a panel data setting for you to apply random-parameters models in PROC QLIM. Random-parameters models can also be applied to cross-sectional data as long as you specify the group or subject variable across which the parameter heterogeneity occurs.
Random-parameters models allow individual heterogeneity in the coefficients in the latent process

$$ y_{it}^{*} = \mathbf{x}_{it}'\beta_i + \epsilon_{it} $$

where $y_{it}^{*}$ is a latent variable, $\mathbf{x}_{it}$ is a vector of covariates, and $\epsilon_{it}$ is the error term. In the applications for a panel data set, the subscript $i$ represents individuals and $t$ represents the time period.
The model assumes that the parameters $\beta_i$ are randomly distributed with mean $E[\beta_i] = \beta$ and variance $\mathrm{Var}[\beta_i] = \Sigma$, where $\Sigma$ is a positive definite matrix. If the random parameters are not correlated with one another, then $\Sigma$ becomes a diagonal matrix. Let $\Gamma$ be the Cholesky factorization of the covariance matrix of the random parameters, $\Sigma$. In other words, $\Gamma$ is the lower triangular matrix that produces $\Sigma = \Gamma\Gamma'$. By construction,

$$ \beta_i = \beta + \Gamma \mathbf{v}_i $$

where $\mathbf{v}_i$ is a random vector with zero means and unit standard deviations. In the no-correlation case, $\Gamma$ is also a diagonal matrix with the standard deviations of $\beta_i$ on the diagonal.

PROC QLIM assumes that the $\mathbf{v}_i$ are normally distributed; hence $\beta_i$ is normally distributed with mean vector $\beta$ and covariance matrix $\Sigma$.
Some of the explanatory variables in the latent model might have fixed (nonrandom) coefficients. In this case the latent process can be written conveniently as

$$ y_{it}^{*} = \mathbf{w}_{it}'\alpha + \mathbf{x}_{it}'(\beta + \Gamma \mathbf{v}_i) + \epsilon_{it} $$

where $\alpha$ is the vector of nonrandom (fixed) coefficients and $\beta$ is the vector of the means of the random coefficients.
The general form of the conditional density for the observed response can be written as

$$ f(y_{it} \mid \mathbf{x}_{it}, \mathbf{v}_i) = g(y_{it}, \mathbf{x}_{it}'\beta_i, \theta) $$

where $\theta$ is the parameter vector that includes the elements of $\beta$ and $\Gamma$; the standard deviation of $\epsilon_{it}$, $\sigma$; and other parameters specified by the model.
The joint density for the $i$th group conditional on $\mathbf{x}_i$ and $\mathbf{v}_i$ is

$$ f(\mathbf{y}_i \mid \mathbf{x}_i, \mathbf{v}_i) = \prod_{t=1}^{T_i} g(y_{it}, \mathbf{x}_{it}'\beta_i, \theta) $$

Because $\mathbf{v}_i$ is unobserved, it is necessary to obtain the unconditional likelihood by taking the expectation of this likelihood over the distribution of $\mathbf{v}_i$. Thus

$$ L_i = \int \left[ \prod_{t=1}^{T_i} g(y_{it}, \mathbf{x}_{it}'\beta_i, \theta) \right] h(\mathbf{v}_i)\, d\mathbf{v}_i $$

where $h(\mathbf{v}_i)$ is the probability density function of $\mathbf{v}_i$. Under the normality assumption, $h(\mathbf{v}_i) = \prod_{k=1}^{K} \phi(v_{ik})$, where $\phi$ is the probability density function of the standard normal distribution. The true log-likelihood function is obtained by summing $\ln L_i$, the log of the contribution of the $i$th individual to the total, over the individuals:

$$ \ln L = \sum_{i=1}^{N} \ln \left[ \int \left( \prod_{t=1}^{T_i} g(y_{it}, \mathbf{x}_{it}'\beta_i, \theta) \right) h(\mathbf{v}_i)\, d\mathbf{v}_i \right] $$
The integral in the square brackets does not have a closed form, so it is difficult to perform maximum likelihood estimation. However, this integration can be approximated and likelihood estimation is still possible. The subsection Estimation discusses various methods of approximation for this integral.
The nature of the dependent variable specifies the log-likelihood function. For example, if the dependent variable is binary and its probability is defined by a normal distribution (a probit model), then

$$ g(y_{it}, \mathbf{x}_{it}'\beta_i, \theta) = \Phi\left[(2y_{it} - 1)\,\mathbf{x}_{it}'\beta_i\right] $$

where $\Phi$ is the cumulative distribution function of the standard normal distribution. If the dependent variable is modeled by a logit, then

$$ g(y_{it}, \mathbf{x}_{it}'\beta_i, \theta) = \Lambda\left[(2y_{it} - 1)\,\mathbf{x}_{it}'\beta_i\right] $$

where $\Lambda$ is the cumulative distribution function of the standard logistic distribution.
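The binary-outcome contribution $g$ can be sketched directly in Python (an illustration of the formulas above, not PROC QLIM internals; the function names are made up). The $(2y - 1)$ device flips the sign of the index so that one expression covers both $y = 1$ and $y = 0$:

```python
from scipy.stats import norm, logistic

def g_probit(y, xb):
    """Contribution Phi[(2y - 1) x'beta] for a binary probit."""
    return norm.cdf((2 * y - 1) * xb)

def g_logit(y, xb):
    """Contribution Lambda[(2y - 1) x'beta] for a binary logit."""
    return logistic.cdf((2 * y - 1) * xb)

# For y = 1 the contribution is P(y = 1 | x); for y = 0 it is 1 - P(y = 1 | x),
# so the two contributions for the same index sum to one.
print(g_probit(1, 0.3) + g_probit(0, 0.3))
```

By the symmetry of the normal and logistic distributions, $\Phi(-z) = 1 - \Phi(z)$, which is what makes the compact $(2y-1)$ form valid.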
The likelihood function is maximized by solving the likelihood equations

$$ \frac{\partial \ln L}{\partial \theta} = \mathbf{0} $$
These derivatives involve integration. The integration is approximated by the same method that is used to calculate the likelihood.
When you use one of the simulation methods that are described in the subsections Monte Carlo Integration and QMC Method Using the Halton Sequence, the log likelihood to be optimized becomes

$$ \ln L_S = \sum_{i=1}^{N} \ln \left[ \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} g(y_{it}, \mathbf{x}_{it}'\beta_{ir}, \theta) \right] $$

The general formulation of the gradients is

$$ \frac{\partial \ln L_S}{\partial \theta} = \sum_{i=1}^{N} \frac{ \frac{1}{R} \sum_{r=1}^{R} \frac{\partial}{\partial \theta} \prod_{t=1}^{T_i} g(y_{it}, \mathbf{x}_{it}'\beta_{ir}, \theta) }{ \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} g(y_{it}, \mathbf{x}_{it}'\beta_{ir}, \theta) } $$

The formulation of the derivatives with respect to each type of parameter differs from model to model.
Note that $\theta$ includes the elements of $\Gamma$ rather than $\Sigma$. That is, the optimization is performed with respect to the elements of $\Gamma$. Therefore, when you use the ITPRINT option, the resulting output is based on the parameters that construct the lower triangular matrix from the Cholesky factorization of the covariance matrix of the random parameters. These parameters are labeled starting with _CHOL. For example, if two of the explanatory variables, $x_1$ and $x_2$, in your model have random coefficients, then the parameters that construct the diagonal of $\Gamma$ are _CHOL.x1.x1 and _CHOL.x2.x2, and the lower part of $\Gamma$ is _CHOL.x1.x2. If you use the NOCORR option, then the optimization is based on only the diagonal elements of $\Gamma$, and in this case _CHOL.x1.x1 and _CHOL.x2.x2 are the standard deviations of the coefficients of $x_1$ and $x_2$, respectively. Although the optimization is performed with respect to $\theta$, which includes the elements of $\Gamma$ rather than $\Sigma$, the results are transformed to obtain the elements of $\Sigma$ and their corresponding standard errors.
Random-effects models are a special case in which only the constant term is random. For these models, the parameter heterogeneity across individuals can be formulated as

$$ \beta_{0i} = \beta_0 + u_i $$

where $u_i$ has mean 0 and variance $\sigma_u^2$.
In most applications of random-effects models, this type of parameter heterogeneity is modeled as a group-specific unobservable heterogeneity in the error term as

$$ y_{it}^{*} = \mathbf{x}_{it}'\beta + \epsilon_{it} + u_i $$

where

$$ E[u_i] = 0, \qquad \mathrm{Var}[u_i] = \sigma_u^2 $$

The density of an observed random variable, $y_{it}$, is $f(y_{it} \mid \mathbf{x}_{it}, u_i)$. The density of the group-specific heterogeneity is $h(u_i)$.
For example, in the case of a random-effects Tobit model, $y_{it}$ is specified as

$$ y_{it} = \begin{cases} y_{it}^{*} & \text{if } y_{it}^{*} > 0 \\ 0 & \text{if } y_{it}^{*} \le 0 \end{cases} $$

where

$$ y_{it}^{*} = \mathbf{x}_{it}'\beta + \epsilon_{it} + u_i $$

where $\mathbf{y}_i$ contains $y_{it}$ for all $t$ and the parameter vector consists of $\beta$ and $\sigma$. Therefore, for this model,

$$ f(y_{it} \mid \mathbf{x}_{it}, u_i) = \left[ 1 - \Phi\!\left( \frac{\mathbf{x}_{it}'\beta + u_i}{\sigma} \right) \right]^{\mathbb{1}(y_{it} = 0)} \left[ \frac{1}{\sigma}\,\phi\!\left( \frac{y_{it} - \mathbf{x}_{it}'\beta - u_i}{\sigma} \right) \right]^{\mathbb{1}(y_{it} > 0)} $$

and

$$ h(u_i) = \frac{1}{\sigma_u}\,\phi\!\left( \frac{u_i}{\sigma_u} \right) $$

where $\Phi$ is the cumulative distribution function of the standard normal distribution, $\phi$ is the probability density function of the standard normal distribution, and $\mathbb{1}(\cdot)$ is the indicator function.
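As a numerical sanity check of the indicator-function form of the Tobit density (an illustrative sketch with made-up parameter values, not PROC QLIM code), the point mass at the censoring point plus the continuous density over $y > 0$ should integrate to one:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def tobit_density(y, xb, u, sigma):
    """Conditional density f(y | x, u) for a Tobit model left-censored at zero."""
    if y == 0:
        # probability mass at the censoring point
        return 1.0 - norm.cdf((xb + u) / sigma)
    # continuous density on y > 0
    return norm.pdf((y - xb - u) / sigma) / sigma

# Mass at zero plus the integral over the uncensored region equals one.
mass0 = tobit_density(0.0, 0.5, 0.2, 1.0)
cont, _ = quad(tobit_density, 0.0, np.inf, args=(0.5, 0.2, 1.0))
print(mass0 + cont)
```

This mixed discrete-continuous structure is why the Tobit contribution is written with the two indicator exponents: exactly one factor is active for each observation.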
For random-effects models, the unobserved component, $u_i$, must be integrated out in order to form the likelihood function for the observed data. For individual $i$,

$$ L_i = \int \left[ \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, u_i) \right] h(u_i)\, du_i $$

Therefore, the log-likelihood function for the observed data becomes

$$ \ln L = \sum_{i=1}^{N} \ln L_i $$
The notation for the likelihood function of a random-effects model is not much different from that of the random-parameters model discussed in the section General Models with Random Parameters. However, there is a substantial difference in the formulation of the likelihood function of the random-parameters model: the integration in $L_i$ is a multidimensional integral. More specifically, if the number of random parameters is $K$, then it is a $K$-dimensional integral.
The integral in the log-likelihood function for random-parameters models does not have a closed form; that is, it is difficult to integrate out the random parameters. However, the integral can be approximated, and the usual likelihood estimation can be pursued based on the approximated log-likelihood function. PROC QLIM offers three methods of approximation: Monte Carlo (MC) integration, the quasi–Monte Carlo (QMC) method using the Halton sequences, and approximation by Hermite quadrature. The first two methods are simulation methods, and hence the likelihood method based on the resulting simulated log-likelihood function is called the simulated maximum likelihood. The third method fails to provide a good approximation when the dimensionality of the random parameters, K, is high. The Hermite quadrature method can be used only for random-effects models or random-parameters models that have a single random coefficient (that is, $K = 1$).
Consider the random-effects model defined in the section Random-Effects Models. First, note that

$$ L_i = \int \left[ \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, u_i) \right] h(u_i)\, du_i = E_u\!\left[ \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, u_i) \right] $$

The function $\prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, u_i)$ is smooth, continuous, and continuously differentiable. By the law of large numbers, if $(u_{i1}, \dots, u_{iR})$ is a sample of iid draws from $h(u_i)$, then

$$ \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, u_{ir}) \;\xrightarrow{\;p\;}\; E_u\!\left[ \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, u_i) \right] $$

This operation is implemented by simulation that uses a random number generator. PROC QLIM inserts the simulated integral in the log likelihood to obtain the simulated log likelihood

$$ \ln L_S = \sum_{i=1}^{N} \ln \left[ \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, u_{ir}) \right] $$

and maximizes the simulated log likelihood with respect to the parameter set that includes $\beta$ and $\sigma_u$.
Under certain assumptions (Greene 2001), the simulated likelihood estimator and the maximum likelihood estimator are equivalent. For this equivalence result to hold, the number of draws, $R$, must increase faster than the square root of the number of observations, $\sqrt{N}$. For this reason, if the NDRAW= option is not specified, then by default the number of draws is tied to the sample size so that $R$ grows with $N$.
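The Monte Carlo approximation of a single likelihood contribution can be illustrated in Python. The sketch below (illustrative only; the parameter values are made up, and this is a one-period random-effects probit rather than the Tobit above) exploits the fact that for the probit the expectation $E[\Phi(\mathbf{x}'\beta + u)]$ with $u \sim N(0, \sigma_u^2)$ has the closed form $\Phi(\mathbf{x}'\beta / \sqrt{1 + \sigma_u^2})$, which makes the simulation easy to check:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

xb, sigma_u, R = 0.4, 0.8, 200_000

# Simulated likelihood contribution for one observation with y = 1:
# (1/R) sum_r Phi(x'beta + u_r), where u_r are iid N(0, sigma_u^2) draws
u = sigma_u * rng.standard_normal(R)
L_sim = norm.cdf(xb + u).mean()

# Closed-form value of E[Phi(x'beta + u)] for the probit case
L_exact = norm.cdf(xb / np.sqrt(1.0 + sigma_u**2))
print(L_sim, L_exact)
```

The simulation error shrinks at the usual $1/\sqrt{R}$ rate, which is why the number of draws must grow with the sample size for the simulated estimator to inherit the properties of maximum likelihood.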
Generalization of the log-likelihood function for random-parameters models is

$$ \ln L_S = \sum_{i=1}^{N} \ln \left[ \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} g(y_{it}, \mathbf{x}_{it}'\beta_{ir}, \theta) \right] $$

where

$$ \beta_{ir} = \beta + \Gamma \mathbf{v}_{ir} $$

In this more general case, $\mathbf{v}_{ir}$ is the $r$th $K$-variate vector of random draws for individual $i$. The random draws come from the distribution with the probability density function $h(\cdot)$. PROC QLIM specifies $h(\cdot)$ as the probability density function of the standard normal distribution.
The use of independent random draws in simulation is conceptually straightforward, and the statistical properties of the simulated maximum likelihood estimator are easy to derive. However, simulation is a very computationally intensive technique. Moreover, the simulation method itself contributes to the variation of the simulated maximum likelihood estimator (see, for example, Geweke 1995). There are other ways to take draws that can provide greater accuracy by covering the domain of the integral more uniformly and by lowering the simulation variance (Train 2009, section 9.3). For example, quasi–Monte Carlo methods are based on an integration technique that replaces the pseudorandom draws of MC integration with a sequence of judiciously selected nonrandom points that provide more uniform coverage of the domain of the integral. Therefore, the advantage of QMC integration over MC integration is that for some types of sequences, the accuracy is far greater, convergence is much faster, and the simulation variance is smaller. QMC methods are surveyed in Bhat (2001), Sloan and Woźniakowski (1998), and Morokoff and Caflisch (1995). In addition to MC simulation, PROC QLIM offers the QMC integration method that uses Halton sequences.
Halton sequences (Halton 1960) provide uniform coverage for each observation’s integral, and they decrease the simulation variance by inducing a negative correlation over the draws for each observation. A Halton sequence is constructed deterministically in terms of a prime number as its base. For example, the following sequence is the Halton sequence for 2:

$$ \tfrac{1}{2},\ \tfrac{1}{4},\ \tfrac{3}{4},\ \tfrac{1}{8},\ \tfrac{5}{8},\ \tfrac{3}{8},\ \tfrac{7}{8},\ \tfrac{1}{16},\ \tfrac{9}{16},\ \dots $$
For more information about how to generate a Halton sequence, see Train (2009), section 9.3.3.
If you use the QMC method, first, K Halton sequences are created—that is, one Halton sequence for each random parameter, with each sequence corresponding to a different prime number between 2 and the Kth prime number. Then for each sequence, part of the sequence (or the whole sequence, depending on whether you decide to discard the initial elements of the sequences[13]) is used in groups. For a given sequence, each group of consecutive elements constitutes the "draws" for each cross-sectional observation. This way, each subsequence fills in the gaps left by the previous subsequences, and the draws for one observation tend to be negatively correlated with those for the previous observation.
When the number of draws that are used for each observation rises, the coverage for each observation improves. This improvement in turn improves the accuracy; however, the negative covariance across observations diminishes. Because Halton draws are far more effective than random draws in Monte Carlo simulation, a small number of Halton draws provide relatively good integration (Spanier and Maize 1991).
The Halton draws are for a uniform density. PROC QLIM obtains $\mathbf{v}_{ir}$ by evaluating the inverse cumulative standard normal distribution function at each element of the $r$th $K$-variate draw for the $i$th group.
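A Halton sequence can be generated with the standard radical-inverse construction, and the resulting uniform points can then be pushed through the inverse normal CDF, as described above. The following Python sketch is illustrative (the function name is made up, and this is not the PROC QLIM implementation):

```python
from fractions import Fraction
from scipy.stats import norm

def halton(base, n):
    """First n elements of the Halton sequence for a given base
    (radical-inverse / van der Corput construction)."""
    seq = []
    for i in range(1, n + 1):
        f, x, k = Fraction(1), Fraction(0), i
        while k > 0:
            f /= base           # next negative power of the base
            x += f * (k % base) # add the next base-`base` digit of i
            k //= base
        seq.append(x)
    return seq

h2 = halton(2, 7)
print(h2)  # 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8

# Halton points lie in (0, 1); the inverse standard normal CDF maps them
# to the normal draws used in the simulation.
draws = norm.ppf([float(x) for x in h2])
```

Note how each successive point falls in the largest remaining gap of the unit interval, which is the uniform-coverage property that makes Halton draws more efficient than pseudorandom draws.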
Consider the random-effects model that is defined in the section Random-Effects Models. This method is the Butler and Moffitt (1982) approach, which is based on models in which $u_i$ has a normal distribution. If $u_i$ is normally distributed with zero mean, then

$$ L_i = \int_{-\infty}^{\infty} \frac{1}{\sigma_u \sqrt{2\pi}} \exp\!\left( -\frac{u_i^2}{2\sigma_u^2} \right) \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, u_i)\, du_i $$

Let $r_i = u_i / (\sigma_u \sqrt{2})$. Then $u_i = \sigma_u \sqrt{2}\, r_i$ and $du_i = \sigma_u \sqrt{2}\, dr_i$. Making the change of variable and letting the error effects be additive produce

$$ L_i = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-r_i^2} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \sigma_u \sqrt{2}\, r_i)\, dr_i $$

This likelihood function is in a form that can be approximated accurately by using Gauss-Hermite quadrature, which eliminates the integration. Thus, the log-likelihood function can be approximated with

$$ \ln L \approx \sum_{i=1}^{N} \ln \left[ \frac{1}{\sqrt{\pi}} \sum_{h=1}^{H} w_h \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \sigma_u \sqrt{2}\, a_h) \right] $$

where $w_h$ and $a_h$ are the weights and nodes for the Hermite quadrature of degree H. PROC QLIM maximizes this approximated log-likelihood function when the Hermite quadrature option is specified (METHOD=HERMITE in the RANDOM statement).
[13] When sequences are created in multiple dimensions, the initial part of the series is usually eliminated because the initial terms of multiple Halton sequences are highly correlated. However, there is no such correlation for a single dimension.