MODEL Procedure

Estimation Methods

Consider the general nonlinear model:

$$
\begin{aligned}
\boldsymbol{\epsilon}_t &= \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \\
\mathbf{z}_t &= Z(\mathbf{x}_t)
\end{aligned}
$$

where $\mathbf{q} \in R^g$ is a real vector valued function of $\mathbf{y}_t \in R^g$, $\mathbf{x}_t \in R^l$, and $\boldsymbol{\theta} \in R^p$; g is the number of equations, l is the number of exogenous variables (lagged endogenous variables are considered exogenous here), p is the number of parameters, and t ranges from 1 to n. $\mathbf{z}_t \in R^k$ is a vector of instruments. $\boldsymbol{\epsilon}_t$ is an unobservable disturbance vector with the following properties:

$$
\begin{aligned}
E(\boldsymbol{\epsilon}_t) &= 0 \\
E(\boldsymbol{\epsilon}_t \boldsymbol{\epsilon}_t') &= \boldsymbol{\Sigma}
\end{aligned}
$$

All of the methods implemented in PROC MODEL aim to minimize an objective function. Table 2 summarizes the objective functions that define the estimators and the corresponding estimator of the covariance of the parameter estimates for each method.

Table 2: Summary of PROC MODEL Estimation Methods

Method	Instruments	Objective Function	Covariance of $\hat{\boldsymbol{\theta}}$
OLS	No
ITOLS	No
SUR	No
ITSUR	No
N2SLS	Yes
IT2SLS	Yes
N3SLS	Yes
IT3SLS	Yes
GMM	Yes
ITGMM	Yes
FIML	No

The Instruments column identifies the estimation methods that require instruments. The variables used in this table and the remainder of this chapter are defined as follows:

n is the number of nonmissing observations.

g is the number of equations.

k is the number of instrumental variables.

$\mathbf{r}$ is the $ng \times 1$ vector of residuals for the g equations stacked together.

$\mathbf{r}_i$ is the $n \times 1$ column vector of residuals for the ith equation.

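$\mathbf{S}$ is a $g \times g$ matrix that estimates $\boldsymbol{\Sigma}$, the covariances of the errors across equations (referred to as the S matrix).

$\mathbf{X}$ is an $ng \times p$ matrix of partial derivatives of the residual with respect to the parameters.

$\mathbf{W}$ is an $n \times n$ matrix, $\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'$.

$\mathbf{Z}$ is an $n \times k$ matrix of instruments.

$\mathbf{Y}$ is a $gk \times ng$ matrix of instruments. $\mathbf{Y} = \mathbf{I}_g \otimes \mathbf{Z}'$.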

$\hat{\mathbf{Z}} = (\hat{\mathbf{Z}}_1, \hat{\mathbf{Z}}_2, \ldots, \hat{\mathbf{Z}}_p)$ is an $ng \times p$ matrix. $\hat{\mathbf{Z}}_i$ is an $ng \times 1$ column vector obtained from stacking the columns of

$$
\mathbf{U}\,\frac{1}{n}\sum_{t=1}^{n}\left(\frac{\partial \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})'}{\partial y_t}\right)^{-1}\frac{\partial^2 \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})'}{\partial y_t\,\partial \theta_i} - \mathbf{Q}_i
$$

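$\mathbf{U}$ is an $n \times g$ matrix of residual errors. $\mathbf{U} = (\boldsymbol{\epsilon}_1, \boldsymbol{\epsilon}_2, \ldots, \boldsymbol{\epsilon}_n)'$.

$\mathbf{Q}$ is the $n \times g$ matrix $(\mathbf{q}(\mathbf{y}_1, \mathbf{x}_1, \boldsymbol{\theta}), \mathbf{q}(\mathbf{y}_2, \mathbf{x}_2, \boldsymbol{\theta}), \ldots, \mathbf{q}(\mathbf{y}_n, \mathbf{x}_n, \boldsymbol{\theta}))'$.

$\mathbf{Q}_i$ is an $n \times g$ matrix $\partial \mathbf{Q} / \partial \theta_i$.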

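$\mathbf{I}$ is an $n \times n$ identity matrix.

$\mathbf{J}_t$ is $\partial \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) / \partial \mathbf{y}_t'$, which is a Jacobian matrix.

$\mathbf{m}_n$ is the first moment of the crossproduct $\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \otimes \mathbf{z}_t$, that is, $\mathbf{m}_n = \frac{1}{n}\sum_{t=1}^{n} \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \otimes \mathbf{z}_t$.

$\mathbf{z}_t$ is a $k \times 1$ column vector of instruments for observation $t$. $\mathbf{z}_t'$ is also the $t$th row of $\mathbf{Z}$.

$\hat{\mathbf{V}}$ is the $gk \times gk$ matrix that represents the variance of the moment functions.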

k is the number of instrumental variables used.

constant is the constant $\frac{ng}{2}\ln(2\pi)$.

$\otimes$ is the notation for a Kronecker product.

All vectors are column vectors unless otherwise noted. Other estimates of the covariance matrix for FIML are also available.

Dependent Regressors and Two-Stage Least Squares

Ordinary regression analysis is based on several assumptions. A key assumption is that the independent variables are in fact statistically independent of the unobserved error component of the model. If this assumption is not true (if the regressor varies systematically with the error), then ordinary regression produces inconsistent results. The parameter estimates are biased.

Regressors might fail to be independent variables because they are dependent variables in a larger simultaneous system. For this reason, the problem of dependent regressors is often called simultaneous equation bias. For example, consider the following two-equation system:

$$y_1 = a_1 + b_1 y_2 + c_1 x_1 + \epsilon_1$$

$$y_2 = a_2 + b_2 y_1 + c_2 x_2 + \epsilon_2$$

In the first equation, $y_2$ is a dependent, or endogenous, variable. As shown by the second equation, $y_2$ is a function of $y_1$, which by the first equation is a function of $\epsilon_1$, and therefore $y_2$ depends on $\epsilon_1$. Likewise, $y_1$ depends on $\epsilon_2$ and is a dependent regressor in the second equation. This is an example of a simultaneous equation system; $y_1$ and $y_2$ are functions of all the variables in the system.

Using the ordinary least squares (OLS) estimation method to estimate these equations produces biased estimates. One solution to this problem is to replace $y_1$ and $y_2$ on the right-hand side of the equations with predicted values, thus changing the regression problem to the following:

$$y_1 = a_1 + b_1 \hat{y}_2 + c_1 x_1 + \epsilon_1$$

$$y_2 = a_2 + b_2 \hat{y}_1 + c_2 x_2 + \epsilon_2$$

This method requires estimating the predicted values $\hat{y}_1$ and $\hat{y}_2$ through a preliminary, or "first stage," instrumental regression. An instrumental regression is a regression of the dependent regressors on a set of instrumental variables, which can be any independent variables useful for predicting the dependent regressors. In this example, the equations are linear and the exogenous variables for the whole system are known. Thus, the best instruments (of the variables in the model) are the variables $x_1$ and $x_2$.

This method is known as two-stage least squares or 2SLS, or more generally as the instrumental variables method. The 2SLS method for linear models is discussed in Pindyck and Rubinfeld (1981, pp. 191–192). For nonlinear models this situation is more complex, but the idea is the same. In nonlinear 2SLS, the derivatives of the model with respect to the parameters are replaced with predicted values. For further discussion of the use of instrumental variables in nonlinear regression, see the section Choice of Instruments.

To perform nonlinear 2SLS estimation with PROC MODEL, specify the instrumental variables with an INSTRUMENTS statement and specify the 2SLS or N2SLS option in the FIT statement. The following statements show how to estimate the first equation in the preceding example with PROC MODEL:

proc model data=in;
   y1 = a1 + b1 * y2 + c1 * x1;
   fit y1 / 2sls;
   instruments x1 x2;
run;

The 2SLS or instrumental variables estimator can be computed by using a first-stage regression on the instrumental variables as described previously. However, PROC MODEL actually uses the equivalent but computationally more appropriate technique of projecting the regression problem into the linear space defined by the instruments. Thus, PROC MODEL does not produce any "first stage" results when you use 2SLS. If you specify the FSRSQ option in the FIT statement, PROC MODEL prints a "First-Stage $R^2$" statistic for each parameter estimate.
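The equivalence between the explicit first-stage regression and the projection approach can be checked numerically. The following Python sketch is an illustration only (it is not SAS code and does not reproduce PROC MODEL internals); it assumes a toy model with one dependent regressor `x`, one instrument `z`, no intercept, and made-up data values.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy data: y is the response, x the dependent regressor,
# z the single instrument.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.2, 1.9, 3.4, 3.8, 5.1]
y = [2.1, 4.2, 6.1, 8.3, 9.9]

# Route 1: explicit first stage (regress x on z), then regress y on the
# fitted values x_hat.
pi_hat = dot(z, x) / dot(z, z)
x_hat = [pi_hat * zi for zi in z]
b_two_stage = dot(x_hat, y) / dot(x_hat, x_hat)

# Route 2: project the problem onto the space spanned by the instrument.
# With a single instrument this collapses to (z'y) / (z'x).
b_projection = dot(z, y) / dot(z, x)

print(b_two_stage, b_projection)
```

Both routes return the same coefficient up to rounding, which is why no separate first-stage output is needed to obtain the estimate.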

Formally, the $\hat{\boldsymbol{\theta}}$ that minimizes

$$
\hat{S}_n = \frac{1}{n}\left(\sum_{t=1}^{n}\left(\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \otimes \mathbf{z}_t\right)\right)'\left(\sum_{t=1}^{n} I \otimes \mathbf{z}_t \mathbf{z}_t'\right)^{-1}\left(\sum_{t=1}^{n}\left(\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \otimes \mathbf{z}_t\right)\right)
$$

is the N2SLS estimator of the parameters. The estimate of $\boldsymbol{\Sigma}$ at the final iteration is used in the covariance of the parameters given in Table 2. For more information about the properties of nonlinear two-stage least squares, see Amemiya (1985, p. 250).

Seemingly Unrelated Regression

If the regression equations are not simultaneous (so there are no dependent regressors), seemingly unrelated regression (SUR) can be used to estimate systems of equations with correlated random errors. The large-sample efficiency of an estimation can be improved if these cross-equation correlations are taken into account. SUR is also known as joint generalized least squares or Zellner regression. Formally, the $\hat{\boldsymbol{\theta}}$ that minimizes

$$
\hat{S}_n = \frac{1}{n}\sum_{t=1}^{n} \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})'\,\hat{\boldsymbol{\Sigma}}^{-1}\,\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})
$$

is the SUR estimator of the parameters.

The SUR method requires an estimate of the cross-equation covariance matrix, $\boldsymbol{\Sigma}$. PROC MODEL first performs an OLS estimation, computes an estimate, $\hat{\boldsymbol{\Sigma}}$, from the OLS residuals, and then performs the SUR estimation based on $\hat{\boldsymbol{\Sigma}}$. The OLS results are not printed unless you specify the OLS option in addition to the SUR option.

You can specify the $\hat{\boldsymbol{\Sigma}}$ to use for SUR by storing the matrix in a SAS data set and naming that data set in the SDATA= option. You can also feed the $\hat{\boldsymbol{\Sigma}}$ computed from the SUR residuals back into the SUR estimation process by specifying the ITSUR option. You can print the estimated covariance matrix $\hat{\boldsymbol{\Sigma}}$ by using the COVS option in the FIT statement.

The SUR method requires estimation of the $\boldsymbol{\Sigma}$ matrix, and this increases the sampling variability of the estimator for small sample sizes. The efficiency gain that SUR has over OLS is a large sample property, and you must have a reasonable amount of data to realize this gain. For a more detailed discussion of SUR, see Pindyck and Rubinfeld (1981, pp. 331–333).

Three-Stage Least Squares Estimation

If the equation system is simultaneous, you can combine the 2SLS and SUR methods to take into account both dependent regressors and cross-equation correlation of the errors. This is called three-stage least squares (3SLS).

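Formally, the $\hat{\boldsymbol{\theta}}$ that minimizes

$$
\hat{S}_n = \frac{1}{n}\left(\sum_{t=1}^{n}\left(\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \otimes \mathbf{z}_t\right)\right)'\left(\sum_{t=1}^{n} \hat{\boldsymbol{\Sigma}} \otimes \mathbf{z}_t \mathbf{z}_t'\right)^{-1}\left(\sum_{t=1}^{n}\left(\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \otimes \mathbf{z}_t\right)\right)
$$

is the 3SLS estimator of the parameters. For more information about 3SLS, see Gallant (1987, p. 435).

Residuals from the 2SLS method are used to estimate the $\boldsymbol{\Sigma}$ matrix required for 3SLS. The results of the preliminary 2SLS step are not printed unless the 2SLS option is also specified.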

To use the three-stage least squares method, specify an INSTRUMENTS statement and use the 3SLS or N3SLS option in either the PROC MODEL statement or a FIT statement.

Generalized Method of Moments (GMM)

For systems of equations with heteroscedastic errors, generalized method of moments (GMM) can be used to obtain efficient estimates of the parameters. For alternatives to GMM, see the section Heteroscedasticity.

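Consider the nonlinear model

$$
\begin{aligned}
\boldsymbol{\epsilon}_t &= \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \\
\mathbf{z}_t &= Z(\mathbf{x}_t)
\end{aligned}
$$

where $\mathbf{z}_t$ is a vector of instruments and $\boldsymbol{\epsilon}_t$ is an unobservable disturbance vector that can be serially correlated and nonstationary.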

In general, the following orthogonality condition is desired:

$$E(\boldsymbol{\epsilon}_t \otimes \mathbf{z}_t) = 0$$

This condition states that the expected crossproducts of the unobservable disturbances, $\boldsymbol{\epsilon}_t$, and functions of the observable variables are set to 0. The first moment of the crossproducts is

$$
\begin{aligned}
\mathbf{m}_n &= \frac{1}{n}\sum_{t=1}^{n} \mathbf{m}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \\
\mathbf{m}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) &= \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \otimes \mathbf{z}_t
\end{aligned}
$$

where $\mathbf{m}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \in R^{gk}$.

The case where $gk > p$ is considered here, where p is the number of parameters.

Estimate the true parameter vector $\boldsymbol{\theta}^0$ by the value of $\hat{\boldsymbol{\theta}}$ that minimizes

$$S(\theta, \mathbf{V}) = [n\mathbf{m}_n(\theta)]'\,\mathbf{V}^{-1}\,[n\mathbf{m}_n(\theta)]/n$$

where

$$\mathbf{V} = \mathrm{Cov}\left([n\mathbf{m}_n(\theta^0)],\ [n\mathbf{m}_n(\theta^0)]'\right)$$

The parameter vector that minimizes this objective function is the GMM estimator. GMM estimation is requested in the FIT statement with the GMM option.
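To make the quadratic form concrete, the following Python sketch evaluates $S(\theta, \mathbf{V})$ for a problem with exactly two moment conditions. It is an illustration under stated assumptions, not PROC MODEL's implementation: the per-observation moments are supplied as already-computed pairs, the function name `gmm_objective` is hypothetical, and the $2 \times 2$ inverse is written out by hand to keep the example free of external libraries.

```python
def gmm_objective(moments, v):
    """Evaluate [n m_n]' V^{-1} [n m_n] / n for two moment conditions.

    moments: list of (m1_t, m2_t) pairs, one per observation.
    v:       2x2 weighting covariance matrix as nested lists.
    """
    n = len(moments)
    # n * m_n is simply the column sums of the per-observation moments.
    s1 = sum(m[0] for m in moments)
    s2 = sum(m[1] for m in moments)
    (a, b), (c, d) = v
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("V must be nonsingular")
    # V^{-1} = [[d, -b], [-c, a]] / det, applied to (s1, s2) on both sides.
    quad = (s1 * (d * s1 - b * s2) + s2 * (-c * s1 + a * s2)) / det
    return quad / n

# Made-up moment values for four observations.
toy_moments = [(1.0, 0.0), (0.0, -1.0), (1.0, -1.0), (0.0, 0.0)]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gmm_objective(toy_moments, identity))
```

An optimizer would call such a function repeatedly with moments recomputed at each trial value of $\theta$; a larger $\mathbf{V}$ down-weights the moments and lowers the objective.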

The variance of the moment functions, $\mathbf{V}$, can be expressed as

$$
\mathbf{V} = E\left[\left(\sum_{t=1}^{n} \boldsymbol{\epsilon}_t \otimes \mathbf{z}_t\right)\left(\sum_{s=1}^{n} \boldsymbol{\epsilon}_s \otimes \mathbf{z}_s\right)'\right] = \sum_{t=1}^{n}\sum_{s=1}^{n} E\left[(\boldsymbol{\epsilon}_t \otimes \mathbf{z}_t)(\boldsymbol{\epsilon}_s \otimes \mathbf{z}_s)'\right] = n\mathbf{S}_n^0
$$

where $\mathbf{S}_n^0$ is estimated as

$$
\hat{\mathbf{S}}_n = \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n}\left(\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \otimes \mathbf{z}_t\right)\left(\mathbf{q}(\mathbf{y}_s, \mathbf{x}_s, \boldsymbol{\theta}) \otimes \mathbf{z}_s\right)'
$$

Note that $\hat{\mathbf{S}}_n$ is a $gk \times gk$ matrix. Because $\mathrm{Var}(\hat{\mathbf{S}}_n)$ does not decrease with increasing n, you consider estimators of $\mathbf{S}_n^0$ of the form

$$
\begin{aligned}
\hat{\mathbf{S}}_n(l(n)) &= \sum_{\tau=-n+1}^{n-1} \hat{w}\!\left(\frac{\tau}{l(n)}\right)\mathbf{D}\,\hat{\mathbf{S}}_{n,\tau}\,\mathbf{D} \\
\hat{\mathbf{S}}_{n,\tau} &=
\begin{cases}
\sum_{t=1+\tau}^{n}\left[\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}^{\#}) \otimes \mathbf{z}_t\right]\left[\mathbf{q}(\mathbf{y}_{t-\tau}, \mathbf{x}_{t-\tau}, \boldsymbol{\theta}^{\#}) \otimes \mathbf{z}_{t-\tau}\right]' & \tau \ge 0 \\
(\hat{\mathbf{S}}_{n,-\tau})' & \tau < 0
\end{cases} \\
\hat{w}\!\left(\frac{\tau}{l(n)}\right) &=
\begin{cases}
w\!\left(\frac{\tau}{l(n)}\right) & l(n) > 0 \\
\delta_{\tau,0} & l(n) = 0
\end{cases}
\end{aligned}
$$

where $l(n)$ is a scalar function that computes the bandwidth parameter, $w(\cdot)$ is a scalar valued kernel, and the Kronecker delta function, $\delta_{i,j}$, is 1 if $i = j$ and 0 otherwise. The diagonal matrix $\mathbf{D}$ is used for a small sample degrees of freedom correction (Gallant 1987). The initial $\boldsymbol{\theta}^{\#}$ used for the estimation of $\hat{\mathbf{S}}_n$ is obtained from a 2SLS estimation of the system. The degrees of freedom correction is handled by the VARDEF= option as it is for the S matrix estimation.

The following kernels are supported by PROC MODEL. They are listed with their default bandwidth functions.

Bartlett: KERNEL=BART

$$
\begin{aligned}
w(x) &=
\begin{cases}
1 - |x| & |x| \le 1 \\
0 & \text{otherwise}
\end{cases} \\
l(n) &= \tfrac{1}{2}\,n^{1/3}
\end{aligned}
$$

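Parzen: KERNEL=PARZEN

$$
w(x) =
\begin{cases}
1 - 6|x|^2 + 6|x|^3 & 0 \le |x| \le \tfrac{1}{2} \\
2(1 - |x|)^3 & \tfrac{1}{2} \le |x| \le 1 \\
0 & \text{otherwise}
\end{cases}
$$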

Quadratic spectral: KERNEL=QS

$$
\begin{aligned}
w(x) &= \frac{25}{12\pi^2 x^2}\left(\frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5)\right) \\
l(n) &= \tfrac{1}{2}\,n^{1/5}
\end{aligned}
$$

Figure 23: Kernels for Smoothing

For more information about the properties of these and other kernels, see Andrews (1991). Kernels are selected with the KERNEL= option; KERNEL=PARZEN is the default. The general form of the KERNEL= option is

KERNEL=( PARZEN | QS | BART, c, e )

where $c$ and $e$ are used to compute the bandwidth parameter as

$$l(n) = c\,n^{e}$$

The bias of the standard error estimates increases for large bandwidth parameters. A warning message is produced for bandwidth parameters greater than $n^{1/3}$. For a discussion of the computation of the optimal $l(n)$, see Andrews (1991).

The "Newey-West" kernel (Newey and West 1987) corresponds to the Bartlett kernel with bandwidth parameter $l(n) = L + 1$. That is, if the "lag length" for the Newey-West kernel is $L$, then the corresponding MODEL procedure syntax is KERNEL=(bart, L+1, 0).
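The kernel and bandwidth formulas can be evaluated directly. The following Python sketch is an illustration only (the function names are hypothetical and it is not SAS code). It implements the Bartlett and quadratic spectral kernels as written in this section, the bandwidth rule $l(n) = c\,n^{e}$, and the weights implied by the Newey-West correspondence KERNEL=(bart, L+1, 0).

```python
import math

def bartlett(x):
    """Bartlett kernel: 1 - |x| on [-1, 1], zero elsewhere."""
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0

def quadratic_spectral(x):
    """Quadratic spectral kernel; the value at x = 0 is the limit, 1."""
    if x == 0.0:
        return 1.0
    a = 6.0 * math.pi * x / 5.0
    return 25.0 / (12.0 * math.pi ** 2 * x ** 2) * (math.sin(a) / a - math.cos(a))

def bandwidth(n, c, e):
    """Bandwidth parameter l(n) = c * n**e."""
    return c * n ** e

def newey_west_weights(lag_length, n=100):
    """Bartlett weights for lags 0..L+1 when c = L + 1 and e = 0.

    With e = 0 the sample size n drops out of the bandwidth.
    """
    l = bandwidth(n, lag_length + 1, 0)
    return [bartlett(tau / l) for tau in range(lag_length + 2)]

print(newey_west_weights(3))            # weights decline linearly to zero
print(bandwidth(1000, 0.5, 1.0 / 3.0))  # Bartlett default rule at n = 1000
```

The weights fall linearly from 1 at lag 0 to 0 at lag L+1, so exactly L lagged autocovariances receive positive weight.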

Andrews and Monahan (1992) show that using prewhitening in combination with GMM can improve confidence interval coverage and reduce over rejection of t statistics at the cost of inflating the variance and MSE of the estimator. Prewhitening can be performed by using the %AR macros.

For the special case that the errors are not serially correlated, that is,

$$E\left[(e_t \otimes \mathbf{z}_t)(e_s \otimes \mathbf{z}_s)\right] = 0 \qquad t \ne s$$

the estimate for $\mathbf{S}_n^0$ reduces to

$$
\hat{\mathbf{S}}_n = \frac{1}{n}\sum_{t=1}^{n}\left[\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \otimes \mathbf{z}_t\right]\left[\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \otimes \mathbf{z}_t\right]'
$$

The option KERNEL=(kernel,0,) is used to select this type of estimation when using GMM.
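The effect of the bandwidth on the variance estimate can be seen in a scalar sketch. The following Python code is an illustration under simplifying assumptions, not PROC MODEL's algorithm: there is a single moment series (one equation, one instrument), the Bartlett kernel is used, and the degrees of freedom correction is taken to be a plain division by n. The function name `hac_variance` and the data are hypothetical.

```python
def bartlett(x):
    """Bartlett kernel: 1 - |x| on [-1, 1], zero elsewhere."""
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0

def hac_variance(moments, l):
    """Kernel-weighted variance estimate for a scalar moment series.

    moments: values of q_t * z_t for t = 1..n.
    l:       bandwidth parameter; l = 0 keeps only the lag-0 term.
    """
    n = len(moments)
    total = 0.0
    for tau in range(-(n - 1), n):
        if l == 0:
            weight = 1.0 if tau == 0 else 0.0  # Kronecker delta case
        else:
            weight = bartlett(tau / l)
        if weight == 0.0:
            continue
        lag = abs(tau)  # in the scalar case S_{n,-tau} equals S_{n,tau}
        s_tau = sum(moments[t] * moments[t - lag] for t in range(lag, n))
        total += weight * s_tau / n  # assumed correction: divide by n
    return total

toy = [1.0, -1.0, 2.0, 0.0]
print(hac_variance(toy, 0))  # no serial correlation: mean of squared moments
print(hac_variance(toy, 2))  # adds half-weighted first-order autocovariances
```

With a zero bandwidth only the contemporaneous products survive, which is the estimator for serially uncorrelated errors; a positive bandwidth adds weighted autocovariance terms.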

Covariance of GMM estimators

The covariance of GMM estimators, given a general weighting matrix $\mathbf{V}_{\mathrm{G}}^{-1}$, is

$$
\left[(\mathbf{Y}\mathbf{X})'\,\mathbf{V}_{\mathrm{G}}^{-1}\,(\mathbf{Y}\mathbf{X})\right]^{-1}(\mathbf{Y}\mathbf{X})'\,\mathbf{V}_{\mathrm{G}}^{-1}\,\hat{\mathbf{V}}\,\mathbf{V}_{\mathrm{G}}^{-1}\,(\mathbf{Y}\mathbf{X})\left[(\mathbf{Y}\mathbf{X})'\,\mathbf{V}_{\mathrm{G}}^{-1}\,(\mathbf{Y}\mathbf{X})\right]^{-1}
$$

By default or when GENGMMV is specified, this is the covariance of GMM estimators.

If the weighting matrix is the same as $\hat{\mathbf{V}}^{-1}$, then the covariance of GMM estimators becomes

$$\left[(\mathbf{Y}\mathbf{X})'\,\hat{\mathbf{V}}^{-1}\,(\mathbf{Y}\mathbf{X})\right]^{-1}$$

If NOGENGMMV is specified, this is used as the covariance estimator.

Testing Overidentifying Restrictions

Let r be the number of unique instruments times the number of equations. The value r represents the number of orthogonality conditions imposed by the GMM method. Under the assumptions of the GMM method, $r - p$ linearly independent combinations of the orthogonality conditions should be close to zero. The GMM estimates are computed by setting these combinations to zero. When r exceeds the number of parameters to be estimated, the OBJECTIVE*N, reported at the end of the estimation, is an asymptotically valid statistic to test the null hypothesis that the overidentifying restrictions of the model are valid. The OBJECTIVE*N is distributed as a chi-square with $r - p$ degrees of freedom (Hansen 1982, p. 1049). When the GMM method is selected, the value of the overidentifying restrictions test statistic, also known as Hansen’s J test statistic, and its associated number of degrees of freedom are reported together with the probability under the null hypothesis.
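The arithmetic behind the test can be sketched as follows. This Python code is an illustration only: the function names are hypothetical, the statistic value in the usage line is made up, and the degrees of freedom are taken as the number of orthogonality conditions minus the number of parameters, which is the standard result for Hansen's J test. The chi-square tail probability uses the recurrence $Q(a+1, x) = Q(a, x) + x^{a}e^{-x}/\Gamma(a+1)$ for the regularized upper incomplete gamma function, so only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    half_x = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half_x)              # Q(1, x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half_x))   # Q(1/2, x/2)
    while a < df / 2.0:
        q += half_x ** a * math.exp(-half_x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def overid_test(objective_times_n, n_equations, n_instruments, n_params):
    """Return (degrees of freedom, p-value) for the overidentification test."""
    r = n_equations * n_instruments   # number of orthogonality conditions
    df = r - n_params
    if df <= 0:
        raise ValueError("the model is not overidentified")
    return df, chi2_sf(objective_times_n, df)

# Hypothetical case: 2 equations, 3 instruments, 3 parameters.
print(overid_test(4.2, 2, 3, 3))
```

A small p-value is evidence against the overidentifying restrictions; a large one gives no reason to reject them.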

Iterated Generalized Method of Moments (ITGMM)

Iterated generalized method of moments is similar to the iterated versions of 2SLS, SUR, and 3SLS. The variance matrix for GMM estimation is reestimated at each iteration with the parameters determined by the GMM estimation. The iteration terminates when the variance matrix for the equation errors changes by less than the CONVERGE= value. Iterated generalized method of moments is selected by the ITGMM option in the FIT statement. For some indication of the small sample properties of ITGMM, see Ferson and Foerster (1993).

Simulated Method of Moments (SMM)

The SMM method uses simulation techniques in model inference and estimation. It is appropriate for estimating models in which integrals appear in the objective function, and these integrals can be approximated by simulation. There might be various reasons for integrals to appear in an objective function (for example, transformation of a latent model into an observable model, missing data, random coefficients, heterogeneity, and so on).

This simulation method can be used with all the estimation methods except full information maximum likelihood (FIML) in PROC MODEL. SMM, also known as simulated generalized method of moments (SGMM), is the default estimation method because of its nice properties.

Estimation Details

A general nonlinear model can be described as

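$$\boldsymbol{\epsilon}_t = \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})$$

where $\mathbf{q} \in R^g$ is a real vector valued function of $\mathbf{y}_t \in R^g$, $\mathbf{x}_t \in R^l$, and $\boldsymbol{\theta} \in R^p$; g is the number of equations; l is the number of exogenous variables (lagged endogenous variables are considered exogenous here); p is the number of parameters; and t ranges from 1 to n. $\boldsymbol{\epsilon}_t$ is an unobservable disturbance vector with the following properties:

$$
\begin{aligned}
E(\boldsymbol{\epsilon}_t) &= 0 \\
E(\boldsymbol{\epsilon}_t \boldsymbol{\epsilon}_t') &= \boldsymbol{\Sigma}
\end{aligned}
$$

In many cases, it is not possible to write $\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})$ in a closed form. Instead $\mathbf{q}$ is expressed as an integral of a function $\mathbf{f}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}, \mathbf{u}_t)$; that is,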

$$\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) = \int \mathbf{f}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}, \mathbf{u}_t)\, dP(\mathbf{u})$$

where $\mathbf{f} \in R^g$ is a real vector valued function of $\mathbf{y}_t$, $\mathbf{x}_t$, $\boldsymbol{\theta}$, and $\mathbf{u}_t \in R^m$; m is the number of stochastic variables with a known distribution $P(\mathbf{u})$. Since the distribution of u is completely known, it is possible to simulate artificial draws from this distribution. Using such independent draws $\mathbf{u}_{ht}$, $h = 1, \ldots, H$, and the strong law of large numbers, $\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})$ can be approximated by

$$\frac{1}{H}\sum_{h=1}^{H} \mathbf{f}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}, \mathbf{u}_{ht}).$$
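The approximation can be checked on a case where the integral happens to be known. The following Python sketch is an illustration only; the integrand $f(y, \theta, u) = y - e^{\theta u}$ with $u \sim N(0, 1)$ is a made-up example chosen because $E(e^{\theta u}) = e^{\theta^2/2}$, and the function names and seed are hypothetical.

```python
import math
import random

def f(y, theta, u):
    """Hypothetical integrand: residual given one simulated draw u."""
    return y - math.exp(theta * u)

def simulate_q(y, theta, n_draws, rng):
    """Approximate q(y, theta) by averaging f over n_draws standard normals."""
    total = 0.0
    for _ in range(n_draws):
        total += f(y, theta, rng.gauss(0.0, 1.0))
    return total / n_draws

def exact_q(y, theta):
    """Closed form of the integral for this particular integrand."""
    return y - math.exp(theta ** 2 / 2.0)

rng = random.Random(12345)  # fixed seed so the run is repeatable
approx = simulate_q(2.0, 0.5, 20000, rng)
print(approx, exact_q(2.0, 0.5))
```

The simulation error shrinks at the rate $1/\sqrt{H}$, so the two printed values agree more closely as the number of draws grows.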

Simulated Generalized Method of Moments (SGMM)

Generalized method of moments (GMM) is widely used to obtain efficient estimates for general model systems. When the moment conditions are not readily available in closed forms but can be approximated by simulation, simulated generalized method of moments (SGMM) can be used. The SGMM estimators have the nice property of being asymptotically consistent and normally distributed even if the number of draws H is fixed (see McFadden 1989; Pakes and Pollard 1989).

Consider the nonlinear model

$$
\begin{aligned}
\boldsymbol{\epsilon}_t &= \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \\
\mathbf{z}_t &= Z(\mathbf{x}_t)
\end{aligned}
$$

where $\mathbf{z}_t \in R^k$ is a vector of k instruments and $\boldsymbol{\epsilon}_t$ is an unobservable disturbance vector that can be serially correlated and nonstationary. In the case of no instrumental variables, $\mathbf{z}_t$ is 1. $\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})$ is the vector of moment conditions, and it is approximated by simulation.

In general, theory suggests the following orthogonality condition,

$$E(\boldsymbol{\epsilon}_t \otimes \mathbf{z}_t) = 0$$

which states that the expected crossproducts of the unobservable disturbances, $\boldsymbol{\epsilon}_t$, and functions of the observable variables are set to 0. The sample means of the crossproducts are

$$
\begin{aligned}
\mathbf{m}_n &= \frac{1}{n}\sum_{t=1}^{n} \mathbf{m}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \\
\mathbf{m}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) &= \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \otimes \mathbf{z}_t
\end{aligned}
$$

where $\mathbf{m}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \in R^{gk}$. The case where $gk > p$, where p is the number of parameters, is considered here. An estimate of the true parameter vector $\boldsymbol{\theta}^0$ is the value of $\hat{\boldsymbol{\theta}}$ that minimizes

$$S(\theta, \mathbf{V}) = [n\mathbf{m}_n(\theta)]'\,\mathbf{V}^{-1}\,[n\mathbf{m}_n(\theta)]/n$$

where

$$\mathbf{V} = \mathrm{Cov}\left(\mathbf{m}(\theta^0),\ \mathbf{m}(\theta^0)'\right).$$

The steps for SGMM are as follows:

1. Start with a positive definite $\hat{\mathbf{V}}$ matrix. This matrix can be estimated from a consistent estimator of $\boldsymbol{\theta}$. If $\hat{\boldsymbol{\theta}}$ is a consistent estimator, then $\mathbf{u}_t$ for $t = 1, \ldots, n$ can be simulated $H'$ number of times. A consistent estimator of $\mathbf{V}$ is obtained as

$$
\hat{\mathbf{V}} = \frac{1}{n}\sum_{t=1}^{n}\left[\frac{1}{H'}\sum_{h=1}^{H'} \mathbf{f}(\mathbf{y}_t, \mathbf{x}_t, \hat{\boldsymbol{\theta}}, \mathbf{u}_{ht}) \otimes \mathbf{z}_t\right]\left[\frac{1}{H'}\sum_{h=1}^{H'} \mathbf{f}(\mathbf{y}_t, \mathbf{x}_t, \hat{\boldsymbol{\theta}}, \mathbf{u}_{ht}) \otimes \mathbf{z}_t\right]'
$$

$H'$ must be large so that this is a consistent estimator of $\mathbf{V}$.

2. Simulate H number of $\mathbf{u}_t$ for $t = 1, \ldots, n$. As shown by Gourieroux and Monfort (1993), the number of simulations H does not need to be very large. For $H = 10$, the SGMM estimator achieves 90% of the efficiency of the corresponding GMM estimator. Find the $\hat{\boldsymbol{\theta}}$ that minimizes the quadratic product of the moment conditions, again with the weight matrix being $\hat{\mathbf{V}}^{-1}$.

$$\min_{\theta}\ [n\mathbf{m}_n(\theta)]'\,\hat{\mathbf{V}}^{-1}\,[n\mathbf{m}_n(\theta)]/n$$

3. The covariance matrix of $\sqrt{n}\,\hat{\boldsymbol{\theta}}$ is given as (Gourieroux and Monfort 1993)

$$
\boldsymbol{\Sigma}_1^{-1}\mathbf{D}\hat{\mathbf{V}}^{-1}\mathbf{V}(\hat{\theta})\hat{\mathbf{V}}^{-1}\mathbf{D}'\boldsymbol{\Sigma}_1^{-1} + \frac{1}{H}\,\boldsymbol{\Sigma}_1^{-1}\mathbf{D}\hat{\mathbf{V}}^{-1}E\left[\mathbf{z} \otimes \mathrm{Var}(\mathbf{f}\,|\,\mathbf{x}) \otimes \mathbf{z}\right]\hat{\mathbf{V}}^{-1}\mathbf{D}'\boldsymbol{\Sigma}_1^{-1}
$$

where $\boldsymbol{\Sigma}_1 = \mathbf{D}\hat{\mathbf{V}}^{-1}\mathbf{D}'$, $\mathbf{D}$ is the matrix of partial derivatives of the residuals with respect to the parameters, $\mathbf{V}(\hat{\theta})$ is the covariance of moments from estimated parameters $\hat{\theta}$, and $E[\mathbf{z} \otimes \mathrm{Var}(\mathbf{f}\,|\,\mathbf{x}) \otimes \mathbf{z}]$ is the covariance of moments for each observation from simulation. The first term is the variance-covariance matrix of the exact GMM estimator, and the second term accounts for the variation contributed by simulating the moments.

Implementation in PROC MODEL

In PROC MODEL, if the user specifies the GMM and NDRAW options in the FIT statement, PROC MODEL first fits the model by using N2SLS and computes $\hat{\mathbf{V}}$ by using the estimates from N2SLS and simulation. If NO2SLS is specified in the FIT statement, $\hat{\mathbf{V}}$ is read from the VDATA= data set. If the user does not provide a $\hat{\mathbf{V}}$ matrix, the initial starting value of $\boldsymbol{\theta}$ is used as the estimator for computing the $\hat{\mathbf{V}}$ matrix in step 1. If the ITGMM option is specified instead of GMM, then PROC MODEL iterates from step 1 to step 3 until the $\hat{\mathbf{V}}$ matrix converges.

The consistency of the parameter estimates is not affected by the variance correction shown in the second term in step 3. The correction on the variance of parameter estimates is not computed by default. To add the adjustment, use the ADJSMMV option in the FIT statement. This correction is of the order of $1/H$ and is small even for moderate H.

The following example illustrates how to use SMM to estimate a simple regression model. Suppose the model is

$$y = a + bx + u, \qquad u \sim \mathrm{iid}\ N(0, s^2).$$

First, consider the problem in a GMM context. The first two moments of y are easily derived:

$$
\begin{aligned}
E(y) &= a + bx \\
E(y^2) &= (a + bx)^2 + s^2
\end{aligned}
$$

Rewrite the moment conditions in the form similar to the preceding discussion:

$$
\begin{aligned}
\epsilon_{1t} &= y_t - (a + b x_t) \\
\epsilon_{2t} &= y_t^2 - (a + b x_t)^2 - s^2
\end{aligned}
$$

Then you can estimate this model by using GMM with the following statements:

proc model data=a;
   parms a b s;
   instrument x;
   eq.m1 = y-(a+b*x);
   eq.m2 = y*y - (a+b*x)**2 - s*s;
   bound s > 0;
   fit m1 m2 / gmm;
run;

Now suppose you do not have the closed form for the moment conditions. Instead, you can simulate the moment conditions by generating H simulated samples based on the parameters. Then the simulated moment conditions are

\[
\begin{aligned}
\epsilon_{1t} &= \frac{1}{H} \sum_{h=1}^{H} \left\{ y_t - (a + b x_t + s u_{t,h}) \right\} \\
\epsilon_{2t} &= \frac{1}{H} \sum_{h=1}^{H} \left\{ y_t^2 - (a + b x_t + s u_{t,h})^2 \right\}
\end{aligned}
\]
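To see why the simulated moment conditions stand in for the closed-form ones, take the expectation of one simulated term over the draw $u_{t,h}$. The following worked step assumes that the draws are independent standard normal, so that $E(u_{t,h}) = 0$ and $E(u_{t,h}^2) = 1$; that assumption matches the model for $u$ stated earlier but is spelled out here only for illustration.

```latex
\begin{aligned}
E_u\left[ y_t - (a + b x_t + s u_{t,h}) \right]
  &= y_t - (a + b x_t) - s\,E(u_{t,h}) \\
  &= y_t - (a + b x_t) \\[4pt]
E_u\left[ y_t^2 - (a + b x_t + s u_{t,h})^2 \right]
  &= y_t^2 - (a + b x_t)^2 - 2 s (a + b x_t)\,E(u_{t,h}) - s^2 E(u_{t,h}^2) \\
  &= y_t^2 - (a + b x_t)^2 - s^2
\end{aligned}
```

In expectation, each simulated term reproduces the corresponding closed-form moment condition; averaging over H draws reduces the simulation noise around that expectation.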

This model can be estimated by using SGMM with the following statements:

proc model data=_tmpdata;
   parms a b s;
   instrument x;
   ysim = (a+b*x) + s * rannor( 98711 );
   eq.m1 = y-ysim;
   eq.m2 = y*y - ysim*ysim;
   bound s > 0;
   fit m1 m2 / gmm ndraw=10;
run;

You can use the following MOMENT statement instead of specifying the two moment equations shown earlier:

moment ysim=(1, 2);

In cases where you require a large number of moment equations, using the MOMENT statement to specify them is more efficient.

Note that the NDRAW= option tells PROC MODEL that this is a simulation-based estimation. Thus, the random number function RANNOR returns random numbers in the estimation process. During the simulation, 10 draws of m1 and m2 are generated for each observation, and the averages enter the objective functions just as the equations specified previously.
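The following statements are a minimal sketch of the same simulated estimation with the variance adjustment requested through the ADJSMMV option. The DATA step is hypothetical: the sample size, the seed, and the true values a=2, b=1.5, and s=1.5 are assumptions chosen only so that the example is self-contained.

```sas
/* Hypothetical input data; the true parameter values are assumptions */
data _tmpdata;
   do i = 1 to 500;
      x = rannor( 1013 );
      y = 2 + 1.5*x + 1.5*rannor( 1013 );
      output;
   end;
run;

proc model data=_tmpdata;
   parms a b s;
   instrument x;
   ysim = (a+b*x) + s * rannor( 98711 );
   eq.m1 = y-ysim;
   eq.m2 = y*y - ysim*ysim;
   bound s > 0;
   /* ADJSMMV adds the simulation correction to the covariance */
   /* of the parameter estimates; NDRAW=10 sets H to 10        */
   fit m1 m2 / gmm ndraw=10 adjsmmv;
run;
quit;
```

Because the correction is of the order of 1/H, the reported standard errors with and without ADJSMMV should move closer together as NDRAW= increases.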

Other Estimation Methods

The simulation method can be used not only with GMM and ITGMM, but also with OLS, ITOLS, SUR, ITSUR, N2SLS, IT2SLS, N3SLS, and IT3SLS. These simulation-based methods are similar to the corresponding methods in PROC MODEL; the only difference is that the objective functions include the average of the H simulations.

Full Information Maximum Likelihood Estimation (FIML)

A different approach to the simultaneous equation bias problem is the full information maximum likelihood (FIML) estimation method (Amemiya 1977).

Compared to the instrumental variables methods (2SLS and 3SLS), the FIML method has these advantages and disadvantages:

FIML does not require instrumental variables.
FIML requires that the model include the full equation system, with as many equations as there are endogenous variables. With 2SLS or 3SLS, you can estimate some of the equations without specifying the complete system.
FIML assumes that the equation errors have a multivariate normal distribution. If the errors are not normally distributed, the FIML method might produce poor results. 2SLS and 3SLS do not assume a specific distribution for the errors.
The FIML method is computationally expensive.

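The full information maximum likelihood estimators of $\boldsymbol{\theta}$ and $\boldsymbol{\Sigma}$ are the $\hat{\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\Sigma}}$ that minimize the negative log-likelihood function:

\[
\mathbf{l}_n(\boldsymbol{\theta}, \boldsymbol{\Sigma}) = \frac{ng}{2} \ln(2\pi) - \sum_{t=1}^{n} \ln \left| \frac{\partial \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})}{\partial \mathbf{y}_t'} \right| + \frac{n}{2} \ln |\boldsymbol{\Sigma}| + \frac{1}{2} \operatorname{tr}\left( \boldsymbol{\Sigma}^{-1} \sum_{t=1}^{n} \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})\, \mathbf{q}'(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \right)
\]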

The FIML option requests full information maximum likelihood estimation. If the errors are distributed normally, FIML produces efficient estimators of the parameters. If instrumental variables are not provided, the starting values for the estimation are obtained from a SUR estimation. If instrumental variables are provided, then the starting values are obtained from a 3SLS estimation. The log-likelihood value and the $\ell_2$ norm of the gradient of the negative log-likelihood function are shown in the estimation summary.

FIML Details

To compute the minimum of $\mathbf{l}_n(\boldsymbol{\theta}, \boldsymbol{\Sigma})$, this function is concentrated using the relation

\[
\boldsymbol{\Sigma}(\boldsymbol{\theta}) = \frac{1}{n} \sum_{t=1}^{n} \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})\, \mathbf{q}'(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})
\]

This results in the concentrated negative log-likelihood function discussed in Davidson and MacKinnon (1993):

\[
\mathbf{l}_n(\boldsymbol{\theta}) = \frac{ng}{2} \left( 1 + \ln(2\pi) \right) - \sum_{t=1}^{n} \ln \left| \frac{\partial}{\partial \mathbf{y}_t'} \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \right| + \frac{n}{2} \ln \left| \boldsymbol{\Sigma}(\boldsymbol{\theta}) \right|
\]
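The constant $\frac{ng}{2}(1 + \ln(2\pi))$ in the concentrated function can be traced to the trace term of the unconcentrated Gaussian negative log-likelihood. The following worked step is a sketch that assumes the unconcentrated function contains the usual term $\frac{1}{2}\operatorname{tr}(\boldsymbol{\Sigma}^{-1}\sum_t \mathbf{q}_t \mathbf{q}_t')$, where $\mathbf{q}_t$ is shorthand (introduced here) for $\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})$.

```latex
\begin{aligned}
\sum_{t=1}^{n} \mathbf{q}_t \mathbf{q}_t' &= n\,\boldsymbol{\Sigma}(\boldsymbol{\theta})
  && \text{(from the concentration relation)} \\
\frac{1}{2}\operatorname{tr}\!\left( \boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1}
  \sum_{t=1}^{n} \mathbf{q}_t \mathbf{q}_t' \right)
  &= \frac{n}{2}\operatorname{tr}(\mathbf{I}_g) = \frac{ng}{2} \\
\frac{ng}{2}\ln(2\pi) + \frac{ng}{2} &= \frac{ng}{2}\left( 1 + \ln(2\pi) \right)
\end{aligned}
```

After the substitution, the only remaining dependence on the covariance matrix is through $\frac{n}{2}\ln|\boldsymbol{\Sigma}(\boldsymbol{\theta})|$, so the minimization is over $\boldsymbol{\theta}$ alone.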

The gradient of the negative log-likelihood function is

\[
\frac{\partial}{\partial \theta_i} \mathbf{l}_n(\boldsymbol{\theta}) = \sum_{t=1}^{n} \nabla_i(t)
\]

where

\[
\nabla_i(t) = -\operatorname{tr}\left( \left( \frac{\partial \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})}{\partial \mathbf{y}_t'} \right)^{-1} \frac{\partial^2 \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})}{\partial \mathbf{y}_t'\, \partial \theta_i} \right) + \frac{1}{2} \operatorname{tr}\left( \boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1} \frac{\partial \boldsymbol{\Sigma}(\boldsymbol{\theta})}{\partial \theta_i} \left[ \mathbf{I} - \boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1} \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})\, \mathbf{q}'(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \right] \right) + \mathbf{q}'(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})\, \boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1} \frac{\partial \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})}{\partial \theta_i}
\]

and

\[
\frac{\partial \boldsymbol{\Sigma}(\boldsymbol{\theta})}{\partial \theta_i} = \frac{2}{n} \sum_{t=1}^{n} \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) \frac{\partial \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})'}{\partial \theta_i}
\]

The estimator of the variance-covariance of $\hat{\boldsymbol{\theta}}$ (COVB) for FIML can be selected with the COVBEST= option with the following arguments:

CROSS

selects the crossproducts estimator of the covariance matrix (Gallant 1987, p. 473),

\[
C = \left( \frac{1}{n} \sum_{t=1}^{n} \nabla(t)\, \nabla'(t) \right)^{-1}
\]

where $\nabla(t) = [\nabla_1(t), \nabla_2(t), \ldots, \nabla_p(t)]'$. This is the default.

GLS

selects the generalized least squares estimator of the covariance matrix. This is computed as (Dagenais 1978)

\[
C = \left[ \hat{\mathbf{Z}}' \left( \boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1} \otimes I \right) \hat{\mathbf{Z}} \right]^{-1}
\]

where is and each column vector is obtained from stacking the columns of

is an matrix of residuals and is an matrix .

FDA

selects the inverse of the concentrated likelihood Hessian as an estimator of the covariance matrix. The Hessian is computed numerically, so for a large problem this is computationally expensive.

The HESSIAN= option controls which approximation to the Hessian is used in the minimization procedure. Alternate approximations are used to improve convergence and execution time. The choices are as follows:

CROSS: The crossproducts approximation is used.
GLS: The generalized least squares approximation is used (default).
FDA: The Hessian is computed numerically by finite differences.

HESSIAN=GLS has better convergence properties in general, but COVBEST=CROSS produces the most pessimistic standard error bounds. When the HESSIAN= option is used, the default estimator of the variance-covariance of $\hat{\boldsymbol{\theta}}$ is the inverse of the Hessian selected.
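The following statements are a minimal sketch of a FIML fit that sets both options explicitly. The two-equation system, the data set WORK.SYS, the variable names, and the true coefficient values are all hypothetical; the DATA step generates the endogenous variables from the reduced form so that the system has as many equations as endogenous variables.

```sas
/* Hypothetical simultaneous system used only for illustration:  */
/*    y1 = 1 + 0.5*y2 + x1 + e1                                   */
/*    y2 = 2 - 0.3*y1 + x2 + e2                                   */
data work.sys;
   do t = 1 to 200;
      x1 = rannor( 4321 );
      x2 = rannor( 4321 );
      e1 = 0.5*rannor( 4321 );
      e2 = 0.5*rannor( 4321 );
      /* reduced form for y1, obtained by substituting y2 */
      y1 = (1 + 0.5*(2 + x2 + e2) + x1 + e1) / (1 + 0.5*0.3);
      y2 = 2 - 0.3*y1 + x2 + e2;
      output;
   end;
run;

proc model data=work.sys;
   endogenous y1 y2;
   exogenous  x1 x2;
   parms a1 b1 c1 a2 b2 c2;
   y1 = a1 + b1*y2 + c1*x1;
   y2 = a2 + b2*y1 + c2*x2;
   /* GLS Hessian approximation for the minimization and */
   /* GLS estimator for the reported covariance matrix   */
   fit y1 y2 / fiml hessian=gls covbest=gls;
run;
quit;
```

Replacing COVBEST=GLS with COVBEST=CROSS or COVBEST=FDA changes only the reported covariance estimator, not the minimization path selected by HESSIAN=.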

Multivariate t Distribution Estimation

The multivariate t distribution is specified by using the ERRORMODEL statement with the T option. Other method specifications (FIML and OLS, for example) are ignored when the ERRORMODEL statement is used for a distribution other than normal.

The probability density function for the multivariate t distribution is

\[
P_q = \frac{\Gamma\left( \frac{df + m}{2} \right)}{(\pi \cdot df)^{\frac{m}{2}} \cdot \Gamma\left( \frac{df}{2} \right) \left| \boldsymbol{\Sigma}(\sigma) \right|^{\frac{1}{2}}} \cdot \left( 1 + \frac{\mathbf{q}'(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})\, \boldsymbol{\Sigma}(\sigma)^{-1}\, \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})}{df} \right)^{-\frac{df + m}{2}}
\]

where m is the number of equations and df is the degrees of freedom.

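The maximum likelihood estimators of $\boldsymbol{\theta}$ and $\sigma$ are the $\hat{\boldsymbol{\theta}}$ and $\hat{\sigma}$ that minimize the negative log-likelihood function:

\[
l_n(\boldsymbol{\theta}, \sigma) = -\sum_{t=1}^{n} \ln \left( \frac{\Gamma\left( \frac{df + m}{2} \right)}{(\pi \cdot df)^{\frac{m}{2}} \cdot \Gamma\left( \frac{df}{2} \right)} \cdot \left( 1 + \frac{\mathbf{q}'(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})\, \boldsymbol{\Sigma}(\sigma)^{-1}\, \mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})}{df} \right)^{-\frac{df + m}{2}} \right) + \frac{n}{2} \ln \left| \boldsymbol{\Sigma}(\sigma) \right|
\]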

The ERRORMODEL statement is used to request the t distribution maximum likelihood estimation. An OLS estimation is done to obtain initial parameter estimates and MSE.var estimates. Use NOOLS to turn off this initial estimation. If the errors are distributed normally, t distribution estimation produces results similar to FIML.

The multivariate model has a single shared degrees-of-freedom parameter, which is estimated. The degrees-of-freedom parameter can also be set to a fixed value. The log-likelihood value and the $\ell_2$ norm of the gradient of the negative log-likelihood function are shown in the estimation summary.
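The following statements are a minimal sketch of a single-equation t distribution fit. The data set WORK.TDATA, the variable names, and the true values are hypothetical. The argument form T(v, df), with a variance parameter followed by the degrees-of-freedom parameter, and the declaration of both in the PARMS statement are assumptions that should be checked against the ERRORMODEL statement syntax.

```sas
/* Hypothetical data with t-distributed errors (5 degrees of freedom) */
data work.tdata;
   call streaminit( 2468 );
   do i = 1 to 300;
      x = rand( 'NORMAL' );
      y = 1 + 2*x + rand( 'T', 5 );
      output;
   end;
run;

proc model data=work.tdata;
   /* v and df are illustrative names with assumed starting values */
   parms a b v 1 df 10;
   y = a + b*x;
   /* assumed form: variance parameter first, then degrees of freedom */
   errormodel y ~ t( v, df );
   fit y;
run;
quit;
```

To hold the degrees of freedom at a fixed value instead of estimating it, one option under the same assumptions is to supply a constant in place of the df parameter in the ERRORMODEL statement.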

t Distribution Details

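Since a variance term is explicitly specified by using the ERRORMODEL statement, $\boldsymbol{\Sigma}$ is estimated as a correlation matrix and $\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta})$ is normalized by the variance. The gradient of the negative log-likelihood function with respect to the degrees of freedom is

\[
\frac{\partial l_n}{\partial df} = \frac{nm}{2\, df} - \frac{n}{2} \frac{\Gamma'\left( \frac{df + m}{2} \right)}{\Gamma\left( \frac{df + m}{2} \right)} + \frac{n}{2} \frac{\Gamma'\left( \frac{df}{2} \right)}{\Gamma\left( \frac{df}{2} \right)} + \sum_{t=1}^{n} \left[ 0.5 \ln\left( 1 + \frac{\mathbf{q}' \boldsymbol{\Sigma}^{-1} \mathbf{q}}{df} \right) - \frac{0.5 (df + m)}{\left( 1 + \mathbf{q}' \boldsymbol{\Sigma}^{-1} \mathbf{q} / df \right)} \frac{\mathbf{q}' \boldsymbol{\Sigma}^{-1} \mathbf{q}}{df^2} \right]
\]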

The gradient of the negative log-likelihood function with respect to the parameters is

\[
\frac{\partial l_n}{\partial \theta_i} = \frac{0.5 (df + m)}{\left( 1 + \mathbf{q}' \boldsymbol{\Sigma}^{-1} \mathbf{q} / df \right)} \left[ \frac{\left( 2 \mathbf{q}' \boldsymbol{\Sigma}^{-1} \frac{\partial \mathbf{q}}{\partial \theta_i} \right)}{df} + \mathbf{q}' \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \theta_i} \boldsymbol{\Sigma}^{-1} \mathbf{q} \right] - \frac{n}{2} \operatorname{trace}\left( \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \theta_i} \right)
\]

where

and

\[
\mathbf{q}(\mathbf{y}_t, \mathbf{x}_t, \boldsymbol{\theta}) = \frac{\epsilon(\theta)}{\sqrt{h(\theta)}} \in R^{m \times n}
\]

The estimator of the variance-covariance of $\hat{\boldsymbol{\theta}}$ (COVB) for the t distribution is the inverse of the likelihood Hessian. The gradient is computed analytically, and the Hessian is computed numerically.

Empirical Distribution Estimation and Simulation


The following SAS statements fit a model that uses least squares as the likelihood function, but represent the distribution of the residuals with an empirical cumulative distribution function (CDF). The plot of the empirical probability distribution is shown in Figure 24.

data t;  /* Sum of two normals  */
   format date monyy.;
   do t = 0 to 9.9 by 0.1;
      date = intnx( 'month', '1jun90'd, (t*10)-1 );
      y =  0.1 * (rannor(123)-10) +
            .5 * (rannor(123)+10);
      output;
   end;
run;

ods select Model.Liklhood.ResidSummary
           Model.Liklhood.ParameterEstimates;

proc model data=t time=t itprint;
   dependent y;
   parm a 5;

   y = a;
   obj = resid.y * resid.y;
   errormodel y ~ general( obj )
   cdf=(empirical=(tails=( normal percent=10)));

   fit y / outsn=s out=r;
   id  date;

   solve y / data=t(where=(date='1aug98'd))
             residdata=r sdata=s
             random=200 seed=6789 out=monte ;
run;



proc kde data=monte;
   univar y / plots=density;
run;

Figure 24: Empirical PDF Plot

For simulation, if the CDF for the model is not built in to the procedure, you can use the CDF=EMPIRICAL() option. This uses the sorted residual data to create an empirical CDF. For computing the inverse CDF, the program needs to know how to handle the tails. For continuous data, the tail distribution is generally poorly determined. To counter this, the PERCENT= option specifies the percentage of the observations to use in constructing each tail. The default for the PERCENT= option is 10.

A normal distribution or a t distribution is used to extrapolate the tails to infinity. The standard errors for this extrapolation are obtained from the data so that the empirical CDF is continuous.

Last updated: June 19, 2025