Regression Action Set

Generalized Estimating Equations

The analysis of correlated data that arise from repeated measurements when the measurements are assumed to be multivariate normal has been studied extensively. However, the normality assumption might not always be reasonable; for example, you must use a different methodology in the data analysis when the responses are discrete and correlated. Generalized estimating equations (GEEs) provide a practical method with reasonable statistical efficiency to analyze such data.

Liang and Zeger (1986) introduced GEEs as a method of dealing with correlated data when, except for the correlation among responses, the data can be modeled as a generalized linear model. For example, in many cases you can model correlated binary and count data in this way. For more information about GEEs, see Hardin and Hilbe (2003), Diggle, Liang, and Zeger (1994), and Lipsitz et al. (1994).

Let , , , represent the jth measurement on the ith subject. There are measurements on subject i and total measurements, where is the cluster frequency that you specify in the freq parameter. If you omit the freq parameter, then for all observations. The frequencies must be the same for all observations within each cluster.

You model correlated data by using the same link function and linear predictor specification that you use to model independent data. You use the same variance functions as in the independent case, but you must also specify a structure for the correlation between the repeated measurements. In the following material, denotes the variance function for the specified distribution.

Let the vector of measurements on the ith subject be , where the corresponding vector of means is , and let be the covariance matrix of . Let the vector of independent, or explanatory, variables for the jth measurement on the ith subject be . The generalized estimating equation of Liang and Zeger (1986) for estimating the vector of regression parameters is an extension of the independence estimating equation to correlated data and is given by

bold upper S left parenthesis bold italic beta right parenthesis equals sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline bold upper D prime Subscript i Baseline bold upper V Subscript i Superscript negative 1 Baseline left parenthesis bold upper Y Subscript i Baseline minus bold italic mu Subscript i Baseline left parenthesis bold italic beta right parenthesis right parenthesis equals bold 0

where

bold upper D Subscript i Baseline equals StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction

The relationship between the linear predictor and the mean is specified by

g left parenthesis mu Subscript i j Baseline right parenthesis equals bold x prime Subscript i j Baseline bold italic beta

where g is the link function, and so the matrix of partial derivatives of the mean with respect to the regression parameters for the ith subject is given by

Working Correlation Matrix

Let be an "working" correlation matrix that is fully specified by the vector of parameters . The covariance matrix of is modeled as

bold upper V Subscript i Baseline equals bold upper A Subscript i Superscript one half Baseline bold upper W Subscript i Superscript negative one half Baseline bold upper R left parenthesis bold italic alpha right parenthesis bold upper W Subscript i Superscript negative one half Baseline bold upper A Subscript i Superscript one half

where is an diagonal matrix whose jth diagonal element is , and is an diagonal matrix whose jth diagonal is , where is a weight specified in the weight parameter. If there is no weight parameter, then for all i and j. If is the true correlation matrix of , then is the true covariance matrix of .

The working correlation matrix is usually unknown and must be estimated. You estimate it in the iterative fitting process by using the current value of the parameter vector to compute appropriate functions of the Pearson residual

e Subscript i j Baseline equals StartFraction y Subscript i j Baseline minus mu Subscript i j Baseline Over StartRoot v left parenthesis mu Subscript i j Baseline right parenthesis divided by w Subscript i j Baseline EndRoot EndFraction

If you specify the working correlation as , which is the identity matrix, the GEE reduces to the independence estimating equation.

Table 6 shows the working correlation types that the logistic action supports and the estimators that are used to estimate the working correlations.

Table 6: Working Correlation Structures and Correlation Parameter Estimators

Working Correlation Structure		Estimator
Autoregressive AR(1)
	for

Exchangeable


Independence
		The working correlation is not estimated in this case.
m-dependent


Unstructured

Fitting Algorithm

The following algorithm fits the specified model by using GEEs. Note that this is not in general a likelihood-based method of estimation; inferences based on likelihoods are not possible for GEE methods.

Compute an initial estimate of by using an ordinary generalized linear model, assuming independence.
Compute the working correlations on the basis of the standardized residuals, the current , and the assumed structure of .
Compute an estimate of the covariance:
Update :
Repeat steps 2–4 until convergence.

Missing Data

See Diggle, Liang, and Zeger (1994, Chapter 11) for a discussion of missing values in longitudinal data. Suppose you intend to take measurements for the ith unit. Missing values for which are missing whenever is missing for all are called dropouts. Otherwise, missing values that are intermixed with nonmissing values are called intermittent missing values. The logistic action can estimate the working correlation from data that contain both types of missing values by using the all available pairs method, in which all nonmissing pairs of data are used in the moment estimators of the working correlation parameters that are defined previously. The resulting covariances and standard errors are valid under the missing completely at random (MCAR) assumption.

For example, for the unstructured working correlation model,

ModifyingAbove alpha With caret Subscript j k Baseline equals StartFraction 1 Over left parenthesis upper K prime minus p right parenthesis EndFraction sigma summation f Subscript i Baseline e Subscript i j Baseline e Subscript i k

where the sum is over the units that have nonmissing measurements at times j and k, and is the number of units that have nonmissing measurements at j and k. Estimates of the parameters for other working correlation types are computed in a similar manner, using available nonmissing pairs in the appropriate moment estimators.

The contribution of the ith unit to the parameter update equation is computed by omitting the elements of , the columns of , and the rows and columns of that correspond to missing measurements.

Parameter Estimate Covariances

The model-based estimator of is given by

bold upper Sigma Subscript m Baseline left parenthesis ModifyingAbove bold italic beta With caret right parenthesis equals bold upper I 0 Superscript negative 1

where

bold upper I 0 equals sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction prime bold upper V Subscript i Superscript negative 1 Baseline StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction

This is a consistent estimator of the covariance matrix of if the mean model and the working correlation matrix are correctly specified.

The estimator

bold upper Sigma Subscript e Baseline equals bold upper I 0 Superscript negative 1 Baseline bold upper I 1 bold upper I 0 Superscript negative 1

is called the empirical, or robust, estimator of the covariance matrix of , where

bold upper I 1 equals sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction prime bold upper V Subscript i Superscript negative 1 Baseline normal upper C normal o normal v left parenthesis bold upper Y Subscript i Baseline right parenthesis bold upper V Subscript i Superscript negative 1 Baseline StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction

It has the property of being a consistent estimator of the covariance matrix of , even if the working correlation matrix is misspecified—that is, if . For further information about the robust variance estimate, see Zeger, Liang, and Albert (1988), Royall (1986), and White (1982). In computing , the parameters and are replaced by estimates, and is replaced by the estimate

left parenthesis bold upper Y Subscript i Baseline minus bold italic mu Subscript i Baseline left parenthesis ModifyingAbove bold italic beta With caret right parenthesis right parenthesis left parenthesis bold upper Y Subscript i Baseline minus bold italic mu Subscript i Baseline left parenthesis ModifyingAbove bold italic beta With caret right parenthesis right parenthesis prime

Quasi-likelihood Information Criterion

The quasi-likelihood information criterion (QIC) was developed by Pan (2001) as a modification of the Akaike information criterion (AIC) to apply to models that are fit by GEEs.

Define the quasi-likelihood under the independence working correlation assumption, which is evaluated using the parameter estimates under the working correlation R as

upper Q left parenthesis ModifyingAbove bold italic beta With caret left parenthesis upper R right parenthesis right parenthesis equals sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline sigma summation Underscript j equals 1 Overscript n Subscript i Baseline Endscripts upper Q left parenthesis ModifyingAbove bold italic beta With caret left parenthesis upper R right parenthesis semicolon left parenthesis upper Y Subscript i j Baseline comma bold upper X Subscript i j Baseline right parenthesis right parenthesis

where the quasi-likelihood contribution of the jth observation in the ith cluster is defined in the section Quasi-likelihood Functions and are the parameter estimates obtained from GEEs that use the working correlation R.

QIC is defined as

upper Q normal upper I normal upper C left parenthesis upper R right parenthesis equals minus 2 upper Q left parenthesis ModifyingAbove bold italic beta With caret left parenthesis upper R right parenthesis right parenthesis plus 2 normal t normal r normal a normal c normal e left parenthesis ModifyingAbove normal upper Omega With caret Subscript upper I Baseline ModifyingAbove upper V With caret Subscript upper R Baseline right parenthesis

where is the robust covariance estimate and is the inverse of the model-based covariance estimate under the independence working correlation assumption, evaluated at , which are the parameter estimates obtained from GEEs that use the working correlation R.

The logistic action also computes an approximation to , which Pan (2001) defines as

upper Q normal upper I normal upper C Subscript u Baseline left parenthesis upper R right parenthesis equals minus 2 upper Q left parenthesis ModifyingAbove bold italic beta With caret left parenthesis upper R right parenthesis right parenthesis plus 2 p

where p is the number of regression parameters.

Pan (2001) notes that QIC is appropriate for selecting regression models and working correlations, whereas is appropriate only for selecting regression models.

Quasi-likelihood Functions

For detailed discussions of quasi-likelihood functions, see McCullagh and Nelder (1989) and Hardin and Hilbe (2003). The contribution of observation j in cluster i to the quasi-likelihood function that is evaluated at the regression parameters is given by , where

upper Q Subscript i j Baseline equals w Subscript i j Baseline left bracket r Subscript i j Baseline log left parenthesis p Subscript i j Baseline right parenthesis plus left parenthesis n Subscript i j Baseline minus r Subscript i j Baseline right parenthesis log left parenthesis 1 minus p Subscript i j Baseline right parenthesis right bracket

These are used in the computation of the quasi-likelihood information criteria (QIC) for the goodness of fit of models that are fit using GEEs. The are prior weights, if any, that are specified in the weight or freq parameter.

GEE analysis is not supported when you perform model selection or when you perform a partitioned analysis.

Within-Subject Effects

If you use multiple variables to define the within-subject effect, the specification of the variables within this effect will have an impact on the analysis if the working correlation type depends on the ordering of the repeated measures. The autoregressive, m-dependent, and unstructured working correlation types use the levelization of the within-subject effect to locate observations within the correlation matrix, and depend on the order that is produced by the levelization.

In particular, if you define the within-subject effect by using the interaction of two or more variables, then the hierarchy of levelization and the resulting ordering of the repeated measures will depend on the order in which you list the variables in the effects subparameter of the repeated parameter.

For example, assume that you are using the interaction of the variables w1 and w2 as the within-subject effect in a repeated measures analysis. Furthermore, assume that w1 takes the values 'A' and 'B' and w2 takes the values 'L' and 'H'. The following statements list these variables in the order w2, w1 in the effects subparameter of the repeated parameter:

  regression.logistic
      table={name='health'},
      class={{vars={'subject', 'w1', 'w2'}}},
      model={depVars={{name='outcome', options={event='1'}}},
             effects={{vars={'age', 'smoke'}}} },
      repeated={{depVars={{name='outcome', options={event='1'}}},
                 effects={{vars={'w2', 'w1'}, interaction='CROSS'}},
                 type='REPEATED',
                 subject={{vars={'subject'} }}, corrtype='MDEP', mdepm=2}};

In this case, the order of the variables in the effects subparameter of the repeated parameter results in a levelization in which the value of w1 changes faster than the value of w2. Table 7 shows this levelization.

Table 7: Levelization Order When Variable Order Is w2, w1

Value of `w1`	Value of `w2`	Levelization Order
A	L	0
B	L	1
A	H	2
B	H	3

If you reverse the order of the variables in the effects subparameter of the repeated parameter, the levelization has a different hierarchy. In this hierarchy, the value of w2 changes faster than the value of w1. Table 8 shows that the levelization order that is assigned to each unique combination of the variables does not match the levelization order shown in Table 7.

Table 8: Levelization Order When Variable Order Is w1, w2

Value of `w1`	Value of `w2`	Levelization Order
A	L	0
B	L	2
A	H	1
B	H	3

Because the specified working correlation structure (m-dependent) relies on this ordering, the parameter estimates differ from those that are produced by the ordering shown in Table 7. The autoregressive and unstructured working correlation structures also result in different parameter estimates for the different orderings.

If you use two variables to define the within-subject effect, you can use a nested effect to enforce a specific levelization hierarchy. However, the best practice is to use a single variable in the within-subject effect.

Using a Preexisting Table Order

You can add group and order information to a table by using the partition action in the table action set. If you want to use the group and order information in a table to achieve a reproducible processing order and you also want to do a repeated measures analysis, the group information must follow specific rules. In particular, the variables that you specify in the groupBy parameter of the partition action must match the variables that you specify in the vars sub-subparameter of the subject subparameter of the repeated parameter.

There are some variations to this requirement, depending on which type of subject effect you specify and which other options you use. If you specify a single-variable subject effect or an interaction subject effect, you must use all these variables in the groupBy parameter of the partition action. If you specify a nested subject effect, you must include the nested variable in the groupBy parameter. You can also include the nesting variables in the groupBy parameter, but you cannot include any other variables in the groupBy parameter. However, if you specify the multipass parameter of the logistic action, you must include the nested variable and all nesting variables in the groupBy parameter. You cannot include any other variables in the groupBy parameter.

If you use an order-sensitive working correlation structure, you do not need to use the variables that form the within-subject effect in the orderBy parameter of the partition action. These sets of variables can overlap, but it is not required that the preexisting table order correspond to the ordering for the within-subject effect.

Last updated: March 05, 2026