Regression Action Set

Generalized Estimating Equations

The analysis of correlated data that arise from repeated measurements when the measurements are assumed to be multivariate normal has been studied extensively. However, the normality assumption might not always be reasonable; for example, you must use a different methodology in the data analysis when the responses are discrete and correlated. Generalized estimating equations (GEEs) provide a practical method with reasonable statistical efficiency to analyze such data.

Liang and Zeger (1986) introduced GEEs as a method of dealing with correlated data when, except for the correlation among responses, the data can be modeled as a generalized linear model. For example, in many cases you can model correlated binary and count data in this way. For more information about GEEs, see Hardin and Hilbe (2003), Diggle, Liang, and Zeger (1994), and Lipsitz et al. (1994).

Let y Subscript i j, j equals 1 comma ellipsis comma n Subscript i Baseline, i equals 1 comma ellipsis comma upper K, represent the jth measurement on the ith subject. There are n Subscript i measurements on subject i and sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline n Subscript i total measurements, where f Subscript i is the cluster frequency that you specify in the freq parameter. If you omit the freq parameter, then f Subscript i Baseline equals 1 for all observations. The frequencies must be the same for all observations within each cluster.

You model correlated data by using the same link function and linear predictor specification that you use to model independent data. You use the same variance functions as in the independent case, but you must also specify a structure for the correlation between the repeated measurements. In the following material, v left parenthesis mu right parenthesis denotes the variance function for the specified distribution.

Let the vector of measurements on the ith subject be bold upper Y Subscript i Baseline equals left bracket y Subscript i Baseline 1 Baseline comma ellipsis comma y Subscript i n Sub Subscript i Subscript Baseline right bracket prime, where the corresponding vector of means is bold italic mu Subscript i Baseline equals left bracket mu Subscript i Baseline 1 Baseline comma ellipsis comma mu Subscript i n Sub Subscript i Subscript Baseline right bracket prime, and let bold upper V Subscript i be the covariance matrix of bold upper Y Subscript i. Let the vector of independent, or explanatory, variables for the jth measurement on the ith subject be bold x Subscript i j Baseline equals left bracket x Subscript i j Baseline 1 Baseline comma ellipsis comma x Subscript i j p Baseline right bracket prime. The generalized estimating equation of Liang and Zeger (1986) for estimating the p times 1 vector of regression parameters bold-italic beta is an extension of the independence estimating equation to correlated data and is given by

bold upper S left parenthesis bold italic beta right parenthesis equals sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline bold upper D prime Subscript i Baseline bold upper V Subscript i Superscript negative 1 Baseline left parenthesis bold upper Y Subscript i Baseline minus bold italic mu Subscript i Baseline left parenthesis bold italic beta right parenthesis right parenthesis equals bold 0

where

bold upper D Subscript i Baseline equals StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction

The relationship between the linear predictor and the mean is specified by

g left parenthesis mu Subscript i j Baseline right parenthesis equals bold x prime Subscript i j Baseline bold italic beta

where g is the link function, and so the p times n Subscript i matrix of partial derivatives of the mean with respect to the regression parameters for the ith subject is given by

bold upper D prime Subscript i Baseline equals StartFraction partial differential bold italic mu prime Subscript i Over partial differential bold italic beta EndFraction equals Start 3 By 3 Matrix 1st Row 1st Column StartFraction x Subscript i Baseline 11 Baseline Over g prime left parenthesis mu Subscript i Baseline 1 Baseline right parenthesis EndFraction 2nd Column ellipsis 3rd Column StartFraction x Subscript i n Sub Subscript i Subscript Baseline 1 Baseline Over g prime left parenthesis mu Subscript i n Sub Subscript i Subscript Baseline right parenthesis EndFraction 2nd Row 1st Column vertical ellipsis 2nd Column Blank 3rd Column vertical ellipsis 3rd Row 1st Column StartFraction x Subscript i Baseline 1 p Baseline Over g prime left parenthesis mu Subscript i Baseline 1 Baseline right parenthesis EndFraction 2nd Column ellipsis 3rd Column StartFraction x Subscript i n Sub Subscript i Subscript p Baseline Over g prime left parenthesis mu Subscript i n Sub Subscript i Subscript Baseline right parenthesis EndFraction EndMatrix
Working Correlation Matrix

Let bold upper R Subscript i Baseline left parenthesis bold italic alpha right parenthesis be an n Subscript i Baseline times n Subscript i "working" correlation matrix that is fully specified by the vector of parameters bold italic alpha. The covariance matrix of bold upper Y Subscript i is modeled as

bold upper V Subscript i Baseline equals bold upper A Subscript i Superscript one half Baseline bold upper W Subscript i Superscript negative one half Baseline bold upper R left parenthesis bold italic alpha right parenthesis bold upper W Subscript i Superscript negative one half Baseline bold upper A Subscript i Superscript one half

where bold upper A Subscript i is an n Subscript i Baseline times n Subscript i diagonal matrix whose jth diagonal element is v left parenthesis mu Subscript i j Baseline right parenthesis, and bold upper W Subscript i is an n Subscript i Baseline times n Subscript i diagonal matrix whose jth diagonal is w Subscript i j, where w Subscript i j is a weight specified in the weight parameter. If there is no weight parameter, then w Subscript i j Baseline equals 1 for all i and j. If bold upper R Subscript i Baseline left parenthesis bold italic alpha right parenthesis is the true correlation matrix of bold upper Y Subscript i, then bold upper V Subscript i is the true covariance matrix of bold upper Y Subscript i.

The working correlation matrix is usually unknown and must be estimated. You estimate it in the iterative fitting process by using the current value of the parameter vector bold-italic beta to compute appropriate functions of the Pearson residual

e Subscript i j Baseline equals StartFraction y Subscript i j Baseline minus mu Subscript i j Baseline Over StartRoot v left parenthesis mu Subscript i j Baseline right parenthesis divided by w Subscript i j Baseline EndRoot EndFraction

If you specify the working correlation as bold upper R 0 equals bold upper I, which is the identity matrix, the GEE reduces to the independence estimating equation.

Table 6 shows the working correlation types that the logistic action supports and the estimators that are used to estimate the working correlations.

Table 6: Working Correlation Structures and Correlation Parameter Estimators

Working Correlation Structure Estimator
Autoregressive AR(1)
normal upper C normal o normal r normal r left parenthesis upper Y Subscript i j Baseline comma upper Y Subscript i comma j plus t Baseline right parenthesis equals alpha Superscript t for t equals 0 comma 1 comma 2 comma ellipsis comma n Subscript i Baseline minus j ModifyingAbove alpha With caret equals StartFraction 1 Over left parenthesis upper K 1 minus p right parenthesis EndFraction sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline sigma summation Underscript j less than or equals n Subscript i Baseline minus 1 Endscripts e Subscript i j Baseline e Subscript i comma j plus 1
upper K 1 equals sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline left parenthesis n Subscript i Baseline minus 1 right parenthesis
Exchangeable
normal upper C normal o normal r normal r left parenthesis upper Y Subscript i j Baseline comma upper Y Subscript i k Baseline right parenthesis equals StartLayout Enlarged left brace 1st Row 1st Column 1 2nd Column j equals k 2nd Row 1st Column alpha 2nd Column j not equals k EndLayout ModifyingAbove alpha With caret equals StartFraction 1 Over left parenthesis upper N Superscript asterisk Baseline minus p right parenthesis EndFraction sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline sigma summation Underscript j less than k Endscripts e Subscript i j Baseline e Subscript i k
upper N Superscript asterisk Baseline equals 0.5 sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline n Subscript i Baseline left parenthesis n Subscript i Baseline minus 1 right parenthesis
Independence
normal upper C normal o normal r normal r left parenthesis upper Y Subscript i j Baseline comma upper Y Subscript i k Baseline right parenthesis equals StartLayout Enlarged left brace 1st Row 1st Column 1 2nd Column j equals k 2nd Row 1st Column 0 2nd Column j not equals k EndLayout The working correlation is not estimated in this case.
m-dependent
normal upper C normal o normal r normal r left parenthesis upper Y Subscript i j Baseline comma upper Y Subscript i comma j plus t Baseline right parenthesis equals StartLayout Enlarged left brace 1st Row 1st Column 1 2nd Column t equals 0 2nd Row 1st Column alpha Subscript t Baseline 2nd Column t equals 1 comma 2 comma ellipsis comma m 3rd Row 1st Column 0 2nd Column t greater than m EndLayout ModifyingAbove alpha With caret Subscript t Baseline equals StartFraction 1 Over left parenthesis upper K Subscript t Baseline minus p right parenthesis EndFraction sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline sigma summation Underscript j less than or equals n Subscript i Baseline minus t Endscripts e Subscript i j Baseline e Subscript i comma j plus t
upper K Subscript t Baseline equals sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline left parenthesis n Subscript i Baseline minus t right parenthesis
Unstructured
normal upper C normal o normal r normal r left parenthesis upper Y Subscript i j Baseline comma upper Y Subscript i k Baseline right parenthesis equals StartLayout Enlarged left brace 1st Row 1st Column 1 2nd Column j equals k 2nd Row 1st Column alpha Subscript j k Baseline 2nd Column j not equals k EndLayout ModifyingAbove alpha With caret Subscript j k Baseline equals StartFraction 1 Over left parenthesis upper K minus p right parenthesis EndFraction sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline e Subscript i j Baseline e Subscript i k


Fitting Algorithm

The following algorithm fits the specified model by using GEEs. Note that this is not in general a likelihood-based method of estimation; inferences based on likelihoods are not possible for GEE methods.

  1. Compute an initial estimate of bold-italic beta by using an ordinary generalized linear model, assuming independence.

  2. Compute the working correlations bold upper R on the basis of the standardized residuals, the current bold-italic beta, and the assumed structure of bold upper R.

  3. Compute an estimate of the covariance:

    bold upper V Subscript i Baseline equals bold upper A Subscript i Superscript one half Baseline bold upper W Subscript i Superscript negative one half Baseline ModifyingAbove bold upper R With caret left parenthesis bold italic alpha right parenthesis bold upper W Subscript i Superscript negative one half Baseline bold upper A Subscript i Superscript one half
  4. Update bold-italic beta:

    bold italic beta Subscript r plus 1 Baseline equals bold italic beta Subscript r Baseline plus left bracket sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction prime bold upper V Subscript i Superscript negative 1 Baseline StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction right bracket Superscript negative 1 Baseline left bracket sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction prime bold upper V Subscript i Superscript negative 1 Baseline left parenthesis bold upper Y Subscript i Baseline minus bold italic mu Subscript i Baseline right parenthesis right bracket
  5. Repeat steps 2–4 until convergence.

Missing Data

See Diggle, Liang, and Zeger (1994, Chapter 11) for a discussion of missing values in longitudinal data. Suppose you intend to take measurements upper Y Subscript i Baseline 1 Baseline comma ellipsis comma upper Y Subscript i n Baseline for the ith unit. Missing values for which upper Y Subscript i j are missing whenever upper Y Subscript i k is missing for all j greater than or equals k are called dropouts. Otherwise, missing values that are intermixed with nonmissing values are called intermittent missing values. The logistic action can estimate the working correlation from data that contain both types of missing values by using the all available pairs method, in which all nonmissing pairs of data are used in the moment estimators of the working correlation parameters that are defined previously. The resulting covariances and standard errors are valid under the missing completely at random (MCAR) assumption.

For example, for the unstructured working correlation model,

ModifyingAbove alpha With caret Subscript j k Baseline equals StartFraction 1 Over left parenthesis upper K prime minus p right parenthesis EndFraction sigma summation f Subscript i Baseline e Subscript i j Baseline e Subscript i k

where the sum is over the units that have nonmissing measurements at times j and k, and upper K prime is the number of units that have nonmissing measurements at j and k. Estimates of the parameters for other working correlation types are computed in a similar manner, using available nonmissing pairs in the appropriate moment estimators.

The contribution of the ith unit to the parameter update equation is computed by omitting the elements of left parenthesis bold upper Y Subscript i Baseline minus bold italic mu Subscript bold i Baseline right parenthesis, the columns of bold upper D prime Subscript i Baseline equals StartFraction partial differential bold italic mu Over partial differential bold italic beta EndFraction prime, and the rows and columns of bold upper V Subscript i that correspond to missing measurements.

Parameter Estimate Covariances

The model-based estimator of Cov left parenthesis ModifyingAbove bold italic beta With caret right parenthesis is given by

bold upper Sigma Subscript m Baseline left parenthesis ModifyingAbove bold italic beta With caret right parenthesis equals bold upper I 0 Superscript negative 1

where

bold upper I 0 equals sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction prime bold upper V Subscript i Superscript negative 1 Baseline StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction

This is a consistent estimator of the covariance matrix of ModifyingAbove bold italic beta With caret if the mean model and the working correlation matrix are correctly specified.

The estimator

bold upper Sigma Subscript e Baseline equals bold upper I 0 Superscript negative 1 Baseline bold upper I 1 bold upper I 0 Superscript negative 1

is called the empirical, or robust, estimator of the covariance matrix of ModifyingAbove bold italic beta With caret, where

bold upper I 1 equals sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction prime bold upper V Subscript i Superscript negative 1 Baseline normal upper C normal o normal v left parenthesis bold upper Y Subscript i Baseline right parenthesis bold upper V Subscript i Superscript negative 1 Baseline StartFraction partial differential bold italic mu Subscript i Baseline Over partial differential bold italic beta EndFraction

It has the property of being a consistent estimator of the covariance matrix of ModifyingAbove bold italic beta With caret, even if the working correlation matrix is misspecified—that is, if normal upper C normal o normal v left parenthesis bold upper Y Subscript i Baseline right parenthesis not equals bold upper V Subscript i. For further information about the robust variance estimate, see Zeger, Liang, and Albert (1988), Royall (1986), and White (1982). In computing bold upper Sigma Subscript e, the parameters bold-italic beta and phi are replaced by estimates, and normal upper C normal o normal v left parenthesis bold upper Y Subscript i Baseline right parenthesis is replaced by the estimate

left parenthesis bold upper Y Subscript i Baseline minus bold italic mu Subscript i Baseline left parenthesis ModifyingAbove bold italic beta With caret right parenthesis right parenthesis left parenthesis bold upper Y Subscript i Baseline minus bold italic mu Subscript i Baseline left parenthesis ModifyingAbove bold italic beta With caret right parenthesis right parenthesis prime
Quasi-likelihood Information Criterion

The quasi-likelihood information criterion (QIC) was developed by Pan (2001) as a modification of the Akaike information criterion (AIC) to apply to models that are fit by GEEs.

Define the quasi-likelihood under the independence working correlation assumption, which is evaluated using the parameter estimates under the working correlation R as

upper Q left parenthesis ModifyingAbove bold italic beta With caret left parenthesis upper R right parenthesis right parenthesis equals sigma summation Underscript i equals 1 Overscript upper K Endscripts f Subscript i Baseline sigma summation Underscript j equals 1 Overscript n Subscript i Baseline Endscripts upper Q left parenthesis ModifyingAbove bold italic beta With caret left parenthesis upper R right parenthesis semicolon left parenthesis upper Y Subscript i j Baseline comma bold upper X Subscript i j Baseline right parenthesis right parenthesis

where the quasi-likelihood contribution of the jth observation in the ith cluster is defined in the section Quasi-likelihood Functions and ModifyingAbove bold italic beta With caret left parenthesis upper R right parenthesis are the parameter estimates obtained from GEEs that use the working correlation R.

QIC is defined as

upper Q normal upper I normal upper C left parenthesis upper R right parenthesis equals minus 2 upper Q left parenthesis ModifyingAbove bold italic beta With caret left parenthesis upper R right parenthesis right parenthesis plus 2 normal t normal r normal a normal c normal e left parenthesis ModifyingAbove normal upper Omega With caret Subscript upper I Baseline ModifyingAbove upper V With caret Subscript upper R Baseline right parenthesis

where ModifyingAbove upper V With caret Subscript upper R is the robust covariance estimate and ModifyingAbove normal upper Omega With caret Subscript upper I is the inverse of the model-based covariance estimate under the independence working correlation assumption, evaluated at ModifyingAbove bold italic beta With caret left parenthesis upper R right parenthesis, which are the parameter estimates obtained from GEEs that use the working correlation R.

The logistic action also computes an approximation to normal upper Q normal upper I normal upper C left parenthesis upper R right parenthesis, which Pan (2001) defines as

upper Q normal upper I normal upper C Subscript u Baseline left parenthesis upper R right parenthesis equals minus 2 upper Q left parenthesis ModifyingAbove bold italic beta With caret left parenthesis upper R right parenthesis right parenthesis plus 2 p

where p is the number of regression parameters.

Pan (2001) notes that QIC is appropriate for selecting regression models and working correlations, whereas normal upper Q normal upper I normal upper C Subscript u is appropriate only for selecting regression models.

Quasi-likelihood Functions

For detailed discussions of quasi-likelihood functions, see McCullagh and Nelder (1989) and Hardin and Hilbe (2003). The contribution of observation j in cluster i to the quasi-likelihood function that is evaluated at the regression parameters bold-italic beta is given by upper Q left parenthesis bold italic beta semicolon left parenthesis upper Y Subscript i j Baseline comma bold upper X Subscript i j Baseline right parenthesis right parenthesis equals upper Q Subscript i j, where

upper Q Subscript i j Baseline equals w Subscript i j Baseline left bracket r Subscript i j Baseline log left parenthesis p Subscript i j Baseline right parenthesis plus left parenthesis n Subscript i j Baseline minus r Subscript i j Baseline right parenthesis log left parenthesis 1 minus p Subscript i j Baseline right parenthesis right bracket

These are used in the computation of the quasi-likelihood information criteria (QIC) for the goodness of fit of models that are fit using GEEs. The w Subscript i j are prior weights, if any, that are specified in the weight or freq parameter.

GEE analysis is not supported when you perform model selection or when you perform a partitioned analysis.

Within-Subject Effects

If you use multiple variables to define the within-subject effect, the specification of the variables within this effect will have an impact on the analysis if the working correlation type depends on the ordering of the repeated measures. The autoregressive, m-dependent, and unstructured working correlation types use the levelization of the within-subject effect to locate observations within the correlation matrix, and depend on the order that is produced by the levelization.

In particular, if you define the within-subject effect by using the interaction of two or more variables, then the hierarchy of levelization and the resulting ordering of the repeated measures will depend on the order in which you list the variables in the effects subparameter of the repeated parameter.

For example, assume that you are using the interaction of the variables w1 and w2 as the within-subject effect in a repeated measures analysis. Furthermore, assume that w1 takes the values 'A' and 'B' and w2 takes the values 'L' and 'H'. The following statements list these variables in the order w2, w1 in the effects subparameter of the repeated parameter:

  regression.logistic
      table={name='health'},
      class={{vars={'subject', 'w1', 'w2'}}},
      model={depVars={{name='outcome', options={event='1'}}},
             effects={{vars={'age', 'smoke'}}} },
      repeated={{depVars={{name='outcome', options={event='1'}}},
                 effects={{vars={'w2', 'w1'}, interaction='CROSS'}},
                 type='REPEATED',
                 subject={{vars={'subject'} }}, corrtype='MDEP', mdepm=2}};

In this case, the order of the variables in the effects subparameter of the repeated parameter results in a levelization in which the value of w1 changes faster than the value of w2. Table 7 shows this levelization.

Table 7: Levelization Order When Variable Order Is w2, w1

Value of w1 Value of w2 Levelization Order
A L 0
B L 1
A H 2
B H 3


If you reverse the order of the variables in the effects subparameter of the repeated parameter, the levelization has a different hierarchy. In this hierarchy, the value of w2 changes faster than the value of w1. Table 8 shows that the levelization order that is assigned to each unique combination of the variables does not match the levelization order shown in Table 7.

Table 8: Levelization Order When Variable Order Is w1, w2

Value of w1 Value of w2 Levelization Order
A L 0
B L 2
A H 1
B H 3


Because the specified working correlation structure (m-dependent) relies on this ordering, the parameter estimates differ from those that are produced by the ordering shown in Table 7. The autoregressive and unstructured working correlation structures also result in different parameter estimates for the different orderings.

If you use two variables to define the within-subject effect, you can use a nested effect to enforce a specific levelization hierarchy. However, the best practice is to use a single variable in the within-subject effect.

Using a Preexisting Table Order

You can add group and order information to a table by using the partition action in the table action set. If you want to use the group and order information in a table to achieve a reproducible processing order and you also want to do a repeated measures analysis, the group information must follow specific rules. In particular, the variables that you specify in the groupBy parameter of the partition action must match the variables that you specify in the vars sub-subparameter of the subject subparameter of the repeated parameter.

There are some variations to this requirement, depending on which type of subject effect you specify and which other options you use. If you specify a single-variable subject effect or an interaction subject effect, you must use all these variables in the groupBy parameter of the partition action. If you specify a nested subject effect, you must include the nested variable in the groupBy parameter. You can also include the nesting variables in the groupBy parameter, but you cannot include any other variables in the groupBy parameter. However, if you specify the multipass parameter of the logistic action, you must include the nested variable and all nesting variables in the groupBy parameter. You cannot include any other variables in the groupBy parameter.

If you use an order-sensitive working correlation structure, you do not need to use the variables that form the within-subject effect in the orderBy parameter of the partition action. These sets of variables can overlap, but it is not required that the preexisting table order correspond to the ordering for the within-subject effect.

Last updated: March 05, 2026