Regression Action Set

Predicted Probabilities and Regression Diagnostics

For binary response data, you can produce observationwise predicted probabilities, confidence limits, and regression diagnostics developed by Pregibon (1981) by specifying the output parameter. For multinomial response data, you can likewise produce observationwise predicted probabilities, confidence limits, and raw residuals. If you use multinomial-trial syntax so that each observation represents an aggregate of trials, then extensions of Pregibon’s diagnostics are also available (Lesaffre and Albert 1989; Williams 1987; Gupta, Nguyen, and Pardo 2008; Martín 2015).

For a binary response model, given a vector of covariates $\mathbf{x}_i$ for the $i$th observation in your data table and the model-predicted parameter estimates $\hat{\boldsymbol\beta}$, you can write the linear predictor $\hat\eta_i = \mathbf{x}_i'\hat{\boldsymbol\beta}$. The mean of the $i$th observation $\mu_i(\hat{\boldsymbol\beta})$, or the model-predicted event probability $\hat\pi_i$, is $\mu_i(\hat{\boldsymbol\beta}) = \hat\pi_i = g^{-1}(\eta_i)$, where you choose the link function $g$ by specifying the link subparameter. The variance of the binary distribution is $V(\mu) = \mu(1-\mu) = \hat\pi_i(1-\hat\pi_i) = V(\pi)$, and $\boldsymbol\Sigma$ is the estimated covariance of $\hat{\boldsymbol\beta}$. Denote the frequency of the $i$th observation as $f_i$ and the weight as $w_i$.

For ordinal response models, the predicted cumulative probabilities are computed in the same fashion by using the appropriate model-predicted intercept parameters $\hat\alpha_j$ and letting $\boldsymbol\beta$ consist of the slope parameters: $\hat\eta_{ij} = g(\Pr(Y \le j \mid \mathbf{x}_i)) = \hat\alpha_j + \mathbf{x}_i'\hat{\boldsymbol\beta}$ and $\hat\pi_{ij} = \Pr(Y \le j \mid \mathbf{x}_i) = g^{-1}(\eta_{ij})$ for $1 \le j < J$.
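The ordinal computation can be illustrated with a short NumPy sketch. It assumes a logit link and made-up parameter values; the function name `cumulative_logit_probs` and its arguments are hypothetical and are not part of the action's interface.

```python
import numpy as np

def cumulative_logit_probs(alpha_hat, beta_hat, x):
    """Cumulative and individual probabilities for one observation.

    alpha_hat : J-1 increasing intercepts (one per cumulative split)
    beta_hat  : slope vector shared by every response level
    x         : covariate vector for the observation
    """
    # eta_ij = alpha_j + x' beta, one value per j = 1, ..., J-1
    eta = np.asarray(alpha_hat, dtype=float) + float(np.dot(x, beta_hat))
    # Inverse logit link (an assumption of this sketch) gives Pr(Y <= j | x)
    cumulative = 1.0 / (1.0 + np.exp(-eta))
    # Pad with Pr(Y <= 0) = 0 and Pr(Y <= J) = 1, then difference to get Pr(Y = j | x)
    individual = np.diff(np.concatenate(([0.0], cumulative, [1.0])))
    return cumulative, individual

# Illustrative values only: J = 3 response levels, two covariates
cum_probs, indiv_probs = cumulative_logit_probs(
    alpha_hat=[-1.0, 0.5],
    beta_hat=np.array([0.8, -0.3]),
    x=np.array([1.0, 2.0]),
)
```

The individual probabilities are obtained by differencing the cumulative ones, so they sum to 1 whenever the intercepts are increasing.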

For nominal response models, the predicted probabilities are computed by using the appropriate model-predicted intercept parameters $\hat\alpha_j$ and letting $\boldsymbol\beta_j$ consist of the slope parameters: $\hat\eta_{ij} = g(\Pr(Y = j \mid \mathbf{x}_i)) = \hat\alpha_j + \mathbf{x}_i'\hat{\boldsymbol\beta}_j$ and $\hat\pi_{ij} = \Pr(Y = j \mid \mathbf{x}_i) = g^{-1}(\eta_{ij})$ for $1 \le j < J$.

If you perform elastic net or LASSO selection, or if you specify the noxpx or nostderr parameter, then your analysis does not generate a covariance matrix, and you can produce only the predicted probabilities (individual, cumulative, and posterior when available) and the raw, Pearson, deviance, and working residuals.

Confidence Intervals

Approximate confidence intervals for predicted probabilities can be computed as follows. The variance of the linear predictor is estimated by

$$\hat\sigma^2(\eta_i) = \mathbf{x}_i'\,\boldsymbol\Sigma\,\mathbf{x}_i$$

For multinomial models, the variance also depends on the response function. Let $\boldsymbol\delta_j$ be a $(J-1)$ column vector whose $j$th entry is equal to 1 and all other entries are equal to 0. Redefine $\mathbf{x}_i = (\boldsymbol\delta_j', \mathbf{x}_i')'$, $\eta_i = \eta_{ij}$, and $\pi_i = \pi_{ij}$. Then

$$\hat\sigma^2(\eta_i) = \mathbf{x}_i'\,\boldsymbol\Sigma\,\mathbf{x}_i$$

The asymptotic $100(1-\alpha)\%$ confidence interval for $\eta_i$ is

$$\hat\eta_i \pm z_{\alpha/2}\,\hat\sigma(\hat\eta_i)$$

where $z_{\alpha/2}$ is the $100(1-\alpha/2)$th percentile point of a standard normal distribution.

The predicted probability and the $100(1-\alpha)\%$ confidence limits for $\pi_i$ are obtained by back-transforming the corresponding measures for the linear predictor. So the confidence limits are

$$g^{-1}\left[\hat\eta_i \pm z_{\alpha/2}\,\hat\sigma(\hat\eta_i)\right]$$
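
As a rough illustration of the back-transformation, the following sketch computes the predicted probability and its confidence limits for one observation. It assumes a logit link; the function name `predicted_prob_ci` and all numeric values are invented for the example.

```python
import numpy as np
from statistics import NormalDist

def predicted_prob_ci(x, beta_hat, cov, alpha=0.05):
    """Predicted probability with back-transformed confidence limits."""
    x = np.asarray(x, dtype=float)
    eta_hat = float(x @ beta_hat)                 # linear predictor x' beta
    se_eta = float(np.sqrt(x @ cov @ x))          # sqrt of x' Sigma x
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # 100(1 - alpha/2)th percentile

    def inv_logit(eta):
        # Inverse of the logit link (the link is an assumption of this sketch)
        return 1.0 / (1.0 + np.exp(-eta))

    return (inv_logit(eta_hat),
            inv_logit(eta_hat - z * se_eta),
            inv_logit(eta_hat + z * se_eta))

# Illustrative estimates: intercept and one slope, with their covariance matrix
beta_hat = np.array([-0.5, 1.2])
cov = np.array([[0.04, -0.01],
                [-0.01, 0.09]])
p_hat, lower, upper = predicted_prob_ci([1.0, 0.5], beta_hat, cov)
```

Because the limits are formed on the linear-predictor scale and then mapped through the inverse link, they always fall inside (0, 1), although they are generally not symmetric about the predicted probability.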

Hat-Matrix Diagonals

The diagonal elements of the hat matrix are useful in detecting extreme points in the design space, where they tend to have larger values. For the generalized linear model, the variance of the ith individual observation is

$$v_i = \frac{V(\pi_i)}{f_i w_i}$$

For the $i$th observation, let

$$w_{ei} = v_i^{-1}\left(g'(\pi_i)\right)^{-2}$$

where $g'(\pi_i)$ is the derivative of the link function evaluated at $\pi_i$. The weight matrix $\mathbf{W}_e$ is a diagonal matrix, with $w_{ei}$ denoting the $i$th diagonal element, which is used in computing the expected information matrix. Define the leverage, or hat-matrix diagonal, $h_i$, as the $i$th diagonal element of the matrix

$$\mathbf{W}_e^{1/2}\mathbf{X}\left(\mathbf{X}'\mathbf{W}_e\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{W}_e^{1/2}$$

If the estimated probability is extreme (less than 0.1 or greater than 0.9, approximately), then the hat-matrix diagonal might be greatly reduced in value. Consequently, when an observation has a very large or very small estimated probability, its leverage is not a good indicator of the observation’s distance from the design space (Hosmer and Lemeshow 2000, p. 171).
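
The hat-matrix diagonals for a binary model can be sketched in a few lines of NumPy. This example assumes a logit link, for which the derivative of the link at a probability p is 1 / (p (1 - p)); the function name `logit_leverage` and the data are hypothetical.

```python
import numpy as np

def logit_leverage(X, pi_hat, freq=None, weight=None):
    """Hat-matrix diagonals h_i for a binary model with a logit link."""
    X = np.asarray(X, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    n = X.shape[0]
    f = np.ones(n) if freq is None else np.asarray(freq, dtype=float)
    w = np.ones(n) if weight is None else np.asarray(weight, dtype=float)

    v = pi_hat * (1.0 - pi_hat) / (f * w)      # variance of the ith observation
    g_prime = 1.0 / (pi_hat * (1.0 - pi_hat))  # logit link derivative (assumed link)
    w_e = 1.0 / (v * g_prime ** 2)             # diagonal of the weight matrix W_e

    Xw = X * np.sqrt(w_e)[:, None]             # W_e^{1/2} X
    # Diagonal of W_e^{1/2} X (X' W_e X)^{-1} X' W_e^{1/2}
    H = Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)
    return np.diag(H)

# Illustrative design: intercept plus one covariate, with made-up fitted probabilities
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
X = np.column_stack([np.ones_like(x), x])
pi_hat = 1.0 / (1.0 + np.exp(-(-0.3 + 0.9 * x)))
h = logit_leverage(X, pi_hat)
```

The leverages sum to the number of columns of the design matrix (2 here), which is a convenient sanity check on any implementation.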

When you use multinomial-trial syntax to model aggregated multinomial response data and the predProbs subparameter is set to True, the relevant elements of the resulting hat matrix for observation $i$ are a $(J-1)\times(J-1)$ matrix $\mathbf{H}_i$. The following are three scalar representations of the leverage based on $\mathbf{H}_i$ and $\mathbf{M}_i = \mathbf{I}_{J-1} - \mathbf{H}_i$:

determinant: $h_{Di} = \det(\mathbf{M}_i)$
trace: $h_{Ti} = \frac{1}{J-1}\,\mathrm{trace}(\mathbf{H}_i)$
potential: $h_{Pi} = \frac{1}{J-1}\,\mathrm{trace}(\mathbf{M}_i^{-1}\mathbf{H}_i)$
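
The three scalar summaries can be computed directly from a given hat-matrix block. The sketch below assumes that block is already available; the function name `multinomial_leverage_summaries` and the 2 x 2 matrix (J = 3) are illustrative only.

```python
import numpy as np

def multinomial_leverage_summaries(H_i):
    """Determinant, trace, and potential summaries of a (J-1) x (J-1) block H_i."""
    H_i = np.asarray(H_i, dtype=float)
    k = H_i.shape[0]                                   # k = J - 1
    M_i = np.eye(k) - H_i                              # M_i = I - H_i
    h_det = np.linalg.det(M_i)                         # determinant form
    h_trace = np.trace(H_i) / k                        # trace form
    h_pot = np.trace(np.linalg.solve(M_i, H_i)) / k    # trace(M_i^{-1} H_i) / (J-1)
    return h_det, h_trace, h_pot

# Made-up symmetric block for a three-level response
H_example = np.array([[0.10, 0.02],
                      [0.02, 0.05]])
h_det, h_trace, h_pot = multinomial_leverage_summaries(H_example)
```

Using `np.linalg.solve` avoids forming the inverse of the matrix explicitly, which is usually the more stable choice.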

Residuals

Residuals are useful in identifying observations that are not explained well by the model. For binary and binomial response data, the raw residuals are

$$r_i = y_i/t_i - \hat\pi_i$$

where $y_i$ is the number of event responses out of $t_i$ trials for the $i$th observation. For single-trial syntax, $t_i = 1$ and $y_i = 1$ if the ordered response is 1 and $y_i = 0$ otherwise.

For multinomial response data, the raw residuals are

$$r_{ij} = y_{ij}/y_i - \hat\pi_{ij}$$

where $\hat\pi_{ij}$ are the model-predicted probabilities of the ordered response $j$ for observation $i$. With single-trial syntax, $y_{ij} = 1$ if the $i$th observation has ordered response $j$ and $y_{ij} = 0$ otherwise, and $y_i = 1$. With multinomial-trial syntax, $y_{ij}$ is the number of trials in the $i$th observation that have ordered response $j$, and $y_i = \sum_j y_{ij}$ is the total number of trials for the $i$th observation. If the predProbs subparameter is set to True and you use multinomial-trial syntax, the raw residuals are accumulated across the response levels as follows:

$$r_i = \sqrt{\sum_{j=1}^{J} y_{ij}\, r_{ij}^2}$$

Pearson residuals are the square root of the ith observation’s contribution to Pearson’s chi-square:

$$r_{Pi} = r_i\sqrt{\frac{w_i}{V(\pi_i)}}$$

where $V(\pi) = \hat\pi_i(1-\hat\pi_i)/t_i$. If you use multinomial-trial syntax, the Pearson residuals for each response level $j$ are

$$r_{Pij} = \frac{\sqrt{w_i}\; y_i\, r_{ij}}{\sqrt{y_i\,\hat\pi_{ij}}}$$

and if the predProbs subparameter is set to True, these values are accumulated across the response levels as

$$r_{Pi} = \sqrt{\sum_{j=1}^{J} r_{Pij}^2}$$

Deviance residuals are the square root of the contribution of the ith observation to the deviance, with the sign of the raw residual,

$$r_{Di} = \mathrm{sign}(r_i)\sqrt{d_i}$$

where

$$d_i = 2 w_i t_i\left[\frac{y_i}{t_i}\log\left(\frac{y_i/t_i}{\hat\pi_i}\right) + \left(1-\frac{y_i}{t_i}\right)\log\left(\frac{1-y_i/t_i}{1-\hat\pi_i}\right)\right]$$
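
The raw, Pearson, and deviance residuals for binary or binomial data can be sketched as follows. The helper and function names are hypothetical, the data are invented, and the sketch adopts the usual convention that 0 log 0 = 0 so that observations with no events or all events are handled.

```python
import numpy as np

def _p_log_ratio(p, q):
    """Elementwise p * log(p / q), using the convention 0 * log(0) = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    out = np.zeros_like(p)
    pos = p > 0
    out[pos] = p[pos] * np.log(p[pos] / q[pos])
    return out

def binomial_residuals(y, t, pi_hat, weight=None):
    """Raw, Pearson, and deviance residuals for events y out of t trials."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    w = np.ones_like(y) if weight is None else np.asarray(weight, dtype=float)

    p_obs = y / t
    raw = p_obs - pi_hat                                  # r_i
    variance = pi_hat * (1.0 - pi_hat) / t                # V(pi) with the 1/t_i factor
    pearson = raw * np.sqrt(w / variance)                 # r_Pi
    d = 2.0 * w * t * (_p_log_ratio(p_obs, pi_hat)
                       + _p_log_ratio(1.0 - p_obs, 1.0 - pi_hat))
    deviance = np.sign(raw) * np.sqrt(np.maximum(d, 0.0))  # r_Di
    return raw, pearson, deviance

# Two single-trial and two five-trial observations with made-up fitted values
raw, pearson, deviance = binomial_residuals(
    y=[0, 1, 3, 5], t=[1, 1, 5, 5], pi_hat=[0.2, 0.7, 0.5, 0.9]
)
```

The `np.maximum(d, 0.0)` guard only protects against tiny negative values from floating-point rounding; the deviance contribution is nonnegative in exact arithmetic.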

If you use multinomial-trial syntax, the deviance residuals for each response level j are

$$r_{Dij} = \mathrm{sign}(r_{ij})\sqrt{2 w_i y_{ij}\left|\log\left(\frac{y_{ij}}{y_i\,\hat\pi_{ij}}\right)\right|}$$

and if the predProbs subparameter is set to True, these values are accumulated across the response levels as

$$r_{Di} = \pm\sqrt{\left|\sum_{j=1}^{J}\mathrm{sign}(r_{Dij})\, r_{Dij}^2\right|}$$

where $\pm$ is the sign of the sum.

The working residuals are

$$r_{Wi} = r_i\left(\frac{\partial\pi_i}{\partial\eta_i}\right)^{-1}$$

Working residuals are not available for multinomial-response models when you specify multinomial-trial syntax or the predProbs subparameter.

The Pearson residuals, standardized to have unit asymptotic variance, are

$$r_{SPi} = \frac{r_{Pi}}{\sqrt{1-h_i}}$$

If you use multinomial-trial syntax and the predProbs subparameter is set to True, the standardized Pearson residuals are

$$r_{SPi} = \sqrt{\mathbf{r}_{Pi-J}'\,\mathbf{M}_i^{-1}\,\mathbf{r}_{Pi-J}}$$

where $\mathbf{r}_{Pi-J} = (r_{Pi1}, \ldots, r_{Pi,J-1})'$.

The deviance residuals, standardized to have unit asymptotic variance, are

$$r_{SDi} = \frac{r_{Di}}{\sqrt{1-h_i}}$$

If you use multinomial-trial syntax and the predProbs subparameter is set to True, the standardized deviance residuals are

$$r_{SDi} = \mathrm{sign}\left(\mathbf{r}_{Di-J}'\,\mathbf{M}_i^{-1}\,\mathbf{r}_{Di-J}\right)\sqrt{\left|\mathbf{r}_{Di-J}'\,\mathbf{M}_i^{-1}\,\mathbf{r}_{Di-J}\right|}$$

where $\mathbf{r}_{Di-J} = (r_{Di1}, \ldots, r_{Di,J-1})'$.

The likelihood residuals, which estimate components of a likelihood ratio test of deleting an individual observation, are a weighted combination of the standardized Pearson and deviance residuals:

$$r_{Li} = \mathrm{sign}(r_i)\sqrt{h_i\, r_{SPi}^2 + (1-h_i)\, r_{SDi}^2}$$
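
A minimal sketch of the standardized and likelihood residuals for the binary case follows. It takes the raw, Pearson, and deviance residuals and the leverages as given; the function name and the numeric inputs are invented for illustration.

```python
import numpy as np

def standardized_and_likelihood(raw, pearson, deviance, h):
    """Standardized Pearson, standardized deviance, and likelihood residuals."""
    raw = np.asarray(raw, dtype=float)
    pearson = np.asarray(pearson, dtype=float)
    deviance = np.asarray(deviance, dtype=float)
    h = np.asarray(h, dtype=float)

    r_sp = pearson / np.sqrt(1.0 - h)     # standardized Pearson residual
    r_sd = deviance / np.sqrt(1.0 - h)    # standardized deviance residual
    # Weighted combination of the two squared standardized residuals
    r_l = np.sign(raw) * np.sqrt(h * r_sp ** 2 + (1.0 - h) * r_sd ** 2)
    return r_sp, r_sd, r_l

# Made-up residuals and leverages for two observations
raw = np.array([-0.2, 0.3])
pearson = np.array([-0.5, 0.65])
deviance = np.array([-0.67, 0.84])
h = np.array([0.1, 0.3])
r_sp, r_sd, r_l = standardized_and_likelihood(raw, pearson, deviance, h)
```

Expanding the expression under the square root shows that the squared likelihood residual equals the squared deviance residual plus the squared Pearson residual times h / (1 - h).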

If you use multinomial-trial syntax and the predProbs subparameter is set to True, the likelihood residuals are computed as

$$r_{Li} = \sqrt{\mathbf{r}_{Pi-J}'\,\mathbf{H}_i\,\mathbf{M}_i^{-1}\,\mathbf{r}_{Pi-J} + \mathbf{r}_{Di-J}'\,\mathbf{r}_{Di-J}} = \sqrt{\mathrm{DIFDEV}_i}$$

Other Regression Diagnostics

The CBAR statistic is a confidence interval displacement diagnostic that provides a scalar measure of the influence of an individual observation on $\hat{\boldsymbol\beta}$. This diagnostic is based on the same idea as the Cook distance in linear regression theory (Cook and Weisberg 1982), but it uses the one-step estimate:

$$\bar{C}_i = r_{Pi}^2\, h_i/(1-h_i)$$

If you use multinomial-trial syntax and the predProbs subparameter is set to True, this diagnostic is computed as follows:

$$\bar{C}_i = \mathbf{r}_{Pi-J}'\,\mathbf{M}_i^{-1}\,\mathbf{H}_i\,\mathbf{r}_{Pi-J}$$

The DIFDEV and DIFCHISQ statistics are diagnostics for detecting ill-fitted observations—observations that contribute heavily to the disagreement between the data and the predicted values of the fitted model. DIFDEV is the change in the deviance that results from deleting an individual observation, and DIFCHISQ is the change in the Pearson chi-square statistic that results from the same deletion. By using the one-step estimate, DIFDEV and DIFCHISQ for the ith observation are computed as follows:

$$\begin{aligned} \mathrm{DIFDEV}_i &= r_{Di}^2 + \bar{C}_i \\ \mathrm{DIFCHISQ}_i &= \bar{C}_i/h_i \end{aligned}$$
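
For the binary case, the one-step diagnostics reduce to simple elementwise arithmetic on the Pearson residuals, deviance residuals, and leverages. The sketch below uses a hypothetical function name and made-up inputs.

```python
import numpy as np

def one_step_diagnostics(pearson, deviance, h):
    """CBAR, DIFDEV, and DIFCHISQ from residuals and hat-matrix diagonals."""
    pearson = np.asarray(pearson, dtype=float)
    deviance = np.asarray(deviance, dtype=float)
    h = np.asarray(h, dtype=float)

    cbar = pearson ** 2 * h / (1.0 - h)   # confidence interval displacement
    difdev = deviance ** 2 + cbar         # one-step change in the deviance
    difchisq = cbar / h                   # one-step change in Pearson chi-square
    return cbar, difdev, difchisq

# Single illustrative observation
cbar, difdev, difchisq = one_step_diagnostics(
    pearson=[-0.5], deviance=[-0.6], h=[0.2]
)
```

Note that dividing by the leverage makes DIFCHISQ equal to the squared Pearson residual divided by (1 - h), so it is undefined for an observation whose leverage is exactly zero.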

If you use multinomial-trial syntax and the predProbs subparameter is set to True, these statistics are computed as follows:

$$\begin{aligned} \mathrm{DIFDEV}_i &= \mathbf{r}_{Di-J}'\,\mathbf{r}_{Di-J} + \bar{C}_i = r_{Li}^2 \\ \mathrm{DIFCHISQ}_i &= \mathbf{r}_{Pi-J}'\,\mathbf{r}_{Pi-J} + \bar{C}_i \end{aligned}$$

Diagnostics for Models Fit by Generalized Estimating Equations (GEEs)

The diagnostic statistics in this section were developed by Preisser and Qaqish (1996). See the section Generalized Estimating Equations for further information and notation for generalized estimating equations (GEEs). The following additional notation is used in this section.

Partition the design matrix $\mathbf{X}$ and response vector $\mathbf{Y}$ by cluster; that is, let $\mathbf{X} = (X_1', \ldots, X_K')'$ and $\mathbf{Y} = (Y_1', \ldots, Y_K')'$, corresponding to the $K$ clusters.

Let $n_i$ be the number of responses for cluster $i$, and denote the total number of observations as $N = \sum_{i=1}^{K} n_i$. Denote the $n_i \times n_i$ diagonal matrix as $\mathbf{A}_i$, where $V(\mu_{ij})$ is the $j$th diagonal element. If there is a weight parameter, the diagonal element of $\mathbf{A}_i$ is $V(\mu_{ij})/w_{ij}$, where $w_{ij}$ is the specified weight of the $j$th observation in the $i$th cluster. Let $\mathbf{B}$ be the $N \times N$ diagonal matrix with $g'(\mu_{ij})$ as diagonal elements, $i = 1, \ldots, K$, $j = 1, \ldots, n_i$. Let $\mathbf{B}_i$ be the $n_i \times n_i$ diagonal matrix that corresponds to cluster $i$, where $g'(\mu_{ij})$ is the $j$th diagonal element.

Let $\mathbf{W}$ be the $N \times N$ block diagonal weight matrix whose $i$th block, corresponding to the $i$th cluster, is the $n_i \times n_i$ matrix

$$\mathbf{W}_{ei} = \mathbf{B}_i^{-1}\,\mathbf{A}_i^{-1/2}\,\mathbf{R}_i^{-1}(\hat{\boldsymbol\alpha})\,\mathbf{A}_i^{-1/2}\,\mathbf{B}_i^{-1}$$

where $\mathbf{R}_i(\hat{\boldsymbol\alpha})$ is the working correlation matrix for cluster $i$ that is computed using the estimated correlation parameters $\hat{\boldsymbol\alpha}$.

Let

$$\mathbf{Q}_i = \mathbf{X}_i\left(\mathbf{X}'\mathbf{W}\mathbf{X}\right)^{-1}\mathbf{X}_i'$$

where $\mathbf{X}_i$ is the $n_i \times p$ design matrix that corresponds to cluster $i$.

The cluster leverage statistic represents the leverage of cluster $i$ and is contained in the matrix $\mathbf{H}_i = \mathbf{Q}_i\mathbf{W}_{ei}$. The leverage of cluster $i$ is summarized by the trace of $\mathbf{H}_i$:

$$ch_i = \mathrm{tr}(\mathbf{H}_i)$$

The leverage $h_t$ of the $t$th observation in the $i$th cluster is the $t$th diagonal element of $\mathbf{H}_i$.
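
The cluster leverage computation can be sketched with NumPy under several simplifying assumptions: a logit link, unit observation weights, and an exchangeable working correlation with a made-up correlation of 0.3. The function names `exchangeable` and `gee_cluster_leverage`, the cluster sizes, and all data values are hypothetical.

```python
import numpy as np

def exchangeable(n, rho):
    """Exchangeable working correlation matrix of size n (illustrative choice)."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

def gee_cluster_leverage(X_blocks, mu_blocks, R_blocks):
    """Cluster leverages tr(H_i) and observation leverages diag(H_i)."""
    W_blocks = []
    for mu, R in zip(mu_blocks, R_blocks):
        mu = np.asarray(mu, dtype=float)
        var = mu * (1.0 - mu)                # V(mu_ij): diagonal of A_i
        B_inv = np.diag(var)                 # logit link: g'(mu) = 1/(mu(1-mu))
        A_inv_half = np.diag(var ** -0.5)    # A_i^{-1/2}
        # Weight block: B^{-1} A^{-1/2} R^{-1} A^{-1/2} B^{-1}
        W_blocks.append(B_inv @ A_inv_half @ np.linalg.inv(R) @ A_inv_half @ B_inv)

    # X' W X accumulated over the block-diagonal structure
    XtWX = sum(X.T @ W @ X for X, W in zip(X_blocks, W_blocks))
    XtWX_inv = np.linalg.inv(XtWX)

    # H_i = Q_i W_ei, with Q_i = X_i (X' W X)^{-1} X_i'
    H_blocks = [X @ XtWX_inv @ X.T @ W for X, W in zip(X_blocks, W_blocks)]
    cluster_leverage = np.array([np.trace(H) for H in H_blocks])
    obs_leverage = [np.diag(H) for H in H_blocks]
    return cluster_leverage, obs_leverage

# Three clusters of sizes 2, 3, and 2; intercept plus one covariate (p = 2)
covariate = [np.array([-1.0, 0.5]), np.array([0.0, 1.0, 2.0]), np.array([-0.5, 1.5])]
X_blocks = [np.column_stack([np.ones_like(c), c]) for c in covariate]
mu_blocks = [1.0 / (1.0 + np.exp(-(0.2 + 0.7 * c))) for c in covariate]
R_blocks = [exchangeable(len(c), 0.3) for c in covariate]
cluster_lev, obs_lev = gee_cluster_leverage(X_blocks, mu_blocks, R_blocks)
```

As with the ordinary hat matrix, the cluster leverages sum to the number of regression parameters p, which gives a quick check on the result.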

Last updated: March 05, 2026