For binary response data, you can produce observationwise predicted probabilities, confidence limits, and regression diagnostics developed by Pregibon (1981) by specifying the output parameter. For multinomial response data, you can likewise produce observationwise predicted probabilities, confidence limits, and raw residuals. If you use multinomial-trial syntax so that each observation represents an aggregate of trials, then extensions of Pregibon’s diagnostics are also available (Lesaffre and Albert 1989; Williams 1987; Gupta, Nguyen, and Pardo 2008; Martín 2015).
For a binary response model, given the vector of covariates $\mathbf{x}_i$ for the $i$th observation in your data table and the model-predicted parameter estimates $\hat{\boldsymbol{\theta}} = (\hat{\alpha}, \hat{\boldsymbol{\beta}}')'$, you can write the linear predictor $\hat{\eta}_i = \hat{\alpha} + \mathbf{x}_i'\hat{\boldsymbol{\beta}}$. The mean of the $i$th observation $\hat{\mu}_i$, or the model-predicted event probability $\hat{\pi}_i$, is $\hat{\mu}_i = \hat{\pi}_i = g^{-1}(\hat{\eta}_i)$, where you choose the link function $g$ by specifying the link subparameter. The variance of the binary distribution is $\hat{\pi}_i(1-\hat{\pi}_i)$, and $\widehat{\mathbf{V}}(\hat{\boldsymbol{\theta}})$ is the estimated covariance matrix of $\hat{\boldsymbol{\theta}}$. Denote the frequency of the $i$th observation as $f_i$ and the weight as $w_i$.
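As a concrete numerical sketch of these quantities, the following assumes the logit link $g(\mu) = \log(\mu/(1-\mu))$; the helper name `predict_binary` is invented for the illustration and is not part of any interface described here.

```python
import numpy as np

def predict_binary(x, alpha, beta,
                   inv_link=lambda eta: 1.0 / (1.0 + np.exp(-eta))):
    """Linear predictor and predicted event probability for one observation.

    x, beta : 1-D covariate and slope vectors; alpha : intercept.
    inv_link defaults to the inverse logit, g^{-1}(eta) = 1/(1+e^{-eta}).
    """
    eta = alpha + np.dot(x, beta)   # linear predictor eta_i = alpha + x_i' beta
    return eta, inv_link(eta)       # mean mu_i = pi_i = g^{-1}(eta_i)

eta, pi = predict_binary(np.array([1.0, 2.0]), -1.0, np.array([0.5, 0.25]))
```

Here $\hat{\eta}_i = -1 + 0.5 + 0.5 = 0$, so the predicted probability is exactly 0.5.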
For ordinal response models, the predicted cumulative probabilities are computed in the same fashion by using the appropriate model-predicted intercept parameters and letting $\hat{\boldsymbol{\beta}}$ consist of the slope parameters: $\hat{\eta}_{ij} = \hat{\alpha}_j + \mathbf{x}_i'\hat{\boldsymbol{\beta}}$ and $\widehat{\Pr}(Y_i \le j \mid \mathbf{x}_i) = g^{-1}(\hat{\eta}_{ij})$ for $j = 1, \ldots, J-1$, where $J$ is the number of response levels.
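Assuming the cumulative logit link, the ordinal computation can be sketched as follows; `ordinal_probs` is a hypothetical helper, and the individual level probabilities are recovered by differencing consecutive cumulative probabilities.

```python
import numpy as np

def ordinal_probs(x, alphas, beta):
    """Cumulative-logit model: g(Pr(Y<=j)) = alpha_j + x'beta, j = 1..J-1.

    alphas must be increasing so the cumulative probabilities are ordered.
    Returns the J individual response-level probabilities.
    """
    eta = np.asarray(alphas) + np.dot(x, beta)   # eta_ij for j = 1..J-1
    cum = 1.0 / (1.0 + np.exp(-eta))             # Pr(Y <= j), inverse logit
    cum = np.concatenate(([0.0], cum, [1.0]))    # pad Pr(Y<=0)=0, Pr(Y<=J)=1
    return np.diff(cum)                          # Pr(Y = j) by differencing

p = ordinal_probs(np.array([0.0]), alphas=[-1.0, 1.0], beta=np.array([0.0]))
```

With symmetric intercepts and a zero covariate effect, the first and last level probabilities coincide and the probabilities sum to 1.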
For nominal response models, the predicted probabilities are computed by using the appropriate model-predicted intercept parameters and letting $\hat{\boldsymbol{\beta}}_j$ consist of the slope parameters for the $j$th response level: $\hat{\eta}_{ij} = \hat{\alpha}_j + \mathbf{x}_i'\hat{\boldsymbol{\beta}}_j$ and $\hat{\pi}_{ij} = e^{\hat{\eta}_{ij}} \big/ \bigl(1 + \sum_{k=1}^{J-1} e^{\hat{\eta}_{ik}}\bigr)$ for $j = 1, \ldots, J-1$.
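A small sketch of the nominal (generalized logit) computation, assuming the last response level is the reference; `nominal_probs` is an invented name.

```python
import numpy as np

def nominal_probs(x, alphas, betas):
    """Generalized-logit probabilities with the last level as reference.

    betas : (J-1, p) matrix of level-specific slopes;
    pi_ij = exp(eta_ij) / (1 + sum_k exp(eta_ik)) for j = 1..J-1.
    Returns all J probabilities, the reference level last.
    """
    eta = np.asarray(alphas) + np.asarray(betas) @ x   # eta_ij, j = 1..J-1
    num = np.exp(eta)
    denom = 1.0 + num.sum()
    return np.append(num / denom, 1.0 / denom)         # reference gets 1/denom

p = nominal_probs(np.array([1.0]), alphas=[0.0, 0.0], betas=[[0.0], [0.0]])
```

With all parameters zero and three levels, each probability is 1/3.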
If you perform elastic net or LASSO selection, or if you specify the noxpx or nostderr parameter, then your analysis does not generate a covariance matrix, and you can produce only the predicted probabilities (individual, cumulative, and posterior when available) and the raw, Pearson, deviance, and working residuals.
Approximate confidence intervals for predicted probabilities can be computed as follows. The variance of the linear predictor is estimated by
$$\hat{\sigma}^2(\hat{\eta}_i) = \mathbf{z}_i'\,\widehat{\mathbf{V}}(\hat{\boldsymbol{\theta}})\,\mathbf{z}_i$$
where $\mathbf{z}_i = (1, \mathbf{x}_i')'$. For multinomial models, the variance also depends on the response function. Let $\mathbf{e}_j$ be a $(J-1)$ column vector whose $j$th entry is equal to 1 and all other entries are equal to 0. Redefine $\mathbf{z}_{ij} = (\mathbf{e}_j', \mathbf{x}_i')'$, $\hat{\boldsymbol{\theta}} = (\hat{\alpha}_1, \ldots, \hat{\alpha}_{J-1}, \hat{\boldsymbol{\beta}}')'$, and $\hat{\eta}_{ij} = \mathbf{z}_{ij}'\hat{\boldsymbol{\theta}}$. Then
$$\hat{\sigma}^2(\hat{\eta}_{ij}) = \mathbf{z}_{ij}'\,\widehat{\mathbf{V}}(\hat{\boldsymbol{\theta}})\,\mathbf{z}_{ij}$$
The asymptotic $100(1-\alpha)\%$ confidence interval for $\eta_i$ is
$$\hat{\eta}_i \pm z_{1-\alpha/2}\,\hat{\sigma}(\hat{\eta}_i)$$
where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile point of a standard normal distribution.
The predicted probability and the confidence limits for $\pi_i$ are obtained by back-transforming the corresponding measures for the linear predictor. So the confidence limits are
$$g^{-1}\bigl(\hat{\eta}_i \pm z_{1-\alpha/2}\,\hat{\sigma}(\hat{\eta}_i)\bigr)$$
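The back-transformed interval can be sketched as follows, assuming the logit link; `predicted_prob_ci` and the toy covariance matrix are invented for the illustration.

```python
import numpy as np
from statistics import NormalDist

def predicted_prob_ci(x, theta, V, level=0.95,
                      inv_link=lambda eta: 1.0 / (1.0 + np.exp(-eta))):
    """Wald confidence limits for a predicted probability.

    theta stacks the intercept and slopes; V is its estimated covariance.
    The interval is built on the linear-predictor scale, then back-transformed.
    """
    z = np.concatenate(([1.0], x))              # z_i = (1, x_i')'
    eta = z @ theta                             # linear predictor eta_i
    se = np.sqrt(z @ V @ z)                     # sigma-hat(eta_i)
    q = NormalDist().inv_cdf(1 - (1 - level) / 2)   # normal percentile
    return inv_link(eta), (inv_link(eta - q * se), inv_link(eta + q * se))

pi, (lo, hi) = predicted_prob_ci(np.array([0.0]), np.array([0.0, 1.0]), np.eye(2))
```

Because the inverse logit is monotone, back-transforming the endpoints preserves the interval; on this symmetric toy example the limits straddle 0.5.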
The diagonal elements of the hat matrix are useful in detecting extreme points in the design space, where they tend to have larger values. For the generalized linear model, the variance of the $i$th individual observation is
$$\hat{v}_i = \frac{\hat{\mu}_i(1-\hat{\mu}_i)}{w_i f_i}$$
For the $i$th observation, let
$$\hat{w}_{ei} = \frac{1}{\hat{v}_i\,[g'(\hat{\mu}_i)]^2}$$
where $g'(\hat{\mu}_i)$ is the derivative of the link function evaluated at $\hat{\mu}_i$. The weight matrix $\widehat{\mathbf{W}}_e$ is a diagonal matrix, with $\hat{w}_{ei}$ denoting the $i$th diagonal element, which is used in computing the expected information matrix. Define the leverage, or hat-matrix diagonal, $h_i$, as the $i$th diagonal element of the matrix
$$\mathbf{H} = \widehat{\mathbf{W}}_e^{1/2}\,\mathbf{X}\,(\mathbf{X}'\widehat{\mathbf{W}}_e\mathbf{X})^{-1}\,\mathbf{X}'\,\widehat{\mathbf{W}}_e^{1/2}$$
If the estimated probability is extreme (less than 0.1 or greater than 0.9, approximately), then the hat-matrix diagonal might be greatly reduced in value. Consequently, when an observation has a very large or very small estimated probability, its leverage is not a good indicator of the observation’s distance from the design space (Hosmer and Lemeshow 2000, p. 171).
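A minimal numerical sketch of the hat-matrix diagonals for a diagonal weight matrix; `hat_diagonals` and the toy design are invented. With unit weights the diagonals reduce to ordinary least-squares leverages, which sum to the number of parameters.

```python
import numpy as np

def hat_diagonals(X, w_e):
    """Diagonal of H = W^{1/2} X (X'WX)^{-1} X' W^{1/2} for diagonal weights w_e."""
    WX = w_e[:, None] * X                       # W X
    M = np.linalg.inv(X.T @ WX)                 # (X'WX)^{-1}
    # h_i = w_ei * x_i' (X'WX)^{-1} x_i, computed row by row
    return w_e * np.einsum("ij,jk,ik->i", X, M, X)

X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
h = hat_diagonals(X, np.ones(4))
```

On this two-parameter design the leverages sum to 2, and each lies strictly between 0 and 1.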
When you use multinomial-trial syntax to model aggregated multinomial response data and the predProbs subparameter is set to True, the relevant elements of the resulting hat matrix for observation $i$ form a $(J-1) \times (J-1)$ matrix $\mathbf{H}_i$. The following are three scalar representations of the leverage based on $\mathbf{H}_i$ and its diagonal elements:
Residuals are useful in identifying observations that are not explained well by the model. For binary and binomial response data, the raw residuals are
$$y_i - \hat{\pi}_i, \qquad y_i = \frac{r_i}{n_i}$$
where $r_i$ is the number of event responses out of $n_i$ trials for the $i$th observation. For single-trial syntax, $n_i = 1$, and $r_i = 1$ if the ordered response is 1 and $r_i = 0$ otherwise.
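A one-line sketch of the raw residuals, assuming $y_i = r_i/n_i$ as above; `raw_residuals` is an invented name.

```python
import numpy as np

def raw_residuals(r, n, pi_hat):
    """Raw residuals y_i - pi_i with y_i = r_i / n_i.

    r : event counts; n : trial counts (all ones for single-trial syntax).
    """
    return r / n - pi_hat

res = raw_residuals(np.array([1.0, 0.0]), np.array([1.0, 1.0]),
                    np.array([0.75, 0.25]))
```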
For multinomial response data, the raw residuals are
$$y_{ij} - \hat{\pi}_{ij}$$
where $\hat{\pi}_{ij}$ are the model-predicted probabilities of the ordered response $j$ for observation $i$. With single-trial syntax, $y_{ij} = 1$ if the $i$th observation has ordered response $j$ and $y_{ij} = 0$ otherwise, and $n_i = 1$. With multinomial-trial syntax, $y_{ij} = r_{ij}/n_i$, where $r_{ij}$ is the number of trials in the $i$th observation that have ordered response $j$ and $n_i = \sum_j r_{ij}$ is the total number of trials for the $i$th observation. If the predProbs subparameter is set to True and you use multinomial-trial syntax, the raw residuals are accumulated across the response levels as follows:
Pearson residuals are the square root of the $i$th observation’s contribution to Pearson’s chi-square:
$$\chi_i = \sqrt{w_i f_i}\;\frac{y_i - \hat{\pi}_i}{\sqrt{\hat{\pi}_i(1-\hat{\pi}_i)/n_i}}$$
where $y_i = r_i/n_i$. If you use multinomial-trial syntax, the Pearson residuals for each response level $j$ are
$$\chi_{ij} = \sqrt{w_i f_i}\;\frac{y_{ij} - \hat{\pi}_{ij}}{\sqrt{\hat{\pi}_{ij}(1-\hat{\pi}_{ij})/n_i}}$$
and if the predProbs subparameter is set to True, these values are accumulated across the response levels as
$$\chi_i = \sqrt{w_i f_i\, n_i \sum_j \frac{(y_{ij} - \hat{\pi}_{ij})^2}{\hat{\pi}_{ij}}}$$
Deviance residuals are the square root of the contribution of the $i$th observation to the deviance, with the sign of the raw residual:
$$d_i = \pm\sqrt{2 w_i f_i \left[ r_i \log\!\left(\frac{r_i}{n_i \hat{\pi}_i}\right) + (n_i - r_i)\log\!\left(\frac{n_i - r_i}{n_i(1-\hat{\pi}_i)}\right) \right]}$$
where the sign is that of $y_i - \hat{\pi}_i$ and terms with $r_i = 0$ or $n_i - r_i = 0$ are taken to be 0. If you use multinomial-trial syntax, the deviance residuals for each response level $j$ are
$$d_{ij} = \pm\sqrt{2 w_i f_i \left[ r_{ij}\log\!\left(\frac{y_{ij}}{\hat{\pi}_{ij}}\right) + (n_i - r_{ij})\log\!\left(\frac{1 - y_{ij}}{1-\hat{\pi}_{ij}}\right)\right]}$$
and if the predProbs subparameter is set to True, these values are accumulated across the response levels as
$$d_i = \pm\sqrt{2 w_i f_i \sum_j r_{ij}\log\!\left(\frac{y_{ij}}{\hat{\pi}_{ij}}\right)}$$
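The binomial Pearson and deviance contributions can be sketched together. This assumes the textbook binomial forms, with the $0\log 0 = 0$ convention at the boundaries $r_i = 0$ and $r_i = n_i$; `pearson_deviance` is an invented helper.

```python
import numpy as np

def pearson_deviance(r, n, pi_hat, w=1.0, f=1.0):
    """Pearson and deviance residuals for binomial counts r out of n trials.

    chi_i scales the raw residual by its model standard error pi(1-pi)/n;
    d_i carries the sign of the raw residual and squares to the
    observation's deviance contribution.
    """
    y = r / n
    chi = np.sqrt(w * f) * (y - pi_hat) / np.sqrt(pi_hat * (1 - pi_hat) / n)
    # 0*log(0) = 0 convention for the boundary cases r = 0 and r = n
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(r > 0, r * np.log(y / pi_hat), 0.0)
        t2 = np.where(n - r > 0, (n - r) * np.log((1 - y) / (1 - pi_hat)), 0.0)
    d = np.sign(y - pi_hat) * np.sqrt(2 * w * f * (t1 + t2))
    return chi, d

chi, d = pearson_deviance(np.array([3.0]), np.array([10.0]), np.array([0.3]))
```

When the observed proportion equals the predicted probability, both residuals are exactly 0.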
The working residuals are
$$r_i^{W} = (y_i - \hat{\mu}_i)\,g'(\hat{\mu}_i)$$
Working residuals are not available for multinomial-response models when you specify multinomial-trial syntax or the predProbs subparameter.
The Pearson residuals, standardized to have unit asymptotic variance, are
$$\chi_{si} = \frac{\chi_i}{\sqrt{1-h_i}}$$
If you use multinomial-trial syntax and the predProbs subparameter is set to True, the standardized Pearson residuals are
The deviance residuals, standardized to have unit asymptotic variance, are
$$d_{si} = \frac{d_i}{\sqrt{1-h_i}}$$
If you use multinomial-trial syntax and the predProbs subparameter is set to True, the standardized deviance residuals are
The likelihood residuals, which estimate components of a likelihood ratio test of deleting an individual observation, are a weighted combination of the standardized Pearson and deviance residuals:
$$\ell_i = \operatorname{sign}(y_i - \hat{\pi}_i)\sqrt{h_i\,\chi_{si}^2 + (1-h_i)\,d_{si}^2}$$
If you use multinomial-trial syntax and the predProbs subparameter is set to True, the likelihood residuals are computed as
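Assuming the standard one-step forms (dividing by $\sqrt{1-h_i}$, then weighting by $h_i$ and $1-h_i$ for the likelihood residual), the single-observation case can be sketched as follows; `standardized_and_likelihood` is an invented name.

```python
import numpy as np

def standardized_and_likelihood(chi, d, h):
    """Standardized Pearson/deviance residuals and likelihood residuals.

    chi, d : Pearson and deviance residuals; h : hat-matrix diagonals.
    The likelihood residual mixes the two standardized forms by h and 1-h.
    """
    chi_s = chi / np.sqrt(1 - h)                # standardized Pearson
    d_s = d / np.sqrt(1 - h)                    # standardized deviance
    ell = np.sign(d) * np.sqrt(h * chi_s**2 + (1 - h) * d_s**2)
    return chi_s, d_s, ell

chi_s, d_s, ell = standardized_and_likelihood(
    np.array([2.0]), np.array([1.8]), np.array([0.5]))
```

Because it is a convex combination on the squared scale, the likelihood residual always falls between the two standardized residuals in magnitude.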
The CBAR statistic is a confidence interval displacement diagnostic that provides a scalar measure of the influence of an individual observation on $\hat{\boldsymbol{\beta}}$. This diagnostic is based on the same idea as the Cook distance in linear regression theory (Cook and Weisberg 1982), but it uses the one-step estimate:
$$\bar{C}_i = \frac{\chi_i^2\, h_i}{1-h_i}$$
If you use multinomial-trial syntax and the predProbs subparameter is set to True, this diagnostic is computed as follows:
The DIFDEV and DIFCHISQ statistics are diagnostics for detecting ill-fitted observations, that is, observations that contribute heavily to the disagreement between the data and the predicted values of the fitted model. DIFDEV is the change in the deviance that results from deleting an individual observation, and DIFCHISQ is the change in the Pearson chi-square statistic that results from the same deletion. By using the one-step estimate, DIFDEV and DIFCHISQ for the $i$th observation are computed as follows:
$$\mathrm{DIFDEV}_i = d_i^2 + \bar{C}_i, \qquad \mathrm{DIFCHISQ}_i = \frac{\bar{C}_i}{h_i} = \frac{\chi_i^2}{1-h_i}$$
If you use multinomial-trial syntax and the predProbs subparameter is set to True, these statistics are computed as follows:
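Assuming the standard one-step relations $\bar{C}_i = \chi_i^2 h_i/(1-h_i)$, $\mathrm{DIFDEV}_i = d_i^2 + \bar{C}_i$, and $\mathrm{DIFCHISQ}_i = \chi_i^2/(1-h_i)$, a sketch (`deletion_diagnostics` is an invented name):

```python
import numpy as np

def deletion_diagnostics(chi, d, h):
    """One-step CBAR, DIFDEV, and DIFCHISQ from residuals and leverages."""
    cbar = chi**2 * h / (1 - h)        # confidence interval displacement
    difdev = d**2 + cbar               # one-step change in deviance
    difchisq = chi**2 / (1 - h)        # one-step change in Pearson chi-square,
                                       # equivalently CBAR / h
    return cbar, difdev, difchisq

cbar, difdev, difchisq = deletion_diagnostics(
    np.array([2.0]), np.array([1.8]), np.array([0.2]))
```

For $\chi_i = 2$, $d_i = 1.8$, and $h_i = 0.2$, these give $\bar{C}_i = 1$, $\mathrm{DIFDEV}_i = 4.24$, and $\mathrm{DIFCHISQ}_i = 5$.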
The diagnostic statistics in this section were developed by Preisser and Qaqish (1996). See the section Generalized Estimating Equations for further information and notation for generalized estimating equations (GEEs). The following additional notation is used in this section.
Partition the design matrix $\mathbf{X}$ and response vector $\mathbf{Y}$ by cluster; that is, let $\mathbf{X} = (\mathbf{X}_1', \ldots, \mathbf{X}_K')'$ and $\mathbf{Y} = (\mathbf{Y}_1', \ldots, \mathbf{Y}_K')'$, corresponding to the $K$ clusters.
Let $n_i$ be the number of responses for cluster $i$, and denote the total number of observations as $N = \sum_{i=1}^{K} n_i$. Denote the $N \times N$ diagonal matrix as $\mathbf{A}$, where $v(\mu_{ij})$ is the diagonal element that corresponds to the $j$th observation in the $i$th cluster. If there is a weight parameter, the diagonal element of $\mathbf{A}$ is $v(\mu_{ij})/w_{ij}$, where $w_{ij}$ is the specified weight of the $j$th observation in the $i$th cluster. Let $\mathbf{E}$ be the $N \times N$ diagonal matrix with $\partial\mu_{ij}/\partial\eta_{ij}$ as diagonal elements, $j = 1, \ldots, n_i$, $i = 1, \ldots, K$. Let $\mathbf{A}_i$ and $\mathbf{E}_i$ be the $n_i \times n_i$ diagonal matrices that correspond to cluster $i$, where $v(\mu_{ij})/w_{ij}$ is the $j$th diagonal element of $\mathbf{A}_i$.
Let $\mathbf{W}$ be the $N \times N$ block diagonal weight matrix whose $i$th block, corresponding to the $i$th cluster, is the $n_i \times n_i$ matrix
$$\mathbf{W}_i = \mathbf{E}_i\,\mathbf{A}_i^{-1/2}\,\bigl[\mathbf{R}_i(\hat{\boldsymbol{\alpha}})\bigr]^{-1}\,\mathbf{A}_i^{-1/2}\,\mathbf{E}_i$$
where $\mathbf{R}_i(\hat{\boldsymbol{\alpha}})$ is the working correlation matrix for cluster $i$ that is computed by using the estimated correlation parameters $\hat{\boldsymbol{\alpha}}$.
Let
$$\mathbf{H}_i = \mathbf{X}_i\,(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\,\mathbf{X}_i'\,\mathbf{W}_i$$
where $\mathbf{X}_i$ is the design matrix that corresponds to cluster $i$.
The cluster leverage statistic represents the leverage of cluster $i$ and is contained in the matrix $\mathbf{H}_i$. The leverage of cluster $i$ is summarized by the trace of $\mathbf{H}_i$:
$$\operatorname{tr}(\mathbf{H}_i) = \sum_{t=1}^{n_i} [\mathbf{H}_i]_{tt}$$
The leverage of the $t$th observation in the $i$th cluster is the $t$th diagonal element of $\mathbf{H}_i$.
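Assuming the Preisser and Qaqish (1996) cluster leverage matrix $\mathbf{H}_i = \mathbf{X}_i(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}_i'\mathbf{W}_i$, a toy sketch with an invented helper and data:

```python
import numpy as np

def cluster_leverages(X_blocks, W_blocks):
    """Cluster leverage traces tr(H_i), H_i = X_i (X'WX)^{-1} X_i' W_i.

    X_blocks, W_blocks : per-cluster design matrices X_i and weight blocks W_i.
    The observation-level leverages are the diagonal elements of each H_i.
    """
    XtWX = sum(Xi.T @ Wi @ Xi for Xi, Wi in zip(X_blocks, W_blocks))
    M = np.linalg.inv(XtWX)
    H = [Xi @ M @ Xi.T @ Wi for Xi, Wi in zip(X_blocks, W_blocks)]
    return [np.trace(Hi) for Hi in H], H

# Two clusters (sizes 2 and 3), intercept-only design, identity weight blocks
X_blocks = [np.ones((2, 1)), np.ones((3, 1))]
W_blocks = [np.eye(2), np.eye(3)]
traces, H = cluster_leverages(X_blocks, W_blocks)
```

As with ordinary leverages, the cluster leverage traces sum to the number of regression parameters (here 1), and a cluster with more observations carries more leverage.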