For binary response data, you can produce observationwise predicted probabilities, confidence limits, and regression diagnostics developed by Pregibon (1981) by specifying the output parameter. For multinomial response data, you can likewise produce observationwise predicted probabilities, confidence limits, and raw residuals. If you use multinomial-trial syntax so that each observation represents an aggregate of trials, then extensions of Pregibon’s diagnostics are also available (Lesaffre and Albert 1989; Williams 1987; Gupta, Nguyen, and Pardo 2008; Martín 2015).
For a binary response model, given the vector of covariates $\mathbf{x}_i$ for the $i$th observation in your data table and the model-predicted parameter estimates $\hat{\boldsymbol{\theta}} = (\hat{\alpha}, \hat{\boldsymbol{\beta}}')'$, you can write the linear predictor $\hat{\eta}_i = \hat{\alpha} + \mathbf{x}_i'\hat{\boldsymbol{\beta}}$. The mean of the $i$th observation $\hat{\mu}_i$, or the model-predicted event probability $\hat{\pi}_i$, is $\hat{\mu}_i = \hat{\pi}_i = g^{-1}(\hat{\eta}_i)$, where you choose the link function $g$ by specifying the link subparameter. The variance of the binary distribution is $\hat{\pi}_i(1-\hat{\pi}_i)$, and $\widehat{\mathbf{V}}(\hat{\boldsymbol{\theta}})$ is the estimated covariance matrix of $\hat{\boldsymbol{\theta}}$. Denote the frequency of the $i$th observation as $f_i$ and the weight as $w_i$.
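As a concrete numerical sketch of these quantities, the following assumes the logit link $g(\mu) = \log(\mu/(1-\mu))$; the helper name `predict_binary` is invented for the illustration and is not part of any interface described here.

```python
import numpy as np

def predict_binary(x, alpha, beta,
                   inv_link=lambda eta: 1.0 / (1.0 + np.exp(-eta))):
    """Linear predictor and predicted event probability for one observation.

    x, beta : 1-D covariate and slope vectors; alpha : intercept.
    inv_link defaults to the inverse logit, g^{-1}(eta) = 1/(1+e^{-eta}).
    """
    eta = alpha + np.dot(x, beta)   # linear predictor eta_i = alpha + x_i' beta
    return eta, inv_link(eta)       # mean mu_i = pi_i = g^{-1}(eta_i)

eta, pi = predict_binary(np.array([1.0, 2.0]), -1.0, np.array([0.5, 0.25]))
```

Here $\hat{\eta}_i = -1 + 0.5 + 0.5 = 0$, so the predicted probability is exactly 0.5.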
For ordinal response models, the predicted cumulative probabilities are computed in the same fashion by using the appropriate model-predicted intercept parameters and letting $\hat{\boldsymbol{\beta}}$ consist of the slope parameters: $\hat{\eta}_{ij} = \hat{\alpha}_j + \mathbf{x}_i'\hat{\boldsymbol{\beta}}$ and $\widehat{\Pr}(Y_i \le j \mid \mathbf{x}_i) = g^{-1}(\hat{\eta}_{ij})$ for $j = 1, \ldots, J-1$, where $J$ is the number of response levels.
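Assuming the cumulative logit link, the ordinal computation can be sketched as follows; `ordinal_probs` is a hypothetical helper, and the individual level probabilities are recovered by differencing consecutive cumulative probabilities.

```python
import numpy as np

def ordinal_probs(x, alphas, beta):
    """Cumulative-logit model: g(Pr(Y<=j)) = alpha_j + x'beta, j = 1..J-1.

    alphas must be increasing so the cumulative probabilities are ordered.
    Returns the J individual response-level probabilities.
    """
    eta = np.asarray(alphas) + np.dot(x, beta)   # eta_ij for j = 1..J-1
    cum = 1.0 / (1.0 + np.exp(-eta))             # Pr(Y <= j), inverse logit
    cum = np.concatenate(([0.0], cum, [1.0]))    # pad Pr(Y<=0)=0, Pr(Y<=J)=1
    return np.diff(cum)                          # Pr(Y = j) by differencing

p = ordinal_probs(np.array([0.0]), alphas=[-1.0, 1.0], beta=np.array([0.0]))
```

With symmetric intercepts and a zero covariate effect, the first and last level probabilities coincide and the probabilities sum to 1.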
For nominal response models, the predicted probabilities are computed by using the appropriate model-predicted intercept parameters and letting $\hat{\boldsymbol{\beta}}_j$ consist of the slope parameters for the $j$th response level: $\hat{\eta}_{ij} = \hat{\alpha}_j + \mathbf{x}_i'\hat{\boldsymbol{\beta}}_j$ and $\hat{\pi}_{ij} = e^{\hat{\eta}_{ij}} \big/ \bigl(1 + \sum_{k=1}^{J-1} e^{\hat{\eta}_{ik}}\bigr)$ for $j = 1, \ldots, J-1$.
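A small sketch of the nominal (generalized logit) computation, assuming the last response level is the reference; `nominal_probs` is an invented name.

```python
import numpy as np

def nominal_probs(x, alphas, betas):
    """Generalized-logit probabilities with the last level as reference.

    betas : (J-1, p) matrix of level-specific slopes;
    pi_ij = exp(eta_ij) / (1 + sum_k exp(eta_ik)) for j = 1..J-1.
    Returns all J probabilities, the reference level last.
    """
    eta = np.asarray(alphas) + np.asarray(betas) @ x   # eta_ij, j = 1..J-1
    num = np.exp(eta)
    denom = 1.0 + num.sum()
    return np.append(num / denom, 1.0 / denom)         # reference gets 1/denom

p = nominal_probs(np.array([1.0]), alphas=[0.0, 0.0], betas=[[0.0], [0.0]])
```

With all parameters zero and three levels, each probability is 1/3.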
If you perform elastic net or LASSO selection, or if you specify the noxpx or nostderr parameter, then your analysis does not generate a covariance matrix, and you can produce only the predicted probabilities (individual, cumulative, and posterior when available) and the raw, Pearson, deviance, and working residuals.
Approximate confidence intervals for predicted probabilities can be computed as follows. The variance of the linear predictor is estimated by
$$\hat{\sigma}^2(\hat{\eta}_i) = \mathbf{z}_i'\,\widehat{\mathbf{V}}(\hat{\boldsymbol{\theta}})\,\mathbf{z}_i$$
where $\mathbf{z}_i = (1, \mathbf{x}_i')'$. For multinomial models, the variance also depends on the response function. Let $\mathbf{e}_j$ be a $(J-1)$ column vector whose $j$th entry is equal to 1 and all other entries are equal to 0. Redefine $\mathbf{z}_{ij} = (\mathbf{e}_j', \mathbf{x}_i')'$, $\hat{\boldsymbol{\theta}} = (\hat{\alpha}_1, \ldots, \hat{\alpha}_{J-1}, \hat{\boldsymbol{\beta}}')'$, and $\hat{\eta}_{ij} = \mathbf{z}_{ij}'\hat{\boldsymbol{\theta}}$. Then
$$\hat{\sigma}^2(\hat{\eta}_{ij}) = \mathbf{z}_{ij}'\,\widehat{\mathbf{V}}(\hat{\boldsymbol{\theta}})\,\mathbf{z}_{ij}$$
The asymptotic $100(1-\alpha)\%$ confidence interval for $\eta_i$ is
$$\hat{\eta}_i \pm z_{1-\alpha/2}\,\hat{\sigma}(\hat{\eta}_i)$$
where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile point of a standard normal distribution.
The predicted probability and the confidence limits for $\pi_i$ are obtained by back-transforming the corresponding measures for the linear predictor. So the confidence limits are
$$g^{-1}\bigl(\hat{\eta}_i \pm z_{1-\alpha/2}\,\hat{\sigma}(\hat{\eta}_i)\bigr)$$
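The back-transformed interval can be sketched as follows, assuming the logit link; `predicted_prob_ci` and the toy covariance matrix are invented for the illustration.

```python
import numpy as np
from statistics import NormalDist

def predicted_prob_ci(x, theta, V, level=0.95,
                      inv_link=lambda eta: 1.0 / (1.0 + np.exp(-eta))):
    """Wald confidence limits for a predicted probability.

    theta stacks the intercept and slopes; V is its estimated covariance.
    The interval is built on the linear-predictor scale, then back-transformed.
    """
    z = np.concatenate(([1.0], x))              # z_i = (1, x_i')'
    eta = z @ theta                             # linear predictor eta_i
    se = np.sqrt(z @ V @ z)                     # sigma-hat(eta_i)
    q = NormalDist().inv_cdf(1 - (1 - level) / 2)   # normal percentile
    return inv_link(eta), (inv_link(eta - q * se), inv_link(eta + q * se))

pi, (lo, hi) = predicted_prob_ci(np.array([0.0]), np.array([0.0, 1.0]), np.eye(2))
```

Because the inverse logit is monotone, back-transforming the endpoints preserves the interval; on this symmetric toy example the limits straddle 0.5.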
The diagonal elements of the hat matrix are useful in detecting extreme points in the design space, where they tend to have larger values. For the generalized linear model, the variance of the $i$th individual observation is
$$\hat{v}_i = \frac{\hat{\mu}_i(1-\hat{\mu}_i)}{w_i f_i}$$
For the $i$th observation, let
$$\hat{w}_{ei} = \frac{1}{\hat{v}_i\,[g'(\hat{\mu}_i)]^2}$$
where $g'(\hat{\mu}_i)$ is the derivative of the link function evaluated at $\hat{\mu}_i$. The weight matrix $\widehat{\mathbf{W}}_e$ is a diagonal matrix, with $\hat{w}_{ei}$ denoting the $i$th diagonal element, which is used in computing the expected information matrix. Define the leverage, or hat-matrix diagonal, $h_i$, as the $i$th diagonal element of the matrix
$$\mathbf{H} = \widehat{\mathbf{W}}_e^{1/2}\,\mathbf{X}\,(\mathbf{X}'\widehat{\mathbf{W}}_e\mathbf{X})^{-1}\,\mathbf{X}'\,\widehat{\mathbf{W}}_e^{1/2}$$
If the estimated probability is extreme (less than 0.1 or greater than 0.9, approximately), then the hat-matrix diagonal might be greatly reduced in value. Consequently, when an observation has a very large or very small estimated probability, its leverage is not a good indicator of the observation’s distance from the design space (Hosmer and Lemeshow 2000, p. 171).
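A minimal numerical sketch of the hat-matrix diagonals for a diagonal weight matrix; `hat_diagonals` and the toy design are invented. With unit weights the diagonals reduce to ordinary least-squares leverages, which sum to the number of parameters.

```python
import numpy as np

def hat_diagonals(X, w_e):
    """Diagonal of H = W^{1/2} X (X'WX)^{-1} X' W^{1/2} for diagonal weights w_e."""
    WX = w_e[:, None] * X                       # W X
    M = np.linalg.inv(X.T @ WX)                 # (X'WX)^{-1}
    # h_i = w_ei * x_i' (X'WX)^{-1} x_i, computed row by row
    return w_e * np.einsum("ij,jk,ik->i", X, M, X)

X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
h = hat_diagonals(X, np.ones(4))
```

On this two-parameter design the leverages sum to 2, and each lies strictly between 0 and 1.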
When you use multinomial-trial syntax to model aggregated multinomial response data and the predProbs subparameter is set to True, the relevant elements of the resulting hat matrix for observation $i$ form a $(J-1) \times (J-1)$ matrix $\mathbf{H}_i$. The following are three scalar representations of the leverage based on $\mathbf{H}_i$ and its diagonal elements:
Residuals are useful in identifying observations that are not explained well by the model. For binary and binomial response data, the raw residuals are
$$y_i - \hat{\pi}_i, \qquad y_i = \frac{r_i}{n_i}$$
where $r_i$ is the number of event responses out of $n_i$ trials for the $i$th observation. For single-trial syntax, $n_i = 1$, and $r_i = 1$ if the ordered response is 1 and $r_i = 0$ otherwise.
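A one-line sketch of the raw residuals, assuming $y_i = r_i/n_i$ as above; `raw_residuals` is an invented name.

```python
import numpy as np

def raw_residuals(r, n, pi_hat):
    """Raw residuals y_i - pi_i with y_i = r_i / n_i.

    r : event counts; n : trial counts (all ones for single-trial syntax).
    """
    return r / n - pi_hat

res = raw_residuals(np.array([1.0, 0.0]), np.array([1.0, 1.0]),
                    np.array([0.75, 0.25]))
```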
For multinomial response data, the raw residuals are
$$y_{ij} - \hat{\pi}_{ij}$$
where $\hat{\pi}_{ij}$ are the model-predicted probabilities of the ordered response $j$ for observation $i$. With single-trial syntax, $y_{ij} = 1$ if the $i$th observation has ordered response $j$ and $y_{ij} = 0$ otherwise, and $n_i = 1$. With multinomial-trial syntax, $y_{ij} = r_{ij}/n_i$, where $r_{ij}$ is the number of trials in the $i$th observation that have ordered response $j$ and $n_i = \sum_j r_{ij}$ is the total number of trials for the $i$th observation. If the predProbs subparameter is set to True and you use multinomial-trial syntax, the raw residuals are accumulated across the response levels as follows:
Pearson residuals are the square root of the $i$th observation’s contribution to Pearson’s chi-square:
$$\chi_i = \sqrt{w_i f_i}\;\frac{y_i - \hat{\pi}_i}{\sqrt{\hat{\pi}_i(1-\hat{\pi}_i)/n_i}}$$
where $y_i = r_i/n_i$. If you use multinomial-trial syntax, the Pearson residuals for each response level $j$ are
$$\chi_{ij} = \sqrt{w_i f_i}\;\frac{y_{ij} - \hat{\pi}_{ij}}{\sqrt{\hat{\pi}_{ij}(1-\hat{\pi}_{ij})/n_i}}$$
and if the predProbs subparameter is set to True, these values are accumulated across the response levels as
$$\chi_i = \sqrt{w_i f_i\, n_i \sum_j \frac{(y_{ij} - \hat{\pi}_{ij})^2}{\hat{\pi}_{ij}}}$$
Deviance residuals are the square root of the contribution of the $i$th observation to the deviance, with the sign of the raw residual:
$$d_i = \pm\sqrt{2 w_i f_i \left[ r_i \log\!\left(\frac{r_i}{n_i \hat{\pi}_i}\right) + (n_i - r_i)\log\!\left(\frac{n_i - r_i}{n_i(1-\hat{\pi}_i)}\right) \right]}$$
where the sign is that of $y_i - \hat{\pi}_i$ and terms with $r_i = 0$ or $n_i - r_i = 0$ are taken to be 0. If you use multinomial-trial syntax, the deviance residuals for each response level $j$ are
$$d_{ij} = \pm\sqrt{2 w_i f_i \left[ r_{ij}\log\!\left(\frac{y_{ij}}{\hat{\pi}_{ij}}\right) + (n_i - r_{ij})\log\!\left(\frac{1 - y_{ij}}{1-\hat{\pi}_{ij}}\right)\right]}$$
and if the predProbs subparameter is set to True, these values are accumulated across the response levels as
$$d_i = \pm\sqrt{2 w_i f_i \sum_j r_{ij}\log\!\left(\frac{y_{ij}}{\hat{\pi}_{ij}}\right)}$$
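The binomial Pearson and deviance contributions can be sketched together. This assumes the textbook binomial forms, with the $0\log 0 = 0$ convention at the boundaries $r_i = 0$ and $r_i = n_i$; `pearson_deviance` is an invented helper.

```python
import numpy as np

def pearson_deviance(r, n, pi_hat, w=1.0, f=1.0):
    """Pearson and deviance residuals for binomial counts r out of n trials.

    chi_i scales the raw residual by its model standard error pi(1-pi)/n;
    d_i carries the sign of the raw residual and squares to the
    observation's deviance contribution.
    """
    y = r / n
    chi = np.sqrt(w * f) * (y - pi_hat) / np.sqrt(pi_hat * (1 - pi_hat) / n)
    # 0*log(0) = 0 convention for the boundary cases r = 0 and r = n
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(r > 0, r * np.log(y / pi_hat), 0.0)
        t2 = np.where(n - r > 0, (n - r) * np.log((1 - y) / (1 - pi_hat)), 0.0)
    d = np.sign(y - pi_hat) * np.sqrt(2 * w * f * (t1 + t2))
    return chi, d

chi, d = pearson_deviance(np.array([3.0]), np.array([10.0]), np.array([0.3]))
```

When the observed proportion equals the predicted probability, both residuals are exactly 0.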
The working residuals are
$$r_i^{W} = (y_i - \hat{\mu}_i)\,g'(\hat{\mu}_i)$$
Working residuals are not available for multinomial-response models when you specify multinomial-trial syntax or the predProbs subparameter.
The Pearson residuals, standardized to have unit asymptotic variance, are
$$\chi_{si} = \frac{\chi_i}{\sqrt{1-h_i}}$$
If you use multinomial-trial syntax and the predProbs subparameter is set to True, the standardized Pearson residuals are
The deviance residuals, standardized to have unit asymptotic variance, are
$$d_{si} = \frac{d_i}{\sqrt{1-h_i}}$$
If you use multinomial-trial syntax and the predProbs subparameter is set to True, the standardized deviance residuals are
The likelihood residuals, which estimate components of a likelihood ratio test of deleting an individual observation, are a weighted combination of the standardized Pearson and deviance residuals:
$$\ell_i = \operatorname{sign}(y_i - \hat{\pi}_i)\sqrt{h_i\,\chi_{si}^2 + (1-h_i)\,d_{si}^2}$$
If you use multinomial-trial syntax and the predProbs subparameter is set to True, the likelihood residuals are computed as
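Assuming the standard one-step forms (dividing by $\sqrt{1-h_i}$, then weighting by $h_i$ and $1-h_i$ for the likelihood residual), the single-observation case can be sketched as follows; `standardized_and_likelihood` is an invented name.

```python
import numpy as np

def standardized_and_likelihood(chi, d, h):
    """Standardized Pearson/deviance residuals and likelihood residuals.

    chi, d : Pearson and deviance residuals; h : hat-matrix diagonals.
    The likelihood residual mixes the two standardized forms by h and 1-h.
    """
    chi_s = chi / np.sqrt(1 - h)                # standardized Pearson
    d_s = d / np.sqrt(1 - h)                    # standardized deviance
    ell = np.sign(d) * np.sqrt(h * chi_s**2 + (1 - h) * d_s**2)
    return chi_s, d_s, ell

chi_s, d_s, ell = standardized_and_likelihood(
    np.array([2.0]), np.array([1.8]), np.array([0.5]))
```

Because it is a convex combination on the squared scale, the likelihood residual always falls between the two standardized residuals in magnitude.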
The CBAR statistic is a confidence interval displacement diagnostic that provides a scalar measure of the influence of an individual observation on $\hat{\boldsymbol{\beta}}$. This diagnostic is based on the same idea as the Cook distance in linear regression theory (Cook and Weisberg 1982), but it uses the one-step estimate:
$$\bar{C}_i = \frac{\chi_i^2\, h_i}{1-h_i}$$
If you use multinomial-trial syntax and the predProbs subparameter is set to True, this diagnostic is computed as follows:
The DIFDEV and DIFCHISQ statistics are diagnostics for detecting ill-fitted observations, that is, observations that contribute heavily to the disagreement between the data and the predicted values of the fitted model. DIFDEV is the change in the deviance that results from deleting an individual observation, and DIFCHISQ is the change in the Pearson chi-square statistic that results from the same deletion. By using the one-step estimate, DIFDEV and DIFCHISQ for the $i$th observation are computed as follows:
$$\mathrm{DIFDEV}_i = d_i^2 + \bar{C}_i, \qquad \mathrm{DIFCHISQ}_i = \frac{\bar{C}_i}{h_i} = \frac{\chi_i^2}{1-h_i}$$
If you use multinomial-trial syntax and the predProbs subparameter is set to True, these statistics are computed as follows:
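Assuming the standard one-step relations $\bar{C}_i = \chi_i^2 h_i/(1-h_i)$, $\mathrm{DIFDEV}_i = d_i^2 + \bar{C}_i$, and $\mathrm{DIFCHISQ}_i = \chi_i^2/(1-h_i)$, a sketch (`deletion_diagnostics` is an invented name):

```python
import numpy as np

def deletion_diagnostics(chi, d, h):
    """One-step CBAR, DIFDEV, and DIFCHISQ from residuals and leverages."""
    cbar = chi**2 * h / (1 - h)        # confidence interval displacement
    difdev = d**2 + cbar               # one-step change in deviance
    difchisq = chi**2 / (1 - h)        # one-step change in Pearson chi-square,
                                       # equivalently CBAR / h
    return cbar, difdev, difchisq

cbar, difdev, difchisq = deletion_diagnostics(
    np.array([2.0]), np.array([1.8]), np.array([0.2]))
```

For $\chi_i = 2$, $d_i = 1.8$, and $h_i = 0.2$, these give $\bar{C}_i = 1$, $\mathrm{DIFDEV}_i = 4.24$, and $\mathrm{DIFCHISQ}_i = 5$.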
The diagnostic statistics in this section were developed by Preisser and Qaqish (1996). See the section Generalized Estimating Equations for further information and notation for generalized estimating equations (GEEs). The following additional notation is used in this section.
Partition the design matrix $\mathbf{X}$ and response vector $\mathbf{Y}$ by cluster; that is, let $\mathbf{X} = (\mathbf{X}_1', \ldots, \mathbf{X}_K')'$ and $\mathbf{Y} = (\mathbf{Y}_1', \ldots, \mathbf{Y}_K')'$, corresponding to the $K$ clusters.
Let $n_i$ be the number of responses for cluster $i$, and denote the total number of observations as $N = \sum_{i=1}^{K} n_i$. Denote the $N \times N$ diagonal matrix as $\mathbf{A}$, where $v(\mu_{ij})$ is the diagonal element that corresponds to the $j$th observation in the $i$th cluster. If there is a weight parameter, the diagonal element of $\mathbf{A}$ is $v(\mu_{ij})/w_{ij}$, where $w_{ij}$ is the specified weight of the $j$th observation in the $i$th cluster. Let $\mathbf{E}$ be the $N \times N$ diagonal matrix with $\partial\mu_{ij}/\partial\eta_{ij}$ as diagonal elements, $j = 1, \ldots, n_i$, $i = 1, \ldots, K$. Let $\mathbf{A}_i$ and $\mathbf{E}_i$ be the $n_i \times n_i$ diagonal matrices that correspond to cluster $i$, where $v(\mu_{ij})/w_{ij}$ is the $j$th diagonal element of $\mathbf{A}_i$.
Let $\mathbf{W}$ be the $N \times N$ block diagonal weight matrix whose $i$th block, corresponding to the $i$th cluster, is the $n_i \times n_i$ matrix
$$\mathbf{W}_i = \mathbf{E}_i\,\mathbf{A}_i^{-1/2}\,\bigl[\mathbf{R}_i(\hat{\boldsymbol{\alpha}})\bigr]^{-1}\,\mathbf{A}_i^{-1/2}\,\mathbf{E}_i$$
where $\mathbf{R}_i(\hat{\boldsymbol{\alpha}})$ is the working correlation matrix for cluster $i$ that is computed by using the estimated correlation parameters $\hat{\boldsymbol{\alpha}}$.
Let
$$\mathbf{H}_i = \mathbf{X}_i\,(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\,\mathbf{X}_i'\,\mathbf{W}_i$$
where $\mathbf{X}_i$ is the design matrix that corresponds to cluster $i$.
The cluster leverage statistic represents the leverage of cluster $i$ and is contained in the matrix $\mathbf{H}_i$. The leverage of cluster $i$ is summarized by the trace of $\mathbf{H}_i$:
$$\operatorname{tr}(\mathbf{H}_i) = \sum_{t=1}^{n_i} [\mathbf{H}_i]_{tt}$$
The leverage of the $t$th observation in the $i$th cluster is the $t$th diagonal element of $\mathbf{H}_i$.
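Assuming the Preisser and Qaqish (1996) cluster leverage matrix $\mathbf{H}_i = \mathbf{X}_i(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}_i'\mathbf{W}_i$, a toy sketch with an invented helper and data:

```python
import numpy as np

def cluster_leverages(X_blocks, W_blocks):
    """Cluster leverage traces tr(H_i), H_i = X_i (X'WX)^{-1} X_i' W_i.

    X_blocks, W_blocks : per-cluster design matrices X_i and weight blocks W_i.
    The observation-level leverages are the diagonal elements of each H_i.
    """
    XtWX = sum(Xi.T @ Wi @ Xi for Xi, Wi in zip(X_blocks, W_blocks))
    M = np.linalg.inv(XtWX)
    H = [Xi @ M @ Xi.T @ Wi for Xi, Wi in zip(X_blocks, W_blocks)]
    return [np.trace(Hi) for Hi in H], H

# Two clusters (sizes 2 and 3), intercept-only design, identity weight blocks
X_blocks = [np.ones((2, 1)), np.ones((3, 1))]
W_blocks = [np.eye(2), np.eye(3)]
traces, H = cluster_leverages(X_blocks, W_blocks)
```

As with ordinary leverages, the cluster leverage traces sum to the number of regression parameters (here 1), and a cluster with more observations carries more leverage.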