The statistics that are defined in this section are useful for assessing the fit of the model to your data; they are displayed in the "Fit Statistics" table. The statistics are computed for each data role when you specify a partByFrac or partByVar parameter.
The calculation of the information criteria uses the following formulas, where p denotes the number of effective parameters in the candidate model, F denotes the sum of frequencies used, and l is the log likelihood evaluated at the converged estimates:
If you do not specify a freq parameter, F equals n, the number of observations used.
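As a sketch in this notation, assuming the conventional definitions of the information criteria (the exact set of criteria that the action reports is not restated here):

$$
\begin{aligned}
\mathrm{AIC}  &= -2l + 2p \\
\mathrm{AICC} &= -2l + \frac{2pF}{F - p - 1} \\
\mathrm{SBC}  &= -2l + p \log F
\end{aligned}
$$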
The goal of a coefficient of determination, also known as an R-square measure, is to express the agreement between a stipulated model and the data in terms of variation in the data that is explained by the model. In linear models, the R-square measure is based on residual sums of squares; because these are additive, a measure bounded between 0 and 1 is easily derived.
In more general models where parameters are estimated by the maximum likelihood principle, Cox and Snell (1989, pp. 208–209) and Magee (1990) proposed the following generalization of the coefficient of determination:

$$R^2 = 1 - \left\{\frac{L(\mathbf{0})}{L(\hat{\boldsymbol{\beta}})}\right\}^{2/n}$$

Here, $L(\mathbf{0})$ is the likelihood of the intercept-only model, $L(\hat{\boldsymbol{\beta}})$ is the likelihood of the specified model, and n denotes the number of observations used in the analysis. This number is adjusted for frequencies if a freq parameter is present, and it is based on the trials variable for binomial models.

As discussed in Nagelkerke (1991), this generalized R-square measure has properties similar to those of the coefficient of determination in linear models. If the model effects do not contribute to the analysis, $L(\hat{\boldsymbol{\beta}})$ approaches $L(\mathbf{0})$ and $R^2$ approaches zero. However, $R^2$ does not have an upper limit of 1. Nagelkerke suggested a rescaled generalized coefficient of determination, $\tilde{R}^2$, which achieves an upper limit of 1 by dividing $R^2$ by its maximum value:

$$\tilde{R}^2 = \frac{R^2}{\max R^2}, \qquad \max R^2 = 1 - \left\{L(\mathbf{0})\right\}^{2/n}$$

Another measure, from McFadden (1974), is also bounded by 0 and 1:

$$R^2_{\mathrm{McF}} = 1 - \frac{\log L(\hat{\boldsymbol{\beta}})}{\log L(\mathbf{0})}$$
These measures are most useful for comparing competing models that are not necessarily nested—that is, models that cannot be reduced to one another by simple constraints on the parameter space. Larger values of the measures indicate better models.
The average square error (ASE) is the average of the squared differences between the responses and the predictions. When you have a discrete number of response levels, the ASE is modified as shown in Table 9 (Brier 1950; Murphy 1973); it is also called the Brier score or Brier reliability.
Table 9: Average Square Error Computations
In Table 9, $F = \sum_i f_i$, $r_i$ is the number of events, $n_i$ is the number of trials in binomial response models, $y_i = 1$ for events and 0 for nonevents in binary response models, and $\hat{p}_i$ is the predicted probability of an event. For polytomous response models specified with single-trial syntax, $y_{ij} = 1$ if the ith observation has response level j, $\sum_j y_{ij} = 1$, and $\hat{p}_{ij}$ is the model-predicted probability of response level j for observation i. However, if you specify multinomial-trial syntax, $y_{ij}$ is the number of trials in the ith observation that have response j, and the total number of trials is $n_i = \sum_j y_{ij}$.
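The entries of Table 9 are not reproduced here; as a sketch, the standard Brier-score forms consistent with the notation above are as follows (the binomial and polytomous normalizations in particular are assumptions based on the standard Brier score):

$$
\begin{aligned}
\text{binary (single-trial):}\quad & \mathrm{ASE} = \frac{1}{F}\sum_i f_i\,(y_i - \hat{p}_i)^2 \\
\text{binomial (events/trials):}\quad & \mathrm{ASE} = \frac{1}{\sum_i f_i n_i}\sum_i f_i\left\{ r_i (1-\hat{p}_i)^2 + (n_i - r_i)\,\hat{p}_i^2 \right\} \\
\text{polytomous (single-trial):}\quad & \mathrm{ASE} = \frac{1}{F}\sum_i f_i \sum_j (y_{ij} - \hat{p}_{ij})^2
\end{aligned}
$$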
For a binary response model, write the mean of the model-predicted probabilities of event (Y=0) observations as $\bar{\pi}_0$ and of nonevent (Y=1) observations as $\bar{\pi}_1$, where $\hat{\pi}$ is the predicted probability of an event. The difference of means is $D = \bar{\pi}_0 - \bar{\pi}_1$, which Tjur (2009) relates to other R-square measures and calls the coefficient of discrimination, because it is a measure of the model's ability to distinguish between the event and nonevent distributions. The difference of means is also the
$\Delta m$ or $d'$ statistic (with unit standard error) that is discussed in the signal detection literature (McNicol 2005), and it is also referred to as Tjur's R-square.
For binary response data, the response Y is either an event or a nonevent; let the response Y take the value 1 for an event and 2 for a nonevent. From the fitted model, a predicted event probability $\hat{p}_i$ can be computed for each observation i. Define your decision rule as follows: if the predicted event probability equals or exceeds a cutpoint value z, the observation is classified as an event; otherwise, it is classified as a nonevent. Suppose $n_1$ of n individuals experience an event, such as a disease, and the remaining $n_2 = n - n_1$ individuals do not experience that event (are nonevents). The $2 \times 2$ classification (confusion, decision, error) matrix in Table 10 is obtained by cross-classifying the observed and predicted responses, where $n_{ij}$ is the total number of observations that are observed to have Y = i and are classified into D = j. In this table, let Y = 1 denote an observed event and Y = 2 denote a nonevent, and let D = 1 indicate that the observation is classified as an event and D = 2 denote that the observation is classified as a nonevent.
The cells of the classification matrix of Table 10 are as follows: $n_{11}$ is the number of true positives (events classified as events), $n_{12}$ is the number of false negatives (events classified as nonevents), $n_{21}$ is the number of false positives (nonevents classified as events), and $n_{22}$ is the number of true negatives (nonevents classified as nonevents).
The accuracy of the classification is measured by its ability to predict events and nonevents correctly. Sensitivity (true positive fraction, TPF, recall) is the proportion of event responses that are predicted to be events. Specificity (true negative fraction, 1–FPF) is the proportion of nonevent responses that are predicted to be nonevents.
You can also measure accuracy by how well the classification predicts the response. The positive predictive value (precision, PPV) is the proportion of observations classified as events that are correctly classified. The negative predictive value (NPV) is the proportion of observations classified as nonevents that are correctly classified. The correct classification rate (accuracy, PC) is the proportion of observations that are correctly classified, whereas the misclassification rate (error rate) is the proportion of observations that are incorrectly classified. The lift is the ratio of the proportion of correctly classified events to the proportion of observations classified as events.
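As a sketch in terms of the cell counts $n_{ij}$ of Table 10, and before any prevalence adjustment (discussed next), these verbal definitions correspond to:

$$
\begin{aligned}
\text{sensitivity (TPF)} &= \frac{n_{11}}{n_1}, &
\text{specificity } (1 - \mathrm{FPF}) &= \frac{n_{22}}{n_2}, \\
\mathrm{PPV} &= \frac{n_{11}}{n_{11} + n_{21}}, &
\mathrm{NPV} &= \frac{n_{22}}{n_{12} + n_{22}}, \\
\mathrm{PC} &= \frac{n_{11} + n_{22}}{n}, &
\text{lift} &= \frac{n_{11}/n_1}{(n_{11} + n_{21})/n}
\end{aligned}
$$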
The prevalence, $\rho$, is the prior probability of an event. If the prevalence is different from the observed empirical event probability in the training data, $n_1/n$, then applying Bayes' theorem shows that the PPV, NPV, correct classification rate (PC), misclassification rate, and lift equations depend on the prevalence (Fleiss, Levin, and Paik 2003). For a stratified sampling situation in which $n_1$ and $n_2$ are chosen a priori, $n_1/n$ is not a desirable estimate of $\rho$. You can specify the prevalence by using the prior subparameter in the model parameter. If you specify a partByFrac or partByVar parameter without specifying the prior subparameter, then the observed empirical probabilities for the training data are used as prevalences for computations of the validation and test statistics.
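For example, writing Se for sensitivity and Sp for specificity, Bayes' theorem gives the prevalence-adjusted predictive values; this is a sketch of the standard result (Fleiss, Levin, and Paik 2003), not a reproduction of the action's exact table entries:

$$
\mathrm{PPV} = \frac{\rho\,\mathrm{Se}}{\rho\,\mathrm{Se} + (1-\rho)(1-\mathrm{Sp})},
\qquad
\mathrm{NPV} = \frac{(1-\rho)\,\mathrm{Sp}}{(1-\rho)\,\mathrm{Sp} + \rho\,(1-\mathrm{Se})}
$$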
The logistic action constructs the data for a receiver operating characteristic (ROC) curve by initially rounding the predicted probabilities to the nearest multiple of the value of the binEps parameter. This effectively sorts the observations in increasing order of their estimated event probability. A classification matrix is created for each of these bins by using the rounded probability as the cutpoint. As the cutpoint moves from 0 to 1, those cutpoints for which the classification matrix changes are selected.
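The following Python sketch illustrates the binning scheme just described. The helper name roc_cutpoints is hypothetical (it is not the action's API), and the sketch assumes the response is coded 1 for events and 0 for nonevents (the action itself codes events as Y = 1 and nonevents as Y = 2):

```python
import numpy as np

def roc_cutpoints(p_hat, y, bin_eps=0.01):
    """Hypothetical helper: round predicted probabilities to the nearest
    multiple of bin_eps, use each rounded value as a cutpoint, and keep
    only the cutpoints at which the classification matrix changes."""
    p_hat = np.asarray(p_hat, dtype=float)
    event = np.asarray(y) == 1              # assumed coding: 1 = event
    # Round each predicted probability to the nearest multiple of bin_eps.
    rounded = np.round(p_hat / bin_eps) * bin_eps
    selected, last = [], None
    for z in np.unique(rounded):            # candidate cutpoints, increasing
        d = rounded >= z                    # True = classified as an event
        matrix = (int((event & d).sum()),   # n11: true positives
                  int((event & ~d).sum()),  # n12: false negatives
                  int((~event & d).sum()),  # n21: false positives
                  int((~event & ~d).sum())) # n22: true negatives
        if matrix != last:                  # keep cutpoints where it changes
            selected.append((float(z), matrix))
            last = matrix
    return selected
```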
Alternatively, if you do not want to generate the entire ROC curve, you can use the cutpt subparameter to specify your own list of cutpoints. In this case, the specified cutpoints are used to generate the classification matrices without rounding the predicted probabilities.
You can specify the ctable parameter to produce a classification table that includes these cutpoints and, for each cutpoint, any of the statistics in Table 11 that you request.
Table 11: Statistics from the Classification Matrix with Cutpoint z
You can output this classification table by specifying the casOut subparameter.
The area under the ROC curve (AUC), as determined by the trapezoidal rule, is given by the concordance index c, which is described in the section Association Statistics.
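In symbols, if the classification matrices yield ROC points $(\mathrm{FPF}_t, \mathrm{TPF}_t)$ ordered so that the FPF increases, the trapezoidal rule computes (a sketch of the standard calculation):

$$\widehat{\mathrm{AUC}} = \sum_t \frac{1}{2}\,\bigl(\mathrm{FPF}_{t+1} - \mathrm{FPF}_t\bigr)\bigl(\mathrm{TPF}_{t+1} + \mathrm{TPF}_t\bigr)$$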
For more information about the topics in this section, see Pepe (2003).
If you specify the association parameter, the logistic action displays measures of association between predicted probabilities and observed responses for binary, binomial, and ordinal response models. These measures assess the predictive ability of a model.
For ordinal response data, let the predicted mean score of an observation be the sum of the OrderedValue values (shown in the "Response Profile" table) minus one, weighted by the corresponding predicted probabilities for that observation; that is, the predicted mean score is $S = \sum_{i=1}^{k} (i-1)\,\hat{\pi}_i$, where k is the number of response levels and $\hat{\pi}_i$ is the predicted probability of the ith (ordered) response.
For binary and binomial responses, let the predicted mean score be the predicted event probability.
The predicted mean score is rounded to the nearest multiple of the value that you specify in the binEps parameter. This effectively sorts the observations in increasing order of their predicted mean score. Of the n pairs of observations in the data that have different responses, let $n_c$ be the number of pairs in which the observation that has the lower-ordered response value has the lower predicted mean score, let $n_d$ be the number of pairs in which the observation that has the lower-ordered response value has the higher predicted mean score, and let $t = n - n_c - n_d$ be the number of remaining (tied) pairs. Let N be the sum of observation frequencies in the data. Then the following statistics are reported:
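A sketch of the conventional definitions of these statistics in this notation, assuming the standard forms of the concordance index, Somers' D, the Goodman-Kruskal gamma, and Kendall's tau-a:

$$
\begin{aligned}
c &= \frac{n_c + 0.5\,t}{n} && \text{(concordance index)} \\
D &= \frac{n_c - n_d}{n} && \text{(Somers' } D\text{)} \\
\gamma &= \frac{n_c - n_d}{n_c + n_d} && \text{(Goodman-Kruskal gamma)} \\
\tau_a &= \frac{n_c - n_d}{\tfrac{1}{2}N(N-1)} && \text{(Kendall's tau-}a\text{)}
\end{aligned}
$$

Note that when $t = 0$, these forms give Somers' $D = (2n_c - n)/n = 2c - 1$, which matches the no-ties statement below.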
If there are no ties, then Somers’ D (Gini’s coefficient) = 2c – 1. For binary responses, the concordance index, c, is an estimate of the AUC, which is the area under the ROC curve.
If you specify a partByFrac or partByVar parameter, then the logistic action displays a column for each of the roles.