The statistics that are defined in this section are useful for assessing the fit of the model to your data; they are displayed in the "Fit Statistics" table. The statistics are computed for each data role when you specify a partByFrac or partByVar parameter.
The calculation of the information criteria uses the following formulas, where p denotes the number of effective parameters in the candidate model, F denotes the sum of frequencies used, and l is the log likelihood evaluated at the converged estimates:
If you do not specify a freq parameter, F equals n, the number of observations used.
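As a sketch in this notation, assuming the conventional definitions of the information criteria (the exact set of criteria that the action reports is not restated here):

$$
\begin{aligned}
\mathrm{AIC}  &= -2l + 2p \\
\mathrm{AICC} &= -2l + \frac{2pF}{F - p - 1} \\
\mathrm{SBC}  &= -2l + p \log F
\end{aligned}
$$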
The goal of a coefficient of determination, also known as an R-square measure, is to express the agreement between a stipulated model and the data in terms of variation in the data that is explained by the model. In linear models, the R-square measure is based on residual sums of squares; because these are additive, a measure bounded between 0 and 1 is easily derived.
In more general models where parameters are estimated by the maximum likelihood principle, Cox and Snell (1989, pp. 208–209) and Magee (1990) proposed the following generalization of the coefficient of determination:

$$R^2 = 1 - \left\{\frac{L(\mathbf{0})}{L(\hat{\boldsymbol{\beta}})}\right\}^{2/n}$$

Here, $L(\mathbf{0})$ is the likelihood of the intercept-only model, $L(\hat{\boldsymbol{\beta}})$ is the likelihood of the specified model, and n denotes the number of observations used in the analysis. This number is adjusted for frequencies if a freq parameter is present, and it is based on the trials variable for binomial models.

As discussed in Nagelkerke (1991), this generalized R-square measure has properties similar to those of the coefficient of determination in linear models. If the model effects do not contribute to the analysis, $L(\hat{\boldsymbol{\beta}})$ approaches $L(\mathbf{0})$ and $R^2$ approaches zero. However, $R^2$ does not have an upper limit of 1. Nagelkerke suggested a rescaled generalized coefficient of determination, $\tilde{R}^2$, which achieves an upper limit of 1 by dividing $R^2$ by its maximum value:

$$\tilde{R}^2 = \frac{R^2}{\max R^2}, \qquad \max R^2 = 1 - \left\{L(\mathbf{0})\right\}^{2/n}$$

Another measure, from McFadden (1974), is also bounded by 0 and 1:

$$R^2_{\mathrm{McF}} = 1 - \frac{\log L(\hat{\boldsymbol{\beta}})}{\log L(\mathbf{0})}$$
These measures are most useful for comparing competing models that are not necessarily nested—that is, models that cannot be reduced to one another by simple constraints on the parameter space. Larger values of the measures indicate better models.
The average square error (ASE) is the average of the squared differences between the responses and the predictions. When you have a discrete number of response levels, the ASE is modified as shown in Table 9 (Brier 1950; Murphy 1973); it is also called the Brier score or Brier reliability.
Table 9: Average Square Error Computations
In Table 9, $F = \sum_i f_i$, $r_i$ is the number of events, $n_i$ is the number of trials in binomial response models, $y_i = 1$ for events and 0 for nonevents in binary response models, and $\hat{p}_i$ is the predicted probability of an event. For polytomous response models specified with single-trial syntax, $y_{ij} = 1$ if the ith observation has response level j, $\sum_j y_{ij} = 1$, and $\hat{p}_{ij}$ is the model-predicted probability of response level j for observation i. However, if you specify multinomial-trial syntax, $y_{ij}$ is the number of trials in the ith observation that have response j, and the total number of trials is $n_i = \sum_j y_{ij}$.
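The entries of Table 9 are not reproduced here; as a sketch, the standard Brier-score forms consistent with the notation above are as follows (the binomial and polytomous normalizations in particular are assumptions based on the standard Brier score):

$$
\begin{aligned}
\text{binary (single-trial):}\quad & \mathrm{ASE} = \frac{1}{F}\sum_i f_i\,(y_i - \hat{p}_i)^2 \\
\text{binomial (events/trials):}\quad & \mathrm{ASE} = \frac{1}{\sum_i f_i n_i}\sum_i f_i\left\{ r_i (1-\hat{p}_i)^2 + (n_i - r_i)\,\hat{p}_i^2 \right\} \\
\text{polytomous (single-trial):}\quad & \mathrm{ASE} = \frac{1}{F}\sum_i f_i \sum_j (y_{ij} - \hat{p}_{ij})^2
\end{aligned}
$$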
For a binary response model, write the mean of the model-predicted probabilities of event (Y=0) observations as $\bar{\pi}_0$ and of nonevent (Y=1) observations as $\bar{\pi}_1$, where $\hat{\pi}$ is the predicted probability of an event. The difference of means is $D = \bar{\pi}_0 - \bar{\pi}_1$, which Tjur (2009) relates to other R-square measures and calls the coefficient of discrimination, because it is a measure of the model's ability to distinguish between the event and nonevent distributions. The difference of means is also the
$\Delta m$ or $d'$ statistic (with unit standard error) that is discussed in the signal detection literature (McNicol 2005), and it is also referred to as Tjur's R-square.
For binary response data, the response Y is either an event or a nonevent; let the response Y take the value 1 for an event and 2 for a nonevent. From the fitted model, a predicted event probability $\hat{p}_i$ can be computed for each observation i. Define your decision rule as follows: if the predicted event probability equals or exceeds a cutpoint value z, the observation is classified as an event; otherwise, it is classified as a nonevent. Suppose $n_1$ of n individuals experience an event, such as a disease, and the remaining $n_2 = n - n_1$ individuals do not experience that event (are nonevents). The $2 \times 2$ classification (confusion, decision, error) matrix in Table 10 is obtained by cross-classifying the observed and predicted responses, where $n_{ij}$ is the total number of observations that are observed to have Y = i and are classified into D = j. In this table, let Y = 1 denote an observed event and Y = 2 denote a nonevent, and let D = 1 indicate that the observation is classified as an event and D = 2 denote that the observation is classified as a nonevent.
The cells of the classification matrix of Table 10 are as follows: $n_{11}$ is the number of true positives (events classified as events), $n_{12}$ is the number of false negatives (events classified as nonevents), $n_{21}$ is the number of false positives (nonevents classified as events), and $n_{22}$ is the number of true negatives (nonevents classified as nonevents).
The accuracy of the classification is measured by its ability to predict events and nonevents correctly. Sensitivity (true positive fraction, TPF, recall) is the proportion of event responses that are predicted to be events. Specificity (true negative fraction, 1–FPF) is the proportion of nonevent responses that are predicted to be nonevents.
You can also measure accuracy by how well the classification predicts the response. The positive predictive value (precision, PPV) is the proportion of observations classified as events that are correctly classified. The negative predictive value (NPV) is the proportion of observations classified as nonevents that are correctly classified. The correct classification rate (accuracy, PC) is the proportion of observations that are correctly classified, whereas the misclassification rate (error rate) is the proportion of observations that are incorrectly classified. The lift is the ratio of the proportion of correctly classified events to the proportion of observations classified as events.
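As a sketch in terms of the cell counts $n_{ij}$ of Table 10, and before any prevalence adjustment (discussed next), these verbal definitions correspond to:

$$
\begin{aligned}
\text{sensitivity (TPF)} &= \frac{n_{11}}{n_1}, &
\text{specificity } (1 - \mathrm{FPF}) &= \frac{n_{22}}{n_2}, \\
\mathrm{PPV} &= \frac{n_{11}}{n_{11} + n_{21}}, &
\mathrm{NPV} &= \frac{n_{22}}{n_{12} + n_{22}}, \\
\mathrm{PC} &= \frac{n_{11} + n_{22}}{n}, &
\text{lift} &= \frac{n_{11}/n_1}{(n_{11} + n_{21})/n}
\end{aligned}
$$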
The prevalence, $\rho$, is the prior probability of an event. If the prevalence is different from the observed empirical event probability in the training data, $n_1/n$, then applying Bayes' theorem shows that the PPV, NPV, correct classification rate (PC), misclassification rate, and lift equations depend on the prevalence (Fleiss, Levin, and Paik 2003). For a stratified sampling situation in which $n_1$ and $n_2$ are chosen a priori, $n_1/n$ is not a desirable estimate of $\rho$. You can specify the prevalence by using the prior subparameter in the model parameter. If you specify a partByFrac or partByVar parameter without specifying the prior subparameter, then the observed empirical probabilities for the training data are used as prevalences for computations of the validation and test statistics.
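For example, writing Se for sensitivity and Sp for specificity, Bayes' theorem gives the prevalence-adjusted predictive values; this is a sketch of the standard result (Fleiss, Levin, and Paik 2003), not a reproduction of the action's exact table entries:

$$
\mathrm{PPV} = \frac{\rho\,\mathrm{Se}}{\rho\,\mathrm{Se} + (1-\rho)(1-\mathrm{Sp})},
\qquad
\mathrm{NPV} = \frac{(1-\rho)\,\mathrm{Sp}}{(1-\rho)\,\mathrm{Sp} + \rho\,(1-\mathrm{Se})}
$$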
The logistic action constructs the data for a receiver operating characteristic (ROC) curve by initially rounding the predicted probabilities to the nearest multiple of the value of the binEps parameter. This effectively sorts the observations in increasing order of their estimated event probability. A classification matrix is created for each of these bins by using the rounded probability as the cutpoint. As the cutpoint moves from 0 to 1, those cutpoints for which the classification matrix changes are selected.
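The following Python sketch illustrates the binning scheme just described. The helper name roc_cutpoints is hypothetical (it is not the action's API), and the sketch assumes the response is coded 1 for events and 0 for nonevents (the action itself codes events as Y = 1 and nonevents as Y = 2):

```python
import numpy as np

def roc_cutpoints(p_hat, y, bin_eps=0.01):
    """Hypothetical helper: round predicted probabilities to the nearest
    multiple of bin_eps, use each rounded value as a cutpoint, and keep
    only the cutpoints at which the classification matrix changes."""
    p_hat = np.asarray(p_hat, dtype=float)
    event = np.asarray(y) == 1              # assumed coding: 1 = event
    # Round each predicted probability to the nearest multiple of bin_eps.
    rounded = np.round(p_hat / bin_eps) * bin_eps
    selected, last = [], None
    for z in np.unique(rounded):            # candidate cutpoints, increasing
        d = rounded >= z                    # True = classified as an event
        matrix = (int((event & d).sum()),   # n11: true positives
                  int((event & ~d).sum()),  # n12: false negatives
                  int((~event & d).sum()),  # n21: false positives
                  int((~event & ~d).sum())) # n22: true negatives
        if matrix != last:                  # keep cutpoints where it changes
            selected.append((float(z), matrix))
            last = matrix
    return selected
```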
Alternatively, if you do not want to generate the entire ROC curve, you can use the cutpt subparameter to specify your own list of cutpoints. In this case, the specified cutpoints are used to generate the classification matrices without rounding the predicted probabilities.
You can specify the ctable parameter to produce a classification table that includes these cutpoints and, for each cutpoint, any of the statistics in Table 11 that you request.
Table 11: Statistics from the Classification Matrix with Cutpoint z
You can output this classification table by specifying the casOut subparameter.
The area under the ROC curve (AUC), as determined by the trapezoidal rule, is given by the concordance index c, which is described in the section Association Statistics.
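In symbols, if the classification matrices yield ROC points $(\mathrm{FPF}_t, \mathrm{TPF}_t)$ ordered so that the FPF increases, the trapezoidal rule computes (a sketch of the standard calculation):

$$\widehat{\mathrm{AUC}} = \sum_t \frac{1}{2}\,\bigl(\mathrm{FPF}_{t+1} - \mathrm{FPF}_t\bigr)\bigl(\mathrm{TPF}_{t+1} + \mathrm{TPF}_t\bigr)$$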
For more information about the topics in this section, see Pepe (2003).
If you specify the association parameter, the logistic action displays measures of association between predicted probabilities and observed responses for binary, binomial, and ordinal response models. These measures assess the predictive ability of a model.
For ordinal response data, let the predicted mean score of an observation be the sum of the OrderedValue values (shown in the "Response Profile" table) minus one, weighted by the corresponding predicted probabilities for that observation; that is, the predicted mean score is $S = \sum_{i=1}^{k} (i-1)\,\hat{\pi}_i$, where k is the number of response levels and $\hat{\pi}_i$ is the predicted probability of the ith (ordered) response.
For binary and binomial responses, let the predicted mean score be the predicted event probability.
The predicted mean score is rounded to the nearest multiple of the value that you specify in the binEps parameter. This effectively sorts the observations in increasing order of their predicted mean score. Of the n pairs of observations in the data that have different responses, let $n_c$ be the number of pairs in which the observation that has the lower-ordered response value has the lower predicted mean score, let $n_d$ be the number of pairs in which the observation that has the lower-ordered response value has the higher predicted mean score, and let $t = n - n_c - n_d$ be the number of remaining (tied) pairs. Let N be the sum of observation frequencies in the data. Then the following statistics are reported:
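A sketch of the conventional definitions of these statistics in this notation, assuming the standard forms of the concordance index, Somers' D, the Goodman-Kruskal gamma, and Kendall's tau-a:

$$
\begin{aligned}
c &= \frac{n_c + 0.5\,t}{n} && \text{(concordance index)} \\
D &= \frac{n_c - n_d}{n} && \text{(Somers' } D\text{)} \\
\gamma &= \frac{n_c - n_d}{n_c + n_d} && \text{(Goodman-Kruskal gamma)} \\
\tau_a &= \frac{n_c - n_d}{\tfrac{1}{2}N(N-1)} && \text{(Kendall's tau-}a\text{)}
\end{aligned}
$$

Note that when $t = 0$, these forms give Somers' $D = (2n_c - n)/n = 2c - 1$, which matches the no-ties statement below.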
If there are no ties, then Somers’ D (Gini’s coefficient) = 2c – 1. For binary responses, the concordance index, c, is an estimate of the AUC, which is the area under the ROC curve.
If you specify a partByFrac or partByVar parameter, then the logistic action displays a column for each of the roles.