Hosmer and Lemeshow (2000) proposed a statistic that they show, through simulation, is distributed as chi-square when there is no replication in any of the subpopulations—in particular, when you have one or more continuous predictors in your model. Fagerland, Hosmer, and Bofin (2008) and Fagerland and Hosmer (2013, 2016) extend this test to polytomous response models.
The observations are sorted in increasing order of a scored value. For binary response variables, the scored value of an observation is its estimated event probability. The event is the response level that is specified in the event subparameter, the response level that is not specified in the ref subparameter, or, if neither of these options is specified, the response level identified in the "Response Profiles" table as "Ordered Value 1." For nominal response variables (when the value of the link parameter is glogit), the scored value of an observation is 1 minus the estimated probability of the reference level (specified using the ref subparameter). For ordinal response variables, the scored value of an observation is $\sum_{i=1}^{K} i\,\hat{\pi}_i$, where $K$ is the number of response levels and $\hat{\pi}_i$ is the predicted probability of the $i$th ordered response. This scored value is then rounded according to the value of the binEps parameter.
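As an illustration, the three scoring rules can be written as small Python functions that operate on one observation's predicted probabilities. The function names and the argument layout are hypothetical. The layout assumes one list of $K$ probabilities per observation, ordered as in the "Response Profiles" table. The binEps rounding step is omitted because its exact rule is not described here.

```python
def binary_score(event_prob: float) -> float:
    """Binary response: the scored value is the estimated event probability."""
    return event_prob


def nominal_score(probs: list[float], ref_index: int) -> float:
    """Generalized logit: 1 minus the estimated probability of the reference level."""
    return 1.0 - probs[ref_index]


def ordinal_score(probs: list[float]) -> float:
    """Ordinal response: sum of i * pi_i for i = 1..K (probs[0] is ordered value 1)."""
    return sum(i * p for i, p in enumerate(probs, start=1))


probs = [0.25, 0.5, 0.25]                 # one observation, K = 3
print(nominal_score(probs, ref_index=2))  # 0.75
print(ordinal_score(probs))               # 2.0
```

The ordinal score is the expected ordered value under the fitted model. It therefore always lies between 1 and $K$.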
The observations (and frequencies) are then combined into $G$ groups. By default $G = 10$, but you can specify a different value of $G$ in the nGroups subparameter of the lackfit parameter. For single-trial syntax, observations that have identical scored values are combined and are placed in the same group. Let $F$ be the total frequency. The target frequency for each group is $T = \lfloor F/G + 0.5 \rfloor$, which is the integer part of $F/G + 0.5$. Load the first group ($g_j$, $j = 1$) with the observation that has the smallest scored value and with frequency $N_1$, and let the next-smallest observation have a frequency of $f$. The logistic action performs the following steps for each observation to create the groups:

1. If $j = G$, then add this observation to group $g_j$.
2. Otherwise, if $N_j < T$ and $N_j + f/2 \le T$, then add this observation to group $g_j$.
3. Otherwise, start loading the next group ($g_{j+1}$) with $N_{j+1} = f$, and set $j = j + 1$.

If the final group $g_j$ has frequency $N_j < T/2$, then add these observations to the preceding group. The total number of groups that are actually created, $g$, can be less than $G$. There must be at least three groups in order for the Hosmer-Lemeshow statistic to be computed.
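The grouping pass can be sketched in Python as follows. This is a minimal sketch, not the logistic action's implementation, and it rests on several assumptions:

- The function name `hl_groups` is hypothetical.
- The input is a list of (scored value, frequency) pairs whose scores are already rounded.
- Identical scores are merged, as for single-trial syntax.
- The greedy rule is assumed to work as follows. An observation joins the current group when that group is already the $G$th group. It also joins when the group's running frequency is below $T$ and adding half of the new frequency would not push the group past $T$. Otherwise the observation starts a new group.
- A final group whose frequency is below $T/2$ is folded into the preceding group.

```python
import math
from itertools import groupby


def hl_groups(obs, G=10):
    """Split (scored value, frequency) pairs into at most G groups.

    Hypothetical helper; scores are assumed to be rounded already.
    """
    # Sort by scored value and merge observations whose scores are identical
    # (single-trial syntax), summing their frequencies.
    merged = [(s, sum(f for _, f in grp))
              for s, grp in groupby(sorted(obs), key=lambda t: t[0])]
    F = sum(f for _, f in merged)
    T = math.floor(F / G + 0.5)   # target frequency per group

    groups = [[merged[0]]]        # first group starts with the smallest score
    sizes = [merged[0][1]]        # running frequency N_j of each group
    for s, f in merged[1:]:
        j = len(groups) - 1       # index of the group being loaded
        if len(groups) == G or (sizes[j] < T and sizes[j] + f / 2 <= T):
            groups[j].append((s, f))   # keep loading the current group
            sizes[j] += f
        else:
            groups.append([(s, f)])    # start loading the next group
            sizes.append(f)

    # Fold a final group whose frequency is below T/2 into the previous one.
    if len(groups) > 1 and sizes[-1] < T / 2:
        last_group = groups.pop()
        last_size = sizes.pop()
        groups[-1].extend(last_group)
        sizes[-1] += last_size
    return groups


# 20 distinct scores with frequency 1 and G = 10 give ten groups of two.
print([len(g) for g in hl_groups([(i / 20, 1) for i in range(20)])])
```

The function returns the groups rather than only their count. A caller can then check that at least three groups were formed before computing the statistic.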
For binary response variables, the Hosmer-Lemeshow goodness-of-fit statistic is obtained by calculating the Pearson chi-square statistic from the table of observed and expected frequencies, where $g$ is the number of groups. The statistic is written

$$\chi^2_{HL} = \sum_{j=1}^{g} \frac{(O_j - N_j\bar{\pi}_j)^2}{N_j\bar{\pi}_j(1 - \bar{\pi}_j)}$$

where $N_j$ is the total frequency of subjects in the $j$th group, $O_j$ is the total frequency of event outcomes in the $j$th group, and $\bar{\pi}_j$ is the average estimated predicted probability of an event outcome for the $j$th group. The Hosmer-Lemeshow statistic is then compared to a chi-square distribution with $g - r$ degrees of freedom, where the value of $r$ can be specified in the dfReduce subparameter of the lackfit parameter. The default is $r = 2$.
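The binary statistic can be computed directly from already-formed groups, as in this sketch. The function name `hl_binary` and the input layout are hypothetical. The layout uses one (predicted event probability, frequency, number of events) tuple per observation. Taking $\bar{\pi}_j$ as the frequency-weighted mean of the predicted probabilities in group $j$ is also an assumption.

```python
def hl_binary(groups, r=2):
    """Hosmer-Lemeshow statistic and degrees of freedom for a binary response.

    groups: list of groups; each group is a list of (p, n, e) tuples where p is
    the predicted event probability, n is the frequency, and e is the number of
    event outcomes among those n.
    """
    if len(groups) < 3:
        raise ValueError("at least three groups are required")
    stat = 0.0
    for grp in groups:
        N = sum(n for _, n, _ in grp)                 # N_j
        O = sum(e for _, _, e in grp)                 # O_j
        pbar = sum(p * n for p, n, _ in grp) / N      # frequency-weighted mean of p
        stat += (O - N * pbar) ** 2 / (N * pbar * (1 - pbar))
    return stat, len(groups) - r                      # df = g - r


groups = [[(0.25, 4, 1)], [(0.5, 4, 3)], [(0.75, 4, 3)]]
print(hl_binary(groups))  # (1.0, 1)
```

Only the middle group contributes here. It has 3 observed events against an expected 2, which gives $(3 - 2)^2 / (4 \cdot 0.5 \cdot 0.5) = 1$.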
For polytomous response variables, the Pearson chi-square statistic is computed from a table of observed and expected frequencies,

$$\chi^2_{HL} = \sum_{j=1}^{g} \sum_{k=1}^{K} \frac{(O_{jk} - E_{jk})^2}{E_{jk}}$$

where $K$ is the number of response levels, $O_{jk}$ is the sum of the observed frequencies of response $k$ in group $j$, and $E_{jk}$ is the sum, over the observations in group $j$, of the model-predicted probabilities of response $k$. The Hosmer-Lemeshow statistic is then compared to a chi-square distribution. For cumulative and adjacent-category logit models with the equal-slopes assumption, Fagerland and Hosmer (2013, 2016) give the number of degrees of freedom for this test as $(g - r)(K - 1) + (K - 2)$. For the generalized logit model, Fagerland, Hosmer, and Bofin (2008) give it as $(g - r)(K - 1)$. The degrees of freedom can also be specified using the df subparameter of the lackfit parameter.
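A sketch of the polytomous statistic follows, covering both degrees-of-freedom formulas. The function name `hl_polytomous`, its `generalized_logit` flag, and the input layout are hypothetical. Each observation is a (list of $K$ predicted probabilities, frequency, observed level as a 0-based index) tuple. Weighting each predicted probability by the observation's frequency when forming $E_{jk}$ is an assumption.

```python
def hl_polytomous(groups, K, r=2, generalized_logit=False):
    """Pearson chi-square over the g x K table of observed and expected counts.

    groups: list of groups; each group is a list of (probs, n, y) tuples where
    probs holds the K predicted probabilities, n is the frequency, and y is the
    observed response level as a 0-based index.
    """
    if len(groups) < 3:
        raise ValueError("at least three groups are required")
    stat = 0.0
    for grp in groups:
        for k in range(K):
            O = sum(n for _, n, y in grp if y == k)          # O_jk
            E = sum(n * probs[k] for probs, n, _ in grp)     # E_jk
            stat += (O - E) ** 2 / E
    g = len(groups)
    if generalized_logit:
        df = (g - r) * (K - 1)               # Fagerland, Hosmer, and Bofin (2008)
    else:
        df = (g - r) * (K - 1) + (K - 2)     # equal-slopes cumulative/adjacent-category
    return stat, df


grp = [([0.25, 0.5, 0.25], 2, 0), ([0.25, 0.5, 0.25], 2, 1)]
print(hl_polytomous([grp, grp, grp], K=3))                          # (6.0, 3)
print(hl_polytomous([grp, grp, grp], K=3, generalized_logit=True))  # (6.0, 2)
```

In each group the observed counts $(2, 2, 0)$ are compared with the expected counts $(1, 2, 1)$. Each group therefore contributes $1 + 0 + 1 = 2$.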
Large values of $\chi^2_{HL}$ (and small p-values) indicate a lack of fit of the model.
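The p-value is the upper-tail probability of $\chi^2_{HL}$ under a chi-square distribution with the degrees of freedom above. In practice `scipy.stats.chi2.sf` is the usual choice. The sketch below instead relies only on the standard library, assuming integer degrees of freedom, which all of the formulas above produce. The helper name `chi2_sf` is hypothetical. It evaluates the regularized upper incomplete gamma function $Q(\nu/2, x/2)$ with closed forms: a finite sum for even $\nu$, and for odd $\nu$ a start at $Q(1/2, y) = \operatorname{erfc}(\sqrt{y})$ followed by the recurrence $Q(s + 1, y) = Q(s, y) + y^s e^{-y} / \Gamma(s + 1)$.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for chi-square X with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and step s up by 1.
    q = math.erfc(math.sqrt(y))
    s = 0.5
    for _ in range((df - 1) // 2):
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1.0
    return q


print(round(chi2_sf(5.991464547107979, 2), 6))  # 0.05
```

A small p-value, for example below 0.05, indicates that the observed and expected frequencies differ by more than chance would explain under the fitted model.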