Regression Action Set

The Hosmer-Lemeshow Goodness-of-Fit Test

Hosmer and Lemeshow (2000) proposed a statistic that they show, through simulation, is distributed as chi-square when there is no replication in any of the subpopulations, in particular when the model contains one or more continuous predictors. Fagerland, Hosmer, and Bofin (2008) and Fagerland and Hosmer (2013, 2016) extended this test to polytomous response models.

The observations are sorted in increasing order of a scored value. For binary response variables, the scored value of an observation is its estimated event probability. The event is the response level that is specified in the event subparameter, the response level that is not specified in the ref subparameter, or, if neither of these options is specified, the response level identified in the "Response Profiles" table as "Ordered Value 1." For nominal response variables (when the value of the link parameter is glogit), the scored value of an observation is 1 minus the estimated probability of the reference level (specified using the ref subparameter). For ordinal response variables, the scored value of an observation is $\sum_{i=1}^{K} i\,\hat{\pi}_i$, where $K$ is the number of response levels and $\hat{\pi}_i$ is the predicted probability of the $i$th ordered response. This scored value is then rounded according to the value of the binEps parameter.
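The three scoring rules above can be sketched in a few lines of Python. This is only an illustration: the function name `scored_value` and its arguments are hypothetical and are not part of the logistic action, the probabilities are assumed to be listed in ordered-value order, and the binEps rounding step is left out because its exact rule is not described here.

```python
def scored_value(probs, response_type, event_index=0, ref_index=None):
    """Return the sorting score for one observation (illustrative sketch).

    probs         -- predicted probabilities, one per response level
    response_type -- "binary", "nominal", or "ordinal"
    event_index   -- position of the event level (binary case)
    ref_index     -- position of the reference level (nominal case)
    """
    if response_type == "binary":
        # Estimated probability of the event level.
        return probs[event_index]
    if response_type == "nominal":
        if ref_index is None:
            raise ValueError("ref_index is required for a nominal response")
        # 1 minus the estimated probability of the reference level.
        return 1.0 - probs[ref_index]
    if response_type == "ordinal":
        # Sum of i * pi_hat_i over the K ordered levels, i = 1..K.
        return sum(i * p for i, p in enumerate(probs, start=1))
    raise ValueError(f"unknown response type: {response_type!r}")
```

For example, an ordinal response with predicted probabilities 0.2, 0.5, and 0.3 has a scored value of about 2.1 (1 × 0.2 + 2 × 0.5 + 3 × 0.3).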

The observations (and frequencies) are then combined into $G$ groups. By default $G = 10$, but you can specify $G \ge 5$ in the nGroups subparameter of the lackfit parameter. For single-trial syntax, observations that have identical scored values are combined and are placed in the same group. Let $F$ be the total frequency. The target frequency for each group is $T = \lfloor F/G + 0.5 \rfloor$, which is the integer part of $F/G + 0.5$. Load the first group ($g_j$, $j = 1$) with the observation that has the smallest scored value and with frequency $f_1$, and let the next-smallest observation have a frequency of $f$. The logistic action performs the following steps for each observation to create the groups:

  1. If $j = G$, then add this observation to group $g_j$.

  2. Otherwise, if $f_j < T$ and $f_j + \lfloor f/2 \rfloor \le T$, then add this observation to group $g_j$.

  3. Otherwise, start loading the next group ($g_{j+1}$) with $f_{j+1} = f$, and set $j = j + 1$.

If the final group $g_j$ has frequency $f_j < \frac{F}{2G}$, then add these observations to the preceding group. The total number of groups that are actually created, $g$, can be less than $G$. There must be at least three groups in order for the Hosmer-Lemeshow statistic to be computed.
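The grouping steps can be read as a single pass over the sorted observations. The following Python sketch shows one way to express that pass. It assumes the input is a list of integer frequencies that are already sorted by scored value with ties combined. The name `hl_groups` is hypothetical, and the code is a reading of the steps as described here, not the action's actual implementation.

```python
import math


def hl_groups(freqs, n_groups=10):
    """Partition sorted observations into groups (illustrative sketch).

    freqs    -- integer frequencies, ordered by increasing scored value
    n_groups -- requested number of groups G
    Returns a list of groups, each a list of observation indices.
    """
    if not freqs:
        return []
    total = sum(freqs)                           # F
    target = math.floor(total / n_groups + 0.5)  # T
    groups = [[0]]             # first group starts with the smallest score
    group_freq = [freqs[0]]    # f_1
    for idx in range(1, len(freqs)):
        f = freqs[idx]
        f_j = group_freq[-1]
        in_last_group = len(groups) == n_groups             # step 1: j = G
        fits = f_j < target and f_j + f // 2 <= target      # step 2
        if in_last_group or fits:
            groups[-1].append(idx)
            group_freq[-1] += f
        else:                                               # step 3
            groups.append([idx])
            group_freq.append(f)
    # Fold an undersized final group into the preceding one.
    if len(groups) > 1 and group_freq[-1] < total / (2 * n_groups):
        tail = groups.pop()
        group_freq.pop()
        groups[-1].extend(tail)
    return groups
```

With frequencies `[5, 5, 5, 5, 1]` and `n_groups=5`, the target is $T = \lfloor 21/5 + 0.5 \rfloor = 4$. Each of the first four observations fills a group on its own, and the final observation (frequency 1, which is below $21/10$) is folded into the fourth group, so only four groups result.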

For binary response variables, the Hosmer-Lemeshow goodness-of-fit statistic is obtained by calculating the Pearson chi-square statistic from the $2 \times g$ table of observed and expected frequencies, where $g$ is the number of groups. The statistic is written

$$\chi^2_{\mathrm{HL}} = \sum_{j=1}^{g} \frac{\left(O_j - F_j \bar{\pi}_j\right)^2}{F_j \bar{\pi}_j \left(1 - \bar{\pi}_j\right)}$$

where $F_j$ is the total frequency of subjects in the $j$th group, $O_j$ is the total frequency of event outcomes in the $j$th group, and $\bar{\pi}_j$ is the average estimated predicted probability of an event outcome for the $j$th group. The Hosmer-Lemeshow statistic is then compared to a chi-square distribution with $(g - r)$ degrees of freedom, where the value of $r$ can be specified in the dfReduce subparameter of the lackfit parameter. The default is $r = 2$.
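As a rough illustration of the binary formula, the sketch below computes the statistic and its degrees of freedom from per-group summaries. The function name `hl_binary` and the tuple layout `(F_j, O_j, pi_bar_j)` are assumptions made for this example. The p-value is not computed here; it would come from the upper tail of a chi-square distribution with the returned degrees of freedom.

```python
def hl_binary(groups, r=2):
    """Binary Hosmer-Lemeshow statistic and degrees of freedom (sketch).

    groups -- list of (F_j, O_j, pi_bar_j) tuples, one per group:
              total frequency, event frequency, mean predicted probability
    r      -- degrees-of-freedom reduction (default 2, as in the text)
    """
    g = len(groups)
    if g < 3:
        # The text requires at least three groups.
        raise ValueError("at least three groups are required")
    stat = 0.0
    for total, events, p_bar in groups:
        expected = total * p_bar                 # F_j * pi_bar_j
        variance = expected * (1.0 - p_bar)      # F_j * pi_bar_j * (1 - pi_bar_j)
        stat += (events - expected) ** 2 / variance
    return stat, g - r
```

For three groups of 10 subjects with mean predicted probabilities 0.2, 0.5, and 0.8 and observed event counts 3, 5, and 8, only the first group contributes, giving a statistic of about 0.625 on 1 degree of freedom.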

For polytomous response variables, the Pearson chi-square statistic is computed from a $2K \times g$ table of observed and expected frequencies,

$$\chi^2_{\mathrm{HL}} = \sum_{j=1}^{g} \sum_{k=1}^{K} \frac{\left(O_{jk} - E_{jk}\right)^2}{E_{jk}}$$

where $O_{jk}$ is the sum of the observed frequencies and $E_{jk}$ is the sum of the model-predicted probabilities of the observations in group $j$ with response $k$. The Hosmer-Lemeshow statistic is then compared to a chi-square distribution. The number of degrees of freedom for this test of cumulative and adjacent-category logit models with the equal-slopes assumption is given by Fagerland and Hosmer (2013) and Fagerland and Hosmer (2016) as $(g - r)(K - 1) + (K - 2)$. The number of degrees of freedom for this test of the generalized logit model is given by Fagerland, Hosmer, and Bofin (2008) as $(g - r)(K - 1)$, where $K$ is the number of response levels. The degrees of freedom can also be specified using the df subparameter of the lackfit parameter.
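The polytomous statistic and the two degrees-of-freedom formulas can be sketched as follows. The function names and the `equal_slopes` flag are hypothetical. The default of `r=2` is an assumption carried over from the binary case.

```python
def hl_polytomous(observed, expected):
    """Pearson chi-square over a g-by-K table (illustrative sketch).

    observed[j][k] -- observed frequency in group j with response k
    expected[j][k] -- sum of predicted probabilities for that cell
    """
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def hl_polytomous_df(g, k, r=2, equal_slopes=True):
    """Degrees of freedom for the polytomous test (illustrative sketch).

    equal_slopes=True  -- cumulative or adjacent-category logit model:
                          (g - r)(K - 1) + (K - 2)
    equal_slopes=False -- generalized logit model: (g - r)(K - 1)
    """
    base = (g - r) * (k - 1)
    return base + (k - 2) if equal_slopes else base
```

With $g = 10$ groups and $K = 3$ response levels, this gives 17 degrees of freedom for an equal-slopes model and 16 for a generalized logit model.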

Large values of $\chi^2_{\mathrm{HL}}$ (and small p-values) indicate a lack of fit of the model.

Last updated: March 05, 2026