SEVERITY Procedure

Statistics of Fit

PROC SEVERITY computes and reports various statistics of fit to indicate how well the estimated model fits the data. The statistics belong to two categories: likelihood-based statistics and EDF-based statistics. Neg2LogLike, AIC, AICC, and BIC are likelihood-based statistics, and KS, AD, and CvM are EDF-based statistics. The following subsections provide definitions of each.

Likelihood-Based Statistics of Fit

Let $y_i$, $i = 1, \ldots, N$, denote the response variable values. Let $L$ be the likelihood as defined in the section Likelihood Function. Let $p$ denote the number of model parameters that are estimated. Note that $p = p_d + (k - k_r)$, where $p_d$ is the number of distribution parameters, $k$ is the number of all regression parameters, and $k_r$ is the number of regression parameters that are found to be linearly dependent (redundant) on other regression parameters. Given this notation, the likelihood-based statistics are defined as follows:

Neg2LogLike

The log likelihood is reported after it is multiplied by $-2$:

$$ \text{Neg2LogLike} = -2 \log(L) $$

The multiplying factor $-2$ makes it easy to compare Neg2LogLike to the other likelihood-based statistics. A model that has a smaller value of Neg2LogLike is deemed better.

AIC

Akaike’s information criterion (AIC) is defined as

$$ \text{AIC} = -2 \log(L) + 2p $$

A model that has a smaller AIC value is deemed better.

AICC

The corrected Akaike’s information criterion (AICC) is defined as

$$ \text{AICC} = -2 \log(L) + \frac{2Np}{N - p - 1} $$

A model that has a smaller AICC value is deemed better. It corrects the finite-sample bias that AIC has when N is small compared to p. AICC is related to AIC as

$$ \text{AICC} = \text{AIC} + \frac{2p(p + 1)}{N - p - 1} $$

As N becomes large compared to p, AICC converges to AIC. AICC is usually recommended over AIC as a model selection criterion.

BIC

The Schwarz Bayesian information criterion (BIC) is defined as

$$ \text{BIC} = -2 \log(L) + p \log(N) $$

A model that has a smaller BIC value is deemed better.
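As a rough illustration, the four likelihood-based statistics can be computed from a log-likelihood value, $N$, and $p$ as in the following Python sketch. The function name `likelihood_fit_stats` and the input values are hypothetical; they are not part of PROC SEVERITY.

```python
import math


def likelihood_fit_stats(log_lik, n_obs, n_params):
    """Return Neg2LogLike, AIC, AICC, and BIC from log(L), N, and p."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICC requires N > p + 1")
    neg2ll = -2.0 * log_lik
    aic = neg2ll + 2.0 * n_params
    aicc = neg2ll + 2.0 * n_obs * n_params / (n_obs - n_params - 1)
    bic = neg2ll + n_params * math.log(n_obs)
    return {"Neg2LogLike": neg2ll, "AIC": aic, "AICC": aicc, "BIC": bic}


# Made-up inputs: log(L) = -250, N = 100 observations, p = 3 parameters.
stats = likelihood_fit_stats(log_lik=-250.0, n_obs=100, n_params=3)

# The gap between AICC and AIC equals 2p(p + 1) / (N - p - 1).
correction = stats["AICC"] - stats["AIC"]
```

With these inputs the correction term is small because $N$ is large compared to $p$, which matches the convergence of AICC to AIC described above.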

EDF-Based Statistics

This class of statistics is based on the difference between the estimate of the cumulative distribution function (CDF) and the estimate of the empirical distribution function (EDF). A model that has a smaller value of the chosen EDF-based statistic is deemed better.

Let $y_i$, $i = 1, \ldots, N$, denote the sample of $N$ values of the response variable. Let $w_i$ denote the normalized weight of the $i$th observation. If $w_i^o$ denotes the original, unnormalized weight of the $i$th observation, then $w_i = N w_i^o / \left( \sum_{i=1}^{N} w_i^o \right)$. Let $N_u$ denote the number of observations with unique (nonduplicate) values of the response variable. Let $W_i = \sum_{j=1}^{N} w_j I[y_j = y_i]$ denote the total weight of observations with a value $y_i$, where $I$ is an indicator function. Let $r_i = \sum_{j=1}^{N} w_j I[y_j \le y_i]$ denote the total weight of observations with a value less than or equal to $y_i$. Let $W = \sum_{i=1}^{N_u} W_i$ denote the total weight of all observations. Use of normalized weights implies that $W = N$.
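The weight quantities defined above can be sketched in Python as follows. The helper name `weight_summaries` and the sample data are hypothetical; the code only mirrors the definitions of $w_i$, $W_i$, and $r_i$ and makes no claim about how PROC SEVERITY computes them internally.

```python
def weight_summaries(y, raw_weights):
    """Return unique response values with their W_i and r_i.

    y           : response values y_1..y_N
    raw_weights : original, unnormalized weights w_i^o
    """
    n = len(y)
    total_raw = sum(raw_weights)
    # Normalized weights: w_i = N * w_i^o / sum of w^o, so they sum to N.
    w = [n * wo / total_raw for wo in raw_weights]

    unique_y = sorted(set(y))
    # W_i: total normalized weight of observations equal to each unique value.
    big_w = [sum(wj for yj, wj in zip(y, w) if yj == u) for u in unique_y]

    # r_i: total normalized weight of observations <= each unique value.
    r = []
    running = 0.0
    for wi in big_w:
        running += wi
        r.append(running)
    return unique_y, big_w, r


# Made-up sample with one duplicated response value.
values, W_i, r_i = weight_summaries([2.0, 1.0, 2.0, 5.0], [1.0, 1.0, 2.0, 4.0])
# values == [1.0, 2.0, 5.0], W_i == [0.5, 1.5, 2.0], r_i == [0.5, 2.0, 4.0]
```

The last element of `r_i` equals $W$, which is $N = 4$ here because the weights are normalized.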

Let $F_n(y_i)$ denote the EDF estimate that is computed by using the method that you specify in the EMPIRICALCDF= option. Let $Z_i = \hat{F}(y_i)$ denote the estimate of the CDF. Let $F_n(Z_i)$ denote the EDF estimate of $Z_i$ values that are computed using the same method that is used to compute the EDF of $y_i$ values. Using the probability integral transformation, if $F(y)$ is the true distribution of the random variable $Y$, then the random variable $Z = F(y)$ is uniformly distributed between 0 and 1 (D’Agostino and Stephens 1986, Ch. 4). Thus, comparing $F_n(y_i)$ with $\hat{F}(y_i)$ is equivalent to comparing $F_n(Z_i)$ with $\hat{F}(Z_i) = Z_i$ (uniform distribution).

Note the following two points regarding which CDF estimates are used for computing the test statistics:

  • If you specify regression effects, then the CDF estimates $Z_i$ that are used for computing the EDF test statistics are from a mixture distribution. For more information, see the section CDF and PDF Estimates with Regression Effects.

  • If the EDF estimates are conditional because of the truncation information, then each unconditional estimate $Z_i$ is converted to a conditional estimate using the method described in the section Truncation and Conditional CDF Estimates.

In the following, it is assumed that $Z_i$ denotes an appropriate estimate of the CDF if you specify any truncation or regression effects. Given this, the EDF-based statistics of fit are defined as follows:

KS

The Kolmogorov-Smirnov (KS) statistic computes the largest vertical distance between the CDF and the EDF. It is formally defined as follows:

$$ \text{KS} = \sup_{y} \left| F_n(y) - F(y) \right| $$

If the STANDARD method is used to compute the EDF, then the following formula is used:

$$
\begin{aligned}
D^{+} &= \max_i \left( \frac{r_i}{W} - Z_i \right) \\
D^{-} &= \max_i \left( Z_i - \frac{r_{i-1}}{W} \right) \\
\text{KS} &= \sqrt{W} \, \max(D^{+}, D^{-}) + \frac{0.19}{\sqrt{W}}
\end{aligned}
$$

Note that $r_0$ is assumed to be 0.
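The STANDARD-method formula can be sketched in Python as follows. The function name `ks_standard` and the sample values are hypothetical. The sketch assumes that the CDF estimates $Z_i$ are already sorted in ascending order, one per unique response value, and that the matching cumulative weights $r_i$ are supplied.

```python
import math


def ks_standard(z, r, total_weight):
    """KS statistic for the STANDARD EDF method.

    z            : CDF estimates Z_i at the unique response values, ascending
    r            : cumulative weights r_i at the same values
    total_weight : W, the total weight of all observations
    """
    r_prev = [0.0] + list(r[:-1])  # r_{i-1}, with r_0 assumed to be 0
    d_plus = max(ri / total_weight - zi for zi, ri in zip(z, r))
    d_minus = max(zi - rp / total_weight for zi, rp in zip(z, r_prev))
    root_w = math.sqrt(total_weight)
    return root_w * max(d_plus, d_minus) + 0.19 / root_w


# Made-up example: four distinct, unit-weight observations (W = N = 4).
ks = ks_standard(z=[0.1, 0.4, 0.5, 0.9], r=[1.0, 2.0, 3.0, 4.0], total_weight=4.0)
```

In this example $D^{+} = 0.25$ and $D^{-} = 0.15$, so the result is $2 \times 0.25 + 0.19 / 2 = 0.595$.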

If the method used to compute the EDF is any method other than the STANDARD method, then the following formula is used:

$$
\begin{aligned}
D^{+} &= \max_i \left( F_n(Z_i) - Z_i \right), \quad \text{if } F_n(Z_i) \ge Z_i \\
D^{-} &= \max_i \left( Z_i - F_n(Z_i) \right), \quad \text{if } F_n(Z_i) < Z_i \\
\text{KS} &= \sqrt{W} \, \max(D^{+}, D^{-}) + \frac{0.19}{\sqrt{W}}
\end{aligned}
$$

AD

The Anderson-Darling (AD) statistic is a quadratic EDF statistic that is proportional to the expected value of the weighted squared difference between the EDF and CDF. It is formally defined as follows:

$$ \text{AD} = N \int_{-\infty}^{\infty} \frac{\left( F_n(y) - F(y) \right)^2}{F(y) \left( 1 - F(y) \right)} \, dF(y) $$

If the STANDARD method is used to compute the EDF, then PROC SEVERITY uses the following formula:

$$ \text{AD} = -W - \frac{1}{W} \sum_{i=1}^{N_u} W_i \left[ (2 r_i - 1) \log(Z_i) + (2W + 1 - 2 r_i) \log(1 - Z_i) \right] $$
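A minimal Python sketch of the STANDARD-method AD formula follows. The function name `ad_standard` and the sample values are hypothetical. Inputs are assumed to be sorted by response value, with every $Z_i$ strictly between 0 and 1 so that both logarithms are defined.

```python
import math


def ad_standard(z, unique_weights, r, total_weight):
    """AD statistic for the STANDARD EDF method.

    z              : CDF estimates Z_i at the unique response values, ascending
    unique_weights : W_i, total weight at each unique value
    r              : cumulative weights r_i
    total_weight   : W
    """
    acc = 0.0
    for zi, wi, ri in zip(z, unique_weights, r):
        acc += wi * (
            (2.0 * ri - 1.0) * math.log(zi)
            + (2.0 * total_weight + 1.0 - 2.0 * ri) * math.log(1.0 - zi)
        )
    return -total_weight - acc / total_weight


# Made-up example: four distinct, unit-weight observations (W = N = 4).
ad = ad_standard(
    z=[0.1, 0.4, 0.5, 0.9],
    unique_weights=[1.0, 1.0, 1.0, 1.0],
    r=[1.0, 2.0, 3.0, 4.0],
    total_weight=4.0,
)
```

With unit weights and no ties, $r_i = i$ and $W = N$, and the expression reduces to the familiar unweighted Anderson-Darling computing formula.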

If the method used to compute the EDF is any method other than the STANDARD method, then the statistic can be computed by using the following two pieces of information:

  • If the EDF estimates are computed using the KAPLANMEIER or MODIFIEDKM methods, then EDF is a step function such that the estimate $F_n(z)$ is a constant equal to $F_n(Z_{i-1})$ in interval $[Z_{i-1}, Z_i]$. If the EDF estimates are computed using the TURNBULL method, then there are two types of intervals: one in which the EDF curve is constant and the other in which the EDF curve is theoretically undefined. For computational purposes, it is assumed that the EDF curve is linear for the latter type of the interval. For each method, the EDF estimate $F_n(y)$ at $y$ can be written as

    $$ F_n(z) = F_n(Z_{i-1}) + S_i (z - Z_{i-1}), \quad \text{for } z \in [Z_{i-1}, Z_i] $$

    where $S_i$ is the slope of the line defined as

    $$ S_i = \frac{F_n(Z_i) - F_n(Z_{i-1})}{Z_i - Z_{i-1}} $$

    For the KAPLANMEIER or MODIFIEDKM method, $S_i = 0$ in each interval.

  • Using the probability integral transform $z = F(y)$, the formula simplifies to

    $$ \text{AD} = N \int_{-\infty}^{\infty} \frac{\left( F_n(z) - z \right)^2}{z (1 - z)} \, dz $$

The computation formula can then be derived from the approximation,

$$
\begin{aligned}
\text{AD} &= N \sum_{i=1}^{K+1} \int_{Z_{i-1}}^{Z_i} \frac{\left( F_n(z) - z \right)^2}{z (1 - z)} \, dz \\
&= N \sum_{i=1}^{K+1} \int_{Z_{i-1}}^{Z_i} \frac{\left( F_n(Z_{i-1}) + S_i (z - Z_{i-1}) - z \right)^2}{z (1 - z)} \, dz \\
&= N \sum_{i=1}^{K+1} \int_{Z_{i-1}}^{Z_i} \frac{\left( P_i - Q_i z \right)^2}{z (1 - z)} \, dz
\end{aligned}
$$

where $P_i = F_n(Z_{i-1}) - S_i Z_{i-1}$, $Q_i = 1 - S_i$, and $K$ is the number of points at which the EDF estimates are computed. For the TURNBULL method, $K = 2k$ for some $k$.

Assuming $Z_0 = 0$, $Z_{K+1} = 1$, $F_n(0) = 0$, and $F_n(Z_K) = 1$ yields the computation formula,

$$
\begin{aligned}
\text{AD} = {} & -N \left( Z_1 + \log(1 - Z_1) + \log(Z_K) + (1 - Z_K) \right) \\
& + N \sum_{i=2}^{K} \left[ P_i^2 A_i - (Q_i - P_i)^2 B_i - Q_i^2 C_i \right]
\end{aligned}
$$

where $A_i = \log(Z_i) - \log(Z_{i-1})$, $B_i = \log(1 - Z_i) - \log(1 - Z_{i-1})$, and $C_i = Z_i - Z_{i-1}$.
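The computation formula with the $A_i$, $B_i$, and $C_i$ terms can be sketched in Python as follows. The function name `ad_piecewise`, its `slopes` argument, and the sample values are hypothetical. The sketch assumes that the points $Z_1, \ldots, Z_K$ are sorted and lie strictly between 0 and 1, and that the caller supplies the slope $S_i$ of each interval (all zeros for a step-function EDF).

```python
import math


def ad_piecewise(z, fn, n_obs, slopes=None):
    """AD statistic for a piecewise-linear EDF on the transformed scale.

    z      : [Z_1, ..., Z_K], ascending, each strictly inside (0, 1)
    fn     : [F_n(Z_1), ..., F_n(Z_K)], with F_n(Z_K) = 1
    n_obs  : N
    slopes : slopes[i] is S for the interval [z[i-1], z[i]] (slopes[0] is
             unused); None means a step-function EDF with S = 0 everywhere
    """
    k = len(z)
    # Boundary pieces [0, Z_1] and [Z_K, 1], where F_n is 0 and 1.
    total = -(z[0] + math.log(1.0 - z[0]) + math.log(z[-1]) + (1.0 - z[-1]))
    for i in range(1, k):
        s = 0.0 if slopes is None else slopes[i]
        p = fn[i - 1] - s * z[i - 1]
        q = 1.0 - s
        a = math.log(z[i]) - math.log(z[i - 1])
        b = math.log(1.0 - z[i]) - math.log(1.0 - z[i - 1])
        c = z[i] - z[i - 1]
        total += p * p * a - (q - p) ** 2 * b - q * q * c
    return n_obs * total


# Made-up step-function EDF evaluated at three points, with N = 10.
ad_step = ad_piecewise(z=[0.2, 0.5, 0.8], fn=[0.3, 0.6, 1.0], n_obs=10)
```

Each summand is the closed-form integral of $(P_i - Q_i z)^2 / (z(1 - z))$ over one interval, so the result can be checked against direct numerical integration of the same integrand.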

If EDF estimates are computed using the KAPLANMEIER or MODIFIEDKM method, then $P_i = F_n(Z_{i-1})$ and $Q_i = 1$, which simplifies the formula as

$$
\begin{aligned}
\text{AD} = {} & -N \left( 1 + \log(1 - Z_1) + \log(Z_K) \right) \\
& + N \sum_{i=2}^{K} \left[ F_n(Z_{i-1})^2 A_i - \left( 1 - F_n(Z_{i-1}) \right)^2 B_i \right]
\end{aligned}
$$

CvM

The Cramér–von Mises (CvM) statistic is a quadratic EDF statistic that is proportional to the expected value of the squared difference between the EDF and CDF. It is formally defined as follows:

$$ \text{CvM} = N \int_{-\infty}^{\infty} \left( F_n(y) - F(y) \right)^2 \, dF(y) $$

If the STANDARD method is used to compute the EDF, then the following formula is used:

$$ \text{CvM} = \frac{1}{12 W} + \sum_{i=1}^{N_u} W_i \left( Z_i - \frac{2 r_i - 1}{2 W} \right)^2 $$
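A minimal Python sketch of the STANDARD-method CvM formula follows. The function name `cvm_standard` and the sample values are hypothetical; inputs are assumed to be sorted by response value.

```python
def cvm_standard(z, unique_weights, r, total_weight):
    """CvM statistic for the STANDARD EDF method.

    z              : CDF estimates Z_i at the unique response values, ascending
    unique_weights : W_i, total weight at each unique value
    r              : cumulative weights r_i
    total_weight   : W
    """
    acc = 0.0
    for zi, wi, ri in zip(z, unique_weights, r):
        acc += wi * (zi - (2.0 * ri - 1.0) / (2.0 * total_weight)) ** 2
    return 1.0 / (12.0 * total_weight) + acc


# Made-up example: four distinct, unit-weight observations (W = N = 4).
cvm = cvm_standard(
    z=[0.1, 0.4, 0.5, 0.9],
    unique_weights=[1.0, 1.0, 1.0, 1.0],
    r=[1.0, 2.0, 3.0, 4.0],
    total_weight=4.0,
)
```

In this example the squared deviations sum to 0.0175, so the statistic is $1/48 + 0.0175$.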

If the method used to compute the EDF is any method other than the STANDARD method, then the statistic can be computed by using the following two pieces of information:

  • As described previously for the AD statistic, the EDF estimates are assumed to be piecewise linear such that the estimate $F_n(y)$ at $y$ is

    $$ F_n(z) = F_n(Z_{i-1}) + S_i (z - Z_{i-1}), \quad \text{for } z \in [Z_{i-1}, Z_i] $$

    where $S_i$ is the slope of the line defined as

    $$ S_i = \frac{F_n(Z_i) - F_n(Z_{i-1})}{Z_i - Z_{i-1}} $$

    For the KAPLANMEIER or MODIFIEDKM method, $S_i = 0$ in each interval.

  • Using the probability integral transform $z = F(y)$, the formula simplifies to

    $$ \text{CvM} = N \int_{-\infty}^{\infty} \left( F_n(z) - z \right)^2 \, dz $$

The computation formula can then be derived from the following approximation,

$$
\begin{aligned}
\text{CvM} &= N \sum_{i=1}^{K+1} \int_{Z_{i-1}}^{Z_i} \left( F_n(z) - z \right)^2 \, dz \\
&= N \sum_{i=1}^{K+1} \int_{Z_{i-1}}^{Z_i} \left( F_n(Z_{i-1}) + S_i (z - Z_{i-1}) - z \right)^2 \, dz \\
&= N \sum_{i=1}^{K+1} \int_{Z_{i-1}}^{Z_i} \left( P_i - Q_i z \right)^2 \, dz
\end{aligned}
$$

where $P_i = F_n(Z_{i-1}) - S_i Z_{i-1}$, $Q_i = 1 - S_i$, and $K$ is the number of points at which the EDF estimates are computed. For the TURNBULL method, $K = 2k$ for some $k$.

Assuming $Z_0 = 0$, $Z_{K+1} = 1$, and $F_n(0) = 0$ yields the following computation formula,

$$ \text{CvM} = N \frac{Z_1^3}{3} + N \sum_{i=2}^{K+1} \left[ P_i^2 A_i - P_i Q_i B_i + \frac{Q_i^2}{3} C_i \right] $$

where $A_i = Z_i - Z_{i-1}$, $B_i = Z_i^2 - Z_{i-1}^2$, and $C_i = Z_i^3 - Z_{i-1}^3$.

If EDF estimates are computed using the KAPLANMEIER or MODIFIEDKM method, then $P_i = F_n(Z_{i-1})$ and $Q_i = 1$, which simplifies the formula as

$$ \text{CvM} = \frac{N}{3} + N \sum_{i=2}^{K+1} \left[ F_n(Z_{i-1})^2 \left( Z_i - Z_{i-1} \right) - F_n(Z_{i-1}) \left( Z_i^2 - Z_{i-1}^2 \right) \right] $$

which is similar to the formula proposed by Koziol and Green (1976).
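The simplified step-function formula for the KAPLANMEIER and MODIFIEDKM methods can be sketched in Python as follows. The function name `cvm_step` and the sample values are hypothetical. The sketch assumes that the points $Z_1, \ldots, Z_K$ are sorted and appends $Z_{K+1} = 1$ itself.

```python
def cvm_step(z, fn, n_obs):
    """CvM statistic for a step-function EDF on the transformed scale.

    z     : [Z_1, ..., Z_K], ascending, inside (0, 1)
    fn    : [F_n(Z_1), ..., F_n(Z_K)]
    n_obs : N
    """
    z_ext = list(z) + [1.0]  # append Z_{K+1} = 1
    total = 1.0 / 3.0
    for i in range(1, len(z_ext)):
        f_prev = fn[i - 1]  # F_n(Z_{i-1}), constant on [Z_{i-1}, Z_i]
        total += f_prev ** 2 * (z_ext[i] - z_ext[i - 1])
        total -= f_prev * (z_ext[i] ** 2 - z_ext[i - 1] ** 2)
    return n_obs * total


# Made-up step-function EDF evaluated at three points, with N = 10.
cvm_km = cvm_step(z=[0.2, 0.5, 0.8], fn=[0.3, 0.6, 1.0], n_obs=10)
```

For this example, integrating $(F_n(z) - z)^2$ piece by piece over $[0, 1]$ gives $0.034 / 3$, so the statistic is approximately 0.1133 after scaling by $N = 10$.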

Last updated: June 19, 2025