HPSEVERITY Procedure

Parameter Estimation Method

Unless you define a custom objective function by using programming statements together with the OBJECTIVE= option in the PROC HPSEVERITY statement, PROC HPSEVERITY estimates the parameters of each model by the maximum likelihood (ML) method. It uses a nonlinear optimization process to maximize the log of the likelihood function. If you do specify a custom objective function, then PROC HPSEVERITY uses a nonlinear optimization algorithm to find the parameter values of each model that minimize your objective function. For more information, see the section Custom Objective Functions.

Likelihood Function

Let $f_\Theta(x)$ and $F_\Theta(x)$ denote the PDF and CDF, respectively, evaluated at $x$ for a set of parameter values $\Theta$. Let $Y$ denote the random response variable, and let $y$ denote its value recorded in an observation in the input data set. Let $T^l$ and $T^r$ denote the random variables for the left-truncation and right-truncation thresholds, respectively, and let $t^l$ and $t^r$ denote their values for an observation. If there is no left-truncation, then $t^l = \tau^l$, where $\tau^l$ is the smallest value in the support of the distribution, so $F(t^l) = 0$. If there is no right-truncation, then $t^r = \tau_h$, where $\tau_h$ is the largest value in the support of the distribution, so $F(t^r) = 1$. Let $C^l$ and $C^r$ denote the random variables for the left-censoring and right-censoring limits, respectively, and let $c^l$ and $c^r$ denote their values for an observation. If there is no left-censoring, then $c^l = \tau_h$, so $F(c^l) = 1$. If there is no right-censoring, then $c^r = \tau^l$, so $F(c^r) = 0$.

The set of input observations can be categorized into the following four subsets within each BY group:

  • $E$ is the set of uncensored and untruncated observations. The likelihood of an observation in $E$ is

    $$l_E = \Pr(Y = y) = f_\Theta(y)$$
  • $E_t$ is the set of uncensored observations that are truncated. The likelihood of an observation in $E_t$ is

    $$l_{E_t} = \Pr(Y = y \mid t^l < Y \le t^r) = \frac{f_\Theta(y)}{F_\Theta(t^r) - F_\Theta(t^l)}$$
  • $C$ is the set of censored observations that are not truncated. The likelihood of an observation in $C$ is

    $$l_C = \Pr(c^r < Y \le c^l) = F_\Theta(c^l) - F_\Theta(c^r)$$
  • $C_t$ is the set of censored observations that are truncated. The likelihood of an observation in $C_t$ is

    $$l_{C_t} = \Pr(c^r < Y \le c^l \mid t^l < Y \le t^r) = \frac{F_\Theta(c^l) - F_\Theta(c^r)}{F_\Theta(t^r) - F_\Theta(t^l)}$$
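The four cases can be made concrete with a small Python sketch that computes the likelihood of a single observation. This is not PROC HPSEVERITY code. It assumes a hypothetical exponential distribution with scale parameter theta, whose support is $[0, \infty)$, so $\tau^l = 0$ and $\tau_h = \infty$. The names Observation, pdf, cdf, and likelihood are illustrative only. Absent thresholds default to the values given earlier, so $F(t^l) = 0$, $F(t^r) = 1$, $F(c^l) = 1$, and $F(c^r) = 0$.

```python
import math
from dataclasses import dataclass

# Hypothetical model: exponential distribution with scale theta.
# Its support is [0, inf), so tau_l = 0 and tau_h = inf.
TAU_L = 0.0
TAU_H = math.inf

def pdf(x: float, theta: float) -> float:
    return math.exp(-x / theta) / theta

def cdf(x: float, theta: float) -> float:
    return 1.0 - math.exp(-x / theta)

@dataclass
class Observation:
    y: float
    t_l: float = TAU_L  # no left-truncation:  F(t_l) = 0
    t_r: float = TAU_H  # no right-truncation: F(t_r) = 1
    c_l: float = TAU_H  # no left-censoring:   F(c_l) = 1
    c_r: float = TAU_L  # no right-censoring:  F(c_r) = 0

    def is_censored(self) -> bool:
        return self.c_l != TAU_H or self.c_r != TAU_L

    def is_truncated(self) -> bool:
        return self.t_l != TAU_L or self.t_r != TAU_H

def likelihood(obs: Observation, theta: float) -> float:
    """Likelihood of one observation under the cases E, E_t, C, and C_t."""
    if obs.is_censored():
        # C or C_t: probability that Y falls in (c_r, c_l]
        num = cdf(obs.c_l, theta) - cdf(obs.c_r, theta)
    else:
        # E or E_t: density at the recorded value y
        num = pdf(obs.y, theta)
    if obs.is_truncated():
        # E_t or C_t: condition on t_l < Y <= t_r
        return num / (cdf(obs.t_r, theta) - cdf(obs.t_l, theta))
    return num

# Usage with theta = 2: one observation of each of three kinds
print(likelihood(Observation(y=1.0), 2.0))           # in E
print(likelihood(Observation(y=2.0, t_l=1.0), 2.0))  # in E_t (left-truncated at 1)
print(likelihood(Observation(y=3.0, c_r=3.0), 2.0))  # in C (right-censored at 3)
```

Because the assumed exponential distribution is memoryless, the left-truncated observation at $y = 2$ with $t^l = 1$ has the same likelihood as an untruncated observation at $y = 1$. This is a property of the assumed distribution, not of the procedure.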

Note that $(E \cup E_t) \cap (C \cup C_t) = \emptyset$. Also, the sets $E_t$ and $C_t$ are empty when you do not specify truncation, and the sets $C$ and $C_t$ are empty when you do not specify censoring.

Given this, the likelihood of the data, $L$, is as follows:

$$L = \prod_{E} f_\Theta(y) \prod_{E_t} \frac{f_\Theta(y)}{F_\Theta(t^r) - F_\Theta(t^l)} \prod_{C} \left( F_\Theta(c^l) - F_\Theta(c^r) \right) \prod_{C_t} \frac{F_\Theta(c^l) - F_\Theta(c^r)}{F_\Theta(t^r) - F_\Theta(t^l)}$$
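PROC HPSEVERITY maximizes $\log(L)$, so the product becomes a sum of per-observation log contributions. The following Python sketch evaluates $\log(L)$ for a hypothetical exponential model with scale theta and support $[0, \infty)$. Each record is a tuple (y, t_l, t_r, c_l, c_r) in which 0.0 and math.inf stand in for absent thresholds. The record layout and the function names are assumptions made for illustration.

```python
import math

# Hypothetical exponential model with scale theta; support [0, inf).
def pdf(x: float, theta: float) -> float:
    return math.exp(-x / theta) / theta

def cdf(x: float, theta: float) -> float:
    return 1.0 - math.exp(-x / theta)

def log_likelihood(data, theta: float) -> float:
    """log L summed over the sets E, E_t, C, and C_t.

    Each record is (y, t_l, t_r, c_l, c_r). Absent thresholds use 0.0
    (tau_l) or math.inf (tau_h), so F(0) = 0 and F(inf) = 1.
    """
    total = 0.0
    for y, t_l, t_r, c_l, c_r in data:
        censored = c_l != math.inf or c_r != 0.0
        # density for E and E_t, interval probability for C and C_t
        num = cdf(c_l, theta) - cdf(c_r, theta) if censored else pdf(y, theta)
        # for untruncated records this denominator is 1 - 0 = 1
        den = cdf(t_r, theta) - cdf(t_l, theta)
        total += math.log(num) - math.log(den)
    return total

# Usage: two uncensored, untruncated records and one right-censored at 3
data = [(1.0, 0.0, math.inf, math.inf, 0.0),
        (3.0, 0.0, math.inf, math.inf, 0.0),
        (3.0, 0.0, math.inf, math.inf, 3.0)]
print(log_likelihood(data, 2.0))
```

Dividing every contribution by $F_\Theta(t^r) - F_\Theta(t^l)$ is harmless for untruncated records because that difference equals 1 for them. This lets one loop cover all four sets without branching on truncation.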

The maximum likelihood procedure used by PROC HPSEVERITY finds an optimal set of parameter values $\hat{\Theta}$ that maximizes $\log(L)$ subject to the boundary constraints on parameter values. For a distribution dist, you can specify such boundary constraints by using the dist_LOWERBOUNDS and dist_UPPERBOUNDS subroutines. For more information, see the section Defining a Severity Distribution Model with the FCMP Procedure. Some aspects of the optimization process can be controlled by using the NLOPTIONS statement.
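PROC HPSEVERITY uses its own nonlinear optimizer, configured through the NLOPTIONS statement. The following Python sketch uses a much simpler technique in its place, a one-dimensional golden-section search confined to an interval [lower, upper], to show what "maximize $\log(L)$ subject to boundary constraints" means. It assumes a single parameter and a unimodal log-likelihood. The bounds play the role that dist_LOWERBOUNDS and dist_UPPERBOUNDS play in the procedure. The function golden_max is hypothetical and does not describe how the procedure optimizes.

```python
import math
from typing import Callable

def golden_max(fn: Callable[[float], float], lower: float, upper: float,
               tol: float = 1e-9) -> float:
    """Maximize a unimodal fn on [lower, upper] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618
    a, b = lower, upper
    while b - a > tol:
        c = b - inv_phi * (b - a)  # interior probe points, c < d
        d = a + inv_phi * (b - a)
        if fn(c) > fn(d):
            b = d  # for a unimodal fn the maximum is not in (d, b]
        else:
            a = c  # for a unimodal fn the maximum is not in [a, c)
    return (a + b) / 2.0

def exp_log_lik(theta: float, ys=(1.0, 2.0, 3.0, 6.0)) -> float:
    # log L of uncensored, untruncated data under an exponential with scale theta
    return -len(ys) * math.log(theta) - sum(ys) / theta

print(golden_max(exp_log_lik, 0.1, 100.0))  # close to the sample mean, 3.0
print(golden_max(exp_log_lik, 0.1, 2.0))    # upper bound is active: close to 2.0
```

For uncensored, untruncated exponential data the unconstrained maximizer is the sample mean. When the upper bound lies below that value, the constrained search returns the bound instead.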

Probability of Observability and Likelihood

If you specify the probability of observability for left-truncation, then PROC HPSEVERITY uses a modified likelihood function for each left-truncated observation. If the probability of observability is $p \in (0.0, 1.0]$, then for each left-truncated observation with truncation threshold $t^l$, there exist $(1 - p)/p$ observations with a response variable value less than or equal to $t^l$. Each such observation has a probability of $\Pr(Y \le t^l) = F_\Theta(t^l)$. The right-truncation and censoring information does not apply to these added observations. Thus, following the notation of the section Likelihood Function, the likelihood of the data is as follows:

$$\begin{aligned}
L = {} & \prod_{E} f_\Theta(y) \prod_{E_t,\; t^l = \tau^l} \frac{f_\Theta(y)}{F_\Theta(t^r)} \prod_{E_t,\; t^l > \tau^l} \frac{f_\Theta(y)}{F_\Theta(t^r)} F_\Theta(t^l)^{\frac{1-p}{p}} \\
& \prod_{C} \left( F_\Theta(c^l) - F_\Theta(c^r) \right) \prod_{C_t,\; t^l = \tau^l} \frac{F_\Theta(c^l) - F_\Theta(c^r)}{F_\Theta(t^r)} \prod_{C_t,\; t^l > \tau^l} \frac{F_\Theta(c^l) - F_\Theta(c^r)}{F_\Theta(t^r)} F_\Theta(t^l)^{\frac{1-p}{p}}
\end{aligned}$$

Note that the likelihood of the observations that are not left-truncated (observations in sets $E$ and $C$, and observations in sets $E_t$ and $C_t$ for which $t^l = \tau^l$) is not affected.
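The following Python sketch evaluates this modified log-likelihood for a hypothetical exponential model with scale theta and support $[0, \infty)$, so $\tau^l = 0$. Records are tuples (y, t_l, t_r, c_l, c_r) with 0.0 and math.inf standing in for absent thresholds. These conventions and the function names are assumptions for illustration. Each recorded observation is conditioned only on its right-truncation threshold. Each record with $t^l > \tau^l$ also contributes the factor $F_\Theta(t^l)^{(1-p)/p}$.

```python
import math

# Hypothetical exponential model with scale theta; support [0, inf), so tau_l = 0.
def pdf(x: float, theta: float) -> float:
    return math.exp(-x / theta) / theta

def cdf(x: float, theta: float) -> float:
    return 1.0 - math.exp(-x / theta)

def log_lik_observability(data, theta: float, p: float) -> float:
    """log L when a probability of observability p in (0, 1] is specified.

    Each record is (y, t_l, t_r, c_l, c_r); 0.0 and math.inf stand in for
    absent thresholds (the record layout is an assumption of this sketch).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    extra = (1.0 - p) / p  # added observations per left-truncated one
    total = 0.0
    for y, t_l, t_r, c_l, c_r in data:
        censored = c_l != math.inf or c_r != 0.0
        num = cdf(c_l, theta) - cdf(c_r, theta) if censored else pdf(y, theta)
        # the recorded observation is conditioned only on Y <= t_r
        total += math.log(num) - math.log(cdf(t_r, theta))
        if t_l > 0.0:
            # (1 - p)/p added observations, each with probability F(t_l)
            total += extra * math.log(cdf(t_l, theta))
    return total

# Usage: one observation at y = 2, left-truncated at 1, with p = 0.8
print(log_lik_observability([(2.0, 1.0, math.inf, math.inf, 0.0)], 2.0, 0.8))
```

For example, $p = 0.8$ gives an exponent of $(1 - 0.8)/0.8 = 0.25$, and $p = 0.5$ gives 1, that is, one added observation at or below $t^l$ for every recorded left-truncated one.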

If you specify a custom objective function, then PROC HPSEVERITY accounts for the probability of observability only while computing the empirical distribution function estimate. The parameter estimates are affected only by your custom objective function.

Estimating Covariance and Standard Errors

PROC HPSEVERITY computes an estimate of the covariance matrix of the parameters by using the asymptotic theory of maximum likelihood estimators (MLE). Let $N$ denote the number of observations used to estimate a parameter vector $\boldsymbol{\theta}$. The theory states that, as $N \to \infty$, the distribution of $\hat{\boldsymbol{\theta}}$, the estimate of $\boldsymbol{\theta}$, converges to a normal distribution with mean $\boldsymbol{\theta}$ and covariance $\hat{\mathbf{C}}$. Here $\mathbf{I}(\boldsymbol{\theta}) \cdot \hat{\mathbf{C}}$ converges to the identity matrix, where $\mathbf{I}(\boldsymbol{\theta}) = -E[\nabla^2 \log(L(\boldsymbol{\theta}))]$ is the information matrix for the likelihood of the data, $L(\boldsymbol{\theta})$. The covariance estimate is obtained by using the inverse of the information matrix.

In particular, if $\mathbf{G} = \nabla^2(-\log(L(\boldsymbol{\theta})))$ denotes the Hessian matrix of the negative log likelihood, then the covariance estimate is computed as

$$\hat{\mathbf{C}} = \frac{N}{d} \mathbf{G}^{-1}$$

where $d$ is a denominator that is determined by the VARDEF= option. If VARDEF=N, then $d = N$, which yields the asymptotic covariance estimate. If VARDEF=DF, then $d = N - k$, where $k$ is the number of parameters (the model's degrees of freedom). VARDEF=DF is the default because it attempts to correct for the potential bias introduced by a finite sample.
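The following Python sketch carries out this computation for a generic parameter vector. It approximates $\mathbf{G}$ by central finite differences of a user-supplied negative log-likelihood, inverts it by Gauss-Jordan elimination, and scales the result by $N/d$. The finite differences, the helper names, and the string values "N" and "DF" for vardef are assumptions of this sketch. This section does not say how PROC HPSEVERITY itself obtains the Hessian.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]

def numeric_hessian(fn: Callable[[List[float]], float], x: List[float],
                    h: float = 1e-4) -> Matrix:
    """Central-difference approximation of the Hessian of fn at x."""
    k = len(x)
    H = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            def shifted(di: float, dj: float) -> float:
                z = list(x)
                z[i] += di
                z[j] += dj
                return fn(z)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
    return H

def invert(A: Matrix, eps: float = 1e-12) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting; raises if A is singular."""
    n = len(A)
    M = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
         for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < eps:
            raise ValueError("singular Hessian: covariance not available")
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def covariance(neg_log_lik: Callable[[List[float]], float],
               theta_hat: List[float], n_obs: int, vardef: str = "DF") -> Matrix:
    """C_hat = (N / d) * G^{-1}, with G the Hessian of -log L at theta_hat."""
    k = len(theta_hat)
    d = n_obs if vardef == "N" else n_obs - k  # VARDEF=N or VARDEF=DF
    g_inv = invert(numeric_hessian(neg_log_lik, theta_hat))
    return [[(n_obs / d) * v for v in row] for row in g_inv]

# Usage: two-parameter lognormal (mu, sigma of log Y) at its ML estimates.
ys = [1.0, 2.0, 4.0, 8.0, 3.0]
logs = [math.log(v) for v in ys]
mu_hat = sum(logs) / len(ys)
sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in logs) / len(ys))

def lognormal_nll(p: List[float]) -> float:
    mu, sigma = p
    return sum(v + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
               + (v - mu) ** 2 / (2.0 * sigma ** 2) for v in logs)

print(covariance(lognormal_nll, [mu_hat, sigma_hat], len(ys), vardef="N"))
```

At the ML estimates of this lognormal example, $\mathbf{G}$ is diagonal with entries $N/\hat\sigma^2$ and $2N/\hat\sigma^2$. With vardef="N" the sketch should therefore return approximately $\mathrm{diag}(\hat\sigma^2/N,\ \hat\sigma^2/(2N))$. With vardef="DF" every entry is larger by the factor $N/(N-2)$. The invert helper raises an error when a pivot is numerically zero, which is the singular-Hessian case in which no covariance estimate exists.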

The standard error $s_i$ of the parameter $\theta_i$ is computed as the square root of the $i$th diagonal element of the estimated covariance matrix; that is, $s_i = \sqrt{\hat{C}_{ii}}$.
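A minimal Python sketch of this step follows. The function name is hypothetical. Returning None for a non-positive diagonal element, where no meaningful standard error exists, is a choice of this sketch and not documented procedure behavior. The usage lines assume a one-parameter exponential model with ML estimate $\hat\theta = 3$ from $N = 10$ uncensored, untruncated observations. For that model $\mathbf{G} = N/\hat\theta^2$, so VARDEF=DF gives $\hat{C} = \frac{10}{9} \cdot \frac{9}{10} = 1$.

```python
import math
from typing import List, Optional

def standard_errors(cov: List[List[float]]) -> List[Optional[float]]:
    """s_i = sqrt(C_ii) for each parameter i; None if C_ii is not positive
    (an assumption of this sketch, signalling an unusable estimate)."""
    return [math.sqrt(row[i]) if row[i] > 0.0 else None
            for i, row in enumerate(cov)]

# Usage: hypothetical exponential model, theta_hat = 3 from N = 10 observations.
n_obs, k, theta_hat = 10, 1, 3.0
G = n_obs / theta_hat ** 2              # Hessian of -log L at the estimate
C = (n_obs / (n_obs - k)) * (1.0 / G)   # VARDEF=DF scaling N / (N - k)
print(standard_errors([[C]]))           # approximately [1.0]
```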

If you specify a custom objective function, then the covariance matrix of the parameters is still computed by inverting the information matrix, except that the Hessian matrix $\mathbf{G}$ is computed as $\mathbf{G} = \nabla^2 \log(U(\boldsymbol{\theta}))$, where $U$ denotes your custom objective function, which the optimizer minimizes.

Covariance and standard error estimates might not be available if the Hessian matrix is found to be singular at the end of the optimization process. This is especially likely if the optimization process stops without converging.

Last updated: June 19, 2025