SEVERITY Procedure

Empirical Distribution Function Estimation Methods

The empirical distribution function (EDF) is a nonparametric estimate of the cumulative distribution function (CDF) of the distribution. PROC SEVERITY computes EDF estimates for two purposes: to send the estimates to a distribution’s PARMINIT subroutine in order to initialize the distribution parameters, and to compute the EDF-based statistics of fit.

To reduce the time that it takes to compute the EDF estimates, you can use the INITSAMPLE option to specify that only a fraction of the input data be used. If you do not specify the INITSAMPLE option, then PROC SEVERITY computes the EDF estimates by using all valid observations in the DATA= data set, or by using all valid observations in the current BY group if you specify a BY statement.

This section describes the methods that are used for computing EDF estimates.

Notation

Let there be a set of $N$ observations, each containing a quintuplet of values $(y_i, t_i^l, t_i^r, c_i^r, c_i^l),\ i = 1, \dots, N$, where $y_i$ is the value of the response variable, $t_i^l$ is the value of the left-truncation threshold, $t_i^r$ is the value of the right-truncation threshold, $c_i^r$ is the value of the right-censoring limit, and $c_i^l$ is the value of the left-censoring limit.

If an observation is not left-truncated, then $t_i^l = \tau^l$, where $\tau^l$ is the smallest value in the support of the distribution; so $F(t_i^l) = 0$. If an observation is not right-truncated, then $t_i^r = \tau_h$, where $\tau_h$ is the largest value in the support of the distribution; so $F(t_i^r) = 1$. If an observation is not right-censored, then $c_i^r = \tau^l$; so $F(c_i^r) = 0$. If an observation is not left-censored, then $c_i^l = \tau_h$; so $F(c_i^l) = 1$.

Let $w_i$ denote the weight associated with the $i$th observation. If you specify the WEIGHT statement, then $w_i$ is the normalized value of the weight variable; otherwise, it is set to 1. The weights are normalized such that they sum to $N$.
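The normalization can be illustrated with a minimal Python sketch. The function name `normalize_weights` is hypothetical, and the sketch shows only the arithmetic implied by the text (scale the raw weights so that they sum to $N$); it is not PROC SEVERITY's implementation.

```python
def normalize_weights(raw_weights):
    """Scale raw weights so that they sum to N, the number of observations."""
    n = len(raw_weights)
    total = sum(raw_weights)
    if n == 0 or total <= 0:
        raise ValueError("need at least one observation and a positive weight total")
    return [w * n / total for w in raw_weights]


# Three observations whose raw weights sum to 4 are rescaled to sum to 3.
weights = normalize_weights([2.0, 1.0, 1.0])
```

With unit raw weights the function returns them unchanged, which matches the case in which no WEIGHT statement is specified.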

An indicator function $I[e]$ takes a value of 1 or 0 if the expression $e$ is true or false, respectively.

Estimation Methods

If the response variable is subject to both left-censoring and right-censoring effects, then PROC SEVERITY uses Turnbull’s method. This section describes methods other than Turnbull’s method. For Turnbull’s method, see the next section, Turnbull’s EDF Estimation Method.

The method descriptions assume that all observations are either uncensored or right-censored; that is, each observation is of the form $(y_i, t_i^l, t_i^r, \tau^l, \tau_h)$ or $(y_i, t_i^l, t_i^r, c_i^r, \tau_h)$.

If all observations are either uncensored or left-censored, then each observation is of the form $(y_i, t_i^l, t_i^r, \tau^l, c_i^l)$. It is converted to an observation $(-y_i, -t_i^r, -t_i^l, -c_i^l, \tau_h)$; that is, the signs of all the response variable values are reversed, the new left-truncation threshold is equal to the negative of the original right-truncation threshold, the new right-truncation threshold is equal to the negative of the original left-truncation threshold, and the negative of the original left-censoring limit becomes the new right-censoring limit. With this transformation, each observation is either uncensored or right-censored, so the methods described for handling uncensored or right-censored data are applicable. After the EDF estimates are computed, the observations are transformed back to the original form and the EDF estimates are adjusted such that $F_n(y_i) = 1 - F_n(-y_i-)$, where $F_n(-y_i-)$ denotes the EDF estimate at a value slightly less than the transformed value $-y_i$.

Further, a set of uncensored or right-censored observations can be converted to a set of observations of the form $(y_i, t_i^l, t_i^r, \delta_i)$, where $\delta_i$ is the indicator of right-censoring. $\delta_i = 0$ indicates a right-censored observation, in which case $y_i$ is assumed to record the right-censoring limit $c_i^r$. $\delta_i = 1$ indicates an uncensored observation, and $y_i$ records the exact observed value. In other words, $\delta_i = I[Y \le C^r]$ and $y_i = \min(y_i, c_i^r)$.
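The conversion to the $(y_i, t_i^l, t_i^r, \delta_i)$ form can be sketched in Python as follows. The function name `to_delta_form` and the default `tau_low=0.0` (a lower support bound that is typical for loss data) are assumptions made for this illustration; the sketch relies only on the convention from the Notation section that $c_i^r = \tau^l$ marks an observation that is not right-censored.

```python
import math


def to_delta_form(y, t_left, t_right, c_right, tau_low=0.0):
    """Convert one uncensored or right-censored observation to (y, t_left, t_right, delta).

    c_right == tau_low means "not right-censored" (assumed sentinel convention).
    """
    if c_right == tau_low:
        # Uncensored: y is the exact observed value, delta = 1.
        return (y, t_left, t_right, 1)
    # Right-censored: y records the right-censoring limit, delta = 0.
    return (c_right, t_left, t_right, 0)


uncensored = to_delta_form(5.0, 0.0, math.inf, 0.0)
censored = to_delta_form(0.0, 1.0, math.inf, 8.0)
```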

Given this notation, the EDF is estimated as

$$F_n(y) = \begin{cases} 0 & \text{if } y < y^{(1)} \\ \hat{F}_n(y^{(k)}) & \text{if } y^{(k)} \le y < y^{(k+1)},\ k = 1, \dots, N-1 \\ \hat{F}_n(y^{(N)}) & \text{if } y^{(N)} \le y \end{cases}$$

where $y^{(k)}$ denotes the $k$th order statistic of the set $\{y_i\}$ and $\hat{F}_n(y^{(k)})$ is the estimate computed at that value. The definition of $\hat{F}_n$ depends on the estimation method. You can specify a particular method or let PROC SEVERITY choose an appropriate method by using the EMPIRICALCDF= option in the PROC SEVERITY statement. Each method computes $\hat{F}_n$ as follows:
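As an illustration of the step-function definition of $F_n(y)$, the following Python sketch looks up the estimate for an arbitrary $y$ given the sorted order statistics and the estimate at each one. The name `edf_step` and the argument layout are hypothetical, and the per-point estimates are assumed to have been produced already by one of the methods described in this section.

```python
from bisect import bisect_right


def edf_step(y, sorted_values, estimates):
    """Evaluate F_n(y) as a right-continuous step function.

    sorted_values: order statistics y^(1) <= ... <= y^(N)
    estimates:     F-hat evaluated at each order statistic
    """
    k = bisect_right(sorted_values, y)  # number of order statistics <= y
    return 0.0 if k == 0 else estimates[k - 1]


order_stats = [1.0, 2.0, 4.0]
f_hat = [0.2, 0.5, 1.0]
lookups = [edf_step(v, order_stats, f_hat) for v in (0.5, 2.0, 3.0, 10.0)]
```

The lookup returns 0 to the left of the smallest order statistic and the last estimate at or beyond the largest one, mirroring the three cases in the definition.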

STANDARD

This method is the standard way of computing the EDF. The EDF estimate at observation $i$ is computed as follows:

$$\hat{F}_n(y_i) = \frac{1}{N} \sum_{j=1}^{N} w_j \cdot I[y_j \le y_i]$$

If you do not specify any censoring or truncation information, then this method is chosen. If you explicitly specify this method, then PROC SEVERITY ignores any censoring and truncation information that you specify in the LOSS statement.

The standard error of $\hat{F}_n(y_i)$ is computed by using the normal approximation method:

$$\hat{\sigma}_n(y_i) = \sqrt{\hat{F}_n(y_i)\,(1 - \hat{F}_n(y_i)) / N}$$
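The STANDARD estimate and its normal-approximation standard error can be sketched directly from the two formulas above. This is a minimal Python illustration, not PROC SEVERITY's implementation; the function name `standard_edf` is hypothetical, and any weights passed in are assumed to be already normalized to sum to $N$.

```python
import math


def standard_edf(values, weights=None):
    """Return (y_i, F_hat(y_i), std_err(y_i)) for each observation."""
    n = len(values)
    if weights is None:
        weights = [1.0] * n  # no WEIGHT statement: every weight is 1
    results = []
    for y_i in values:
        # Weighted fraction of observations that are <= y_i.
        f_hat = sum(w for y_j, w in zip(values, weights) if y_j <= y_i) / n
        # Normal approximation; max() guards against tiny negative rounding.
        std_err = math.sqrt(max(f_hat * (1.0 - f_hat), 0.0) / n)
        results.append((y_i, f_hat, std_err))
    return results


edf = standard_edf([3.0, 1.0, 2.0, 2.0])
```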

KAPLANMEIER

The Kaplan-Meier (KM) estimator, also known as the product-limit estimator, was first introduced by Kaplan and Meier (1958) for censored data. Lynden-Bell (1971) derived a similar estimator for left-truncated data. PROC SEVERITY uses the definition that combines both censoring and truncation information (Klein and Moeschberger 1997; Lai and Ying 1991).

The EDF estimate at observation i is computed as

$$\hat{F}_n(y_i) = 1 - \prod_{\tau \le y_i} \left( 1 - \frac{n(\tau)}{R_n(\tau)} \right)$$

where $n(\tau)$ and $R_n(\tau)$ are defined as follows:

  • $n(\tau) = \sum_{k=1}^{N} w_k \cdot I[y_k = \tau \text{ and } \tau \le t_k^r \text{ and } \delta_k = 1]$, which is the number of uncensored observations ($\delta_k = 1$) for which the response variable value is equal to $\tau$ and $\tau$ is observable according to the right-truncation threshold of that observation ($\tau \le t_k^r$).

  • $R_n(\tau) = \sum_{k=1}^{N} w_k \cdot I[y_k \ge \tau > t_k^l]$, which is the size (cardinality) of the risk set at $\tau$. The term risk set has its origins in survival analysis; it contains the events that are at risk of failure at a given time, $\tau$. In other words, it contains the events that have survived up to time $\tau$ and might fail at or after $\tau$. For PROC SEVERITY, time is equivalent to the magnitude of the event and failure is equivalent to an uncensored and observable event, where observable means it satisfies the truncation thresholds.

This method is chosen when you specify at least one form of censoring or truncation.

The standard error of $\hat{F}_n(y_i)$ is computed by using Greenwood’s formula (Greenwood 1926):

$$\hat{\sigma}_n(y_i) = \sqrt{ (1 - \hat{F}_n(y_i))^2 \cdot \sum_{\tau \le y_i} \left( \frac{n(\tau)}{R_n(\tau)\,(R_n(\tau) - n(\tau))} \right) }$$
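The product-limit estimate and Greenwood's formula can be illustrated with a small Python sketch. It assumes unit weights, observations already in the $(y_i, t_i^l, t_i^r, \delta_i)$ form, and valid data in which every $y_i$ exceeds its own left-truncation threshold. The function name `kaplan_meier_edf` is hypothetical. Skipping the Greenwood term when $R_n(\tau) = n(\tau)$ is a choice made for this sketch: the term is undefined there, and the factor $1 - \hat{F}_n$ is zero at that point anyway.

```python
import math


def kaplan_meier_edf(obs):
    """obs: list of (y, t_left, t_right, delta) tuples with unit weights.

    Returns (tau, F_hat, std_err) for each distinct uncensored value tau.
    """
    taus = sorted({y for y, _, _, delta in obs if delta == 1})
    survival = 1.0        # running product of (1 - n/R)
    greenwood_sum = 0.0   # running sum of n / (R * (R - n))
    estimates = []
    for tau in taus:
        n_tau = sum(1 for y, _, t_right, delta in obs
                    if y == tau and tau <= t_right and delta == 1)
        r_tau = sum(1 for y, t_left, _, _ in obs if y >= tau > t_left)
        if n_tau == 0 or r_tau == 0:
            continue  # this tau contributes a factor of 1
        survival *= 1.0 - n_tau / r_tau
        if r_tau > n_tau:
            greenwood_sum += n_tau / (r_tau * (r_tau - n_tau))
        estimates.append((tau, 1.0 - survival, survival * math.sqrt(greenwood_sum)))
    return estimates


# Four losses; the second is right-censored at 2. No truncation is in effect.
sample = [(1.0, 0.0, math.inf, 1), (2.0, 0.0, math.inf, 0),
          (3.0, 0.0, math.inf, 1), (4.0, 0.0, math.inf, 1)]
km = kaplan_meier_edf(sample)
```

For this sample the risk set sizes at 1, 3, and 4 are 4, 2, and 1, so the estimates are 0.25, 0.625, and 1.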

MODIFIEDKM

The product-limit estimator used by the KAPLANMEIER method does not work well if the risk set size becomes very small. For right-censored data, the size can become small towards the right tail. For left-truncated data, the size can become small at the left tail and can remain so for the entire range of data. This was demonstrated by Lai and Ying (1991). They proposed a modification to the estimator that ignores the effects due to small risk set sizes.

The EDF estimate at observation i is computed as

$$\hat{F}_n(y_i) = 1 - \prod_{\tau \le y_i} \left( 1 - \frac{n(\tau)}{R_n(\tau)} \cdot I[R_n(\tau) \ge c N^{\alpha}] \right)$$

where the definitions of $n(\tau)$ and $R_n(\tau)$ are identical to those used for the KAPLANMEIER method described previously.

You can specify the values of $c$ and $\alpha$ by using the C= and ALPHA= options. If you do not specify a value for $c$, the default value used is $c = 1$. If you do not specify a value for $\alpha$, the default value used is $\alpha = 0.5$.

As an alternative, you can also specify an absolute lower bound, say $L$, on the risk set size by using the RSLB= option, in which case $I[R_n(\tau) \ge c N^{\alpha}]$ is replaced by $I[R_n(\tau) \ge L]$ in the definition.

The standard error of $\hat{F}_n(y_i)$ is computed by using Greenwood’s formula (Greenwood 1926):

$$\hat{\sigma}_n(y_i) = \sqrt{ (1 - \hat{F}_n(y_i))^2 \cdot \sum_{\tau \le y_i} \left( \frac{n(\tau)}{R_n(\tau)\,(R_n(\tau) - n(\tau))} \cdot I[R_n(\tau) \ge c N^{\alpha}] \right) }$$
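The effect of the risk-set threshold can be seen in a short Python sketch of the modified estimate. The function name `modified_km_edf` and its parameter names are hypothetical (they only mirror the C=, ALPHA=, and RSLB= options named in the text), unit weights are assumed, and the standard error is omitted for brevity.

```python
import math


def modified_km_edf(obs, c=1.0, alpha=0.5, rslb=None):
    """obs: list of (y, t_left, t_right, delta) tuples with unit weights.

    Factors whose risk set is smaller than the bound are skipped, which is
    the role of the indicator I[R_n(tau) >= c * N**alpha] in the formula.
    """
    bound = rslb if rslb is not None else c * len(obs) ** alpha
    taus = sorted({y for y, _, _, delta in obs if delta == 1})
    survival = 1.0
    estimates = []
    for tau in taus:
        n_tau = sum(1 for y, _, t_right, delta in obs
                    if y == tau and tau <= t_right and delta == 1)
        r_tau = sum(1 for y, t_left, _, _ in obs if y >= tau > t_left)
        if r_tau > 0 and r_tau >= bound:
            survival *= 1.0 - n_tau / r_tau
        estimates.append((tau, 1.0 - survival))
    return estimates


sample = [(1.0, 0.0, math.inf, 1), (2.0, 0.0, math.inf, 0),
          (3.0, 0.0, math.inf, 1), (4.0, 0.0, math.inf, 1)]
default_bound = modified_km_edf(sample)        # bound = 1 * 4**0.5 = 2
absolute_bound = modified_km_edf(sample, rslb=1)
```

With the default bound of 2, the factor at the largest value (risk set size 1) is ignored, so the estimate stays at 0.625 there; with an absolute bound of 1 the result matches the unmodified product-limit estimate.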

Turnbull’s EDF Estimation Method

If the response variable is subject to both left-censoring and right-censoring effects, then the SEVERITY procedure uses a method proposed by Turnbull (1976) to compute the nonparametric estimates of the cumulative distribution function. The original Turnbull’s method is modified using the suggestions made by Frydman (1994) when truncation effects are present.

Let the input data consist of $N$ observations in the form of quintuplets of values $(y_i, t_i^l, t_i^r, c_i^r, c_i^l),\ i = 1, \dots, N$, with notation described in the section Notation. For each observation, let $A_i = (c_i^r, c_i^l]$ be the censoring interval; that is, the response variable value is known to lie in the interval $A_i$, but the exact value is not known. If an observation is uncensored, then $A_i = (y_i - \epsilon, y_i]$ for any arbitrarily small value of $\epsilon > 0$. If an observation is censored, then the value $y_i$ is ignored. Similarly, for each observation, let $B_i = (t_i^l, t_i^r]$ be the truncation interval; that is, the observation is drawn from a truncated (conditional) distribution $F(y, B_i) = P(Y \le y \mid Y \in B_i)$.

Two sets, $L$ and $R$, are formed using $A_i$ and $B_i$ as follows:

$$\begin{aligned} L &= \{ c_i^r,\ 1 \le i \le N \} \cup \{ t_i^r,\ 1 \le i \le N \} \\ R &= \{ c_i^l,\ 1 \le i \le N \} \cup \{ t_i^l,\ 1 \le i \le N \} \end{aligned}$$

The sets $L$ and $R$ represent the left endpoints and right endpoints, respectively. A set of disjoint intervals $C_j = [q_j, p_j]$, $1 \le j \le M$, is formed such that $q_j \in L$, $p_j \in R$, $q_j \le p_j$, and $p_j < q_{j+1}$. The value of $M$ depends on the nature of the censoring and truncation intervals in the input data. Turnbull (1976) showed that the maximum likelihood estimate (MLE) of the EDF can increase only inside the intervals $C_j$. In other words, the MLE is constant in the interval $(p_j, q_{j+1})$. The likelihood is independent of the behavior of $F_n$ inside any of the intervals $C_j$. Let $s_j$ denote the increase in $F_n$ inside an interval $C_j$. Then, the EDF estimate is as follows:

$$F_n(y) = \begin{cases} 0 & \text{if } y < q_1 \\ \sum_{k=1}^{j} s_k & \text{if } p_j < y < q_{j+1},\ 1 \le j \le M-1 \\ 1 & \text{if } y > p_M \end{cases}$$

PROC SEVERITY computes the estimates $F_n(p_j+) = F_n(q_{j+1}-) = \sum_{k=1}^{j} s_k$ at points $p_j$ and $q_{j+1}$ and computes $F_n(q_1-) = 0$ at point $q_1$, where $F_n(x+)$ denotes the limiting estimate at a point that is infinitesimally larger than $x$ when approaching $x$ from values larger than $x$ and where $F_n(x-)$ denotes the limiting estimate at a point that is infinitesimally smaller than $x$ when approaching $x$ from values smaller than $x$.

PROC SEVERITY uses the expectation-maximization (EM) algorithm proposed by Turnbull (1976), who referred to the algorithm as the self-consistency algorithm. By default, the algorithm runs until one of the following criteria is met:

  • Relative-error criterion: The maximum relative error between two consecutive estimates of $s_j$ falls below a threshold $\epsilon$. If $l$ indicates the index of the current iteration, then this can be formally stated as

    $$\max_{1 \le j \le M} \left\{ \frac{|s_j^{l} - s_j^{l-1}|}{s_j^{l-1}} \right\} \le \epsilon$$

    You can control the value of $\epsilon$ by specifying the EPS= suboption of the EDF=TURNBULL option in the PROC SEVERITY statement. The default value is 1.0E-8.

  • Maximum-iteration criterion: The number of iterations exceeds an upper limit that you specify for the MAXITER= suboption of the EDF=TURNBULL option in the PROC SEVERITY statement. The default number of maximum iterations is 500.
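The two stopping criteria can be sketched in Python as follows. The function names are hypothetical, and the defaults mirror the EPS= and MAXITER= defaults stated above. Two details are assumptions made for the illustration rather than documented behavior: intervals whose previous estimate is zero are skipped to avoid division by zero, and the iteration limit is treated as reached once `max_iter` iterations have completed.

```python
def max_relative_error(current, previous):
    """Largest relative change in s_j between two consecutive iterations."""
    return max(
        (abs(c - p) / p for c, p in zip(current, previous) if p > 0),
        default=0.0,
    )


def should_stop(current, previous, iteration, eps=1.0e-8, max_iter=500):
    """Return True when either default stopping criterion is met."""
    converged = max_relative_error(current, previous) <= eps
    out_of_iterations = iteration >= max_iter
    return converged or out_of_iterations


change = max_relative_error([0.5, 0.5], [0.4, 0.6])
```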

The self-consistent estimates obtained in this manner might not be maximum likelihood estimates. Gentleman and Geyer (1994) suggested the use of the Kuhn-Tucker conditions for the maximum likelihood problem to ensure that the estimates are MLE. If you specify the ENSUREMLE suboption of the EDF=TURNBULL option in the PROC SEVERITY statement, then PROC SEVERITY computes the Kuhn-Tucker conditions at the end of each iteration to determine whether the estimates $\{s_j\}$ are MLE. If you do not specify any truncation effects, then the Kuhn-Tucker conditions derived by Gentleman and Geyer (1994) are used. If you specify any truncation effects, then PROC SEVERITY uses modified Kuhn-Tucker conditions that account for the truncation effects. An integral part of checking the conditions is to determine whether an estimate $s_j$ is zero or whether an estimate of the Lagrange multiplier or the reduced gradient associated with the estimate $s_j$ is zero. PROC SEVERITY declares these values to be zero if they are less than or equal to a threshold $\delta$. You can control the value of $\delta$ by specifying the ZEROPROB= suboption of the EDF=TURNBULL option in the PROC SEVERITY statement. The default value is 1.0E-8. The algorithm continues until the Kuhn-Tucker conditions are satisfied or the number of iterations exceeds the upper limit. The relative-error criterion stated previously is not used when you specify the ENSUREMLE option.

The standard errors for Turnbull’s EDF estimates are computed by using the asymptotic theory of the maximum likelihood estimators (MLE), even though the final estimates might not be MLE. Turnbull’s estimator essentially attempts to maximize the likelihood $L$, which depends on the parameters $s_j$ ($j = 1, \dots, M$). Let $\mathbf{s} = \{s_j\}$ denote the set of these parameters. If $\mathbf{G}(\mathbf{s}) = \nabla^2(-\log(L(\mathbf{s})))$ denotes the Hessian matrix of the negative of the log likelihood, then the variance-covariance matrix of $\mathbf{s}$ is estimated as $\hat{\mathbf{C}}(\mathbf{s}) = \mathbf{G}^{-1}(\mathbf{s})$. Given this matrix, the standard error of $F_n(y)$ is computed as

$$\sigma_n(y) = \sqrt{ \sum_{k=1}^{j} \left( \hat{C}_{kk} + 2 \cdot \sum_{l=1}^{k-1} \hat{C}_{kl} \right) }, \quad \text{if } p_j < y < q_{j+1},\ 1 \le j \le M-1$$
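Because $F_n(y) = \sum_{k=1}^{j} s_k$ on the interval $(p_j, q_{j+1})$, the expression under the square root is the variance of that sum, which for a symmetric covariance matrix equals the sum of all entries in its leading $j \times j$ block. A minimal Python sketch of the formula follows; the function name `turnbull_std_err` and the example covariance values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def turnbull_std_err(cov, j):
    """Standard error of F_n(y) for p_j < y < q_{j+1}; j is 1-based.

    cov: symmetric covariance matrix of the s_j estimates, as nested lists.
    """
    total = 0.0
    for k in range(j):
        # Diagonal term plus twice the below-diagonal terms in row k.
        total += cov[k][k] + 2.0 * sum(cov[k][l] for l in range(k))
    return math.sqrt(total)


example_cov = [[0.04, -0.01],
               [-0.01, 0.09]]
se_first = turnbull_std_err(example_cov, 1)
se_second = turnbull_std_err(example_cov, 2)
```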

The standard error is undefined outside of these intervals.

EDF Estimates and Truncation

If you specify truncation, then the estimate $\hat{F}_n(y)$ that is computed by any method other than the STANDARD method is a conditional estimate. In other words, $\hat{F}_n(y) = \Pr(Y \le y \mid \tau_G < Y \le \tau_H)$, where $G$ and $H$ denote the (unknown) distribution functions of the left-truncation threshold variable $T^l$ and the right-truncation threshold variable $T^r$, respectively, $\tau_G$ denotes the smallest left-truncation threshold with a nonzero cumulative probability, and $\tau_H$ denotes the largest right-truncation threshold with a nonzero cumulative probability. Formally, $\tau_G = \inf\{s : G(s) > 0\}$ and $\tau_H = \sup\{s : H(s) > 0\}$. For computational purposes, PROC SEVERITY estimates $\tau_G$ and $\tau_H$ by $t_{\min}^l$ and $t_{\max}^r$, respectively, defined as

$$\begin{aligned} t_{\min}^l &= \min \{ t_k^l : 1 \le k \le N \} \\ t_{\max}^r &= \max \{ t_k^r : 1 \le k \le N \} \end{aligned}$$

These estimates of $t_{\min}^l$ and $t_{\max}^r$ are used to compute the conditional estimates of the CDF as described in the section Truncation and Conditional CDF Estimates.

If you specify left-truncation with the probability of observability $p$, then PROC SEVERITY uses the additional information provided by $p$ to compute an estimate of the EDF that is not conditional on the left-truncation information. In particular, for each left-truncated observation $i$ with response variable value $y_i$ and truncation threshold $t_i^l$, an observation $j$ is added with weight $w_j = (1 - p)/p$ and $y_j = t_j^l$. Each added observation is assumed to be uncensored and untruncated. Then, your specified EDF method is used by assuming no left-truncation. The EDF estimate that is obtained using this method is not conditional on the left-truncation information. For the KAPLANMEIER and MODIFIEDKM methods with uncensored or right-censored data, definitions of $n(\tau)$ and $R_n(\tau)$ are modified to account for the added observations. If $N^a$ denotes the total number of observations including the added observations, then $n(\tau)$ is defined as $n(\tau) = \sum_{k=1}^{N^a} w_k\, I[y_k = \tau \text{ and } \tau \le t_k^r \text{ and } \delta_k = 1]$, and $R_n(\tau)$ is defined as $R_n(\tau) = \sum_{k=1}^{N^a} w_k\, I[y_k \ge \tau]$. In the definition of $R_n(\tau)$, the left-truncation information is not used, because it was used along with $p$ to add the observations.
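The augmentation step can be sketched in Python as follows. The function name `augment_for_observability` and the tuple layout are hypothetical, with `None` standing in for "not left-truncated". The sketch covers only the step that adds one observation per left-truncated observation with weight $(1 - p)/p$ at the truncation threshold; the chosen EDF method would then be applied to the augmented data without left-truncation.

```python
def augment_for_observability(obs, p):
    """obs: list of (y, t_left, weight); t_left is None if not left-truncated.

    Returns a list of (y, weight) pairs that carries no left-truncation.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("probability of observability must be in (0, 1]")
    added_weight = (1.0 - p) / p
    # Original observations are kept, with their thresholds dropped.
    augmented = [(y, w) for y, _, w in obs]
    # One uncensored, untruncated observation per left-truncated original.
    for _, t_left, _ in obs:
        if t_left is not None:
            augmented.append((t_left, added_weight))
    return augmented


augmented = augment_for_observability([(5.0, 2.0, 1.0), (7.0, None, 1.0)], p=0.5)
```

With $p = 0.5$, each left-truncated observation contributes one added observation of weight 1 at its threshold.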

If the original data are a combination of left- and right-censored data, then Turnbull’s method is applied to the appended set that contains no left-truncated observations.

Supplying EDF Estimates to Functions and Subroutines

The parameter initialization subroutines in distribution models and some predefined utility functions require EDF estimates. For more information, see the sections Defining a Severity Distribution Model with the FCMP Procedure and Predefined Utility Functions.

PROC SEVERITY supplies the EDF estimates to these subroutines and functions by using two arrays, x and F, the dimension of each array, and the type of the EDF estimates. The type identifies how the EDF estimates are computed and stored. The types are as follows:

Type 1

specifies that EDF estimates are computed using the STANDARD method; that is, the data that are used for estimation are neither censored nor truncated.

Type 2

specifies that EDF estimates are computed using either the KAPLANMEIER or the MODIFIEDKM method; that is, the data that are used for estimation are subject to truncation and one type of censoring (left or right, but not both).

Type 3

specifies that EDF estimates are computed using the TURNBULL method; that is, the data that are used for estimation are subject to both left- and right-censoring. The data might or might not be truncated.

For Types 1 and 2, the EDF estimates are stored in arrays x and F of dimension $N$ such that the following holds:

$$F_n(y) = \begin{cases} 0 & \text{if } y < x[1] \\ F[k] & \text{if } x[k] \le y < x[k+1],\ k = 1, \dots, N-1 \\ F[N] & \text{if } x[N] \le y \end{cases}$$

where $[k]$ denotes the $k$th element of the array ($[1]$ denotes the first element of the array).

For Type 3, the EDF estimates are stored in arrays x and F of dimension $N$ such that the following holds:

$$F_n(y) = \begin{cases} 0 & \text{if } y < x[1] \\ \text{undefined} & \text{if } x[2k-1] \le y < x[2k],\ k = 1, \dots, (N-1)/2 \\ F[2k] = F[2k+1] & \text{if } x[2k] \le y < x[2k+1],\ k = 1, \dots, (N-1)/2 \\ F[N] & \text{if } x[N] \le y \end{cases}$$

Although the behavior of the EDF is theoretically undefined for the interval $[x[2k-1], x[2k])$, for computational purposes, all predefined functions and subroutines assume that the EDF increases linearly from $F[2k-1]$ to $F[2k]$ in that interval if $x[2k-1] < x[2k]$. If $x[2k-1] = x[2k]$, which can happen when the EDF is estimated from a combination of uncensored and interval-censored data, the predefined functions and subroutines assume that $F_n(x[2k-1]) = F_n(x[2k]) = F[2k]$.
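The Type 3 storage layout and the linear-interpolation convention can be illustrated with a Python sketch. The function name `edf_type3` and the example arrays are hypothetical, and Python lists are 0-based, so the 1-based element $x[2k-1]$ in the text sits at an even list index. This is a reading of the convention described above, not the code of the predefined functions.

```python
from bisect import bisect_right


def edf_type3(y, x, F):
    """Evaluate a Type 3 EDF stored in parallel arrays x and F (0-based lists)."""
    n = len(x)
    if y < x[0]:
        return 0.0
    if y >= x[n - 1]:
        return F[n - 1]
    i = bisect_right(x, y) - 1  # x[i] <= y < x[i + 1], so x[i] < x[i + 1]
    if i % 2 == 0:
        # 1-based odd index 2k-1: EDF is assumed to rise linearly to F[2k].
        fraction = (y - x[i]) / (x[i + 1] - x[i])
        return F[i] + fraction * (F[i + 1] - F[i])
    # 1-based even index 2k: EDF is flat at F[2k] up to x[2k+1].
    return F[i]


# The repeated value 4.0 mimics an uncensored point, where x[2k-1] = x[2k].
x_arr = [1.0, 2.0, 4.0, 4.0, 6.0]
f_arr = [0.0, 0.4, 0.4, 1.0, 1.0]
values = [edf_type3(v, x_arr, f_arr) for v in (0.5, 1.5, 3.0, 4.0, 5.0, 6.0)]
```

At the repeated point the lookup returns the upper value $F[2k]$, in line with the convention stated for $x[2k-1] = x[2k]$.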

Last updated: June 19, 2025