COUNTREG Procedure

Marginal Likelihood

Bayes' theorem states that

$$p(\theta \mid y) \propto \pi(\theta)\, L(y \mid \theta)$$

where $\theta$ is a vector of parameters and $\pi(\theta)$ is the product of the prior densities, which are specified in the PRIOR statement. The term $L(y \mid \theta)$ is the likelihood associated with the MODEL statement. The function $\pi(\theta) L(y \mid \theta)$ is the nonnormalized posterior distribution over the parameter vector $\theta$. The normalized posterior distribution, or simply the posterior distribution, is

$$p(\theta \mid y) = \frac{\pi(\theta)\, L(y \mid \theta)}{\int_\theta \pi(\theta)\, L(y \mid \theta)\, d\theta}$$

The denominator $m(y) = \int_\theta \pi(\theta)\, L(y \mid \theta)\, d\theta$, also called the "marginal likelihood," is a quantity of interest because it represents the probability of the data after the effect of the parameter vector has been averaged out. Because of this interpretation, the marginal likelihood can be used in various applications, including model averaging and variable or model selection.
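To make the definition concrete, the following Python sketch uses a deliberately simple hypothetical model that PROC COUNTREG does not fit in this form: independent counts $y_i \sim \text{Poisson}(\lambda)$ with a Gamma prior (shape $a$, rate $b$) on $\lambda$. For this conjugate pair the integral has the closed form $m(y) = \frac{b^a\, \Gamma(a + S)}{\Gamma(a)\, (b + n)^{a + S} \prod_i y_i!}$ with $S = \sum_i y_i$. The code checks that closed form against direct numerical integration of $\pi(\lambda) L(y \mid \lambda)$. The function names and data are illustrative only.

```python
import math


def log_marginal_closed_form(y, a, b):
    """log m(y) for y_i ~ Poisson(lam) iid and lam ~ Gamma(shape=a, rate=b)."""
    n, s = len(y), sum(y)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(b + n)
            - sum(math.lgamma(yi + 1) for yi in y))


def log_marginal_quadrature(y, a, b, upper=50.0, steps=20_000):
    """log of the integral of prior(lam) * L(y | lam) over (0, upper), midpoint rule."""
    n, s = len(y), sum(y)
    log_fact = sum(math.lgamma(yi + 1) for yi in y)
    h = upper / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        log_prior = a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(lam) - b * lam
        log_lik = s * math.log(lam) - n * lam - log_fact
        total += math.exp(log_prior + log_lik)  # integrand: prior times likelihood
    return math.log(total * h)


y = [2, 0, 3, 1, 4]
exact = log_marginal_closed_form(y, a=2.0, b=1.0)
numeric = log_marginal_quadrature(y, a=2.0, b=1.0)
print(exact, numeric)  # the two values should agree closely
```

The quadrature integrates prior times likelihood over the whole parameter range, which is exactly the "averaging out" described above. For realistic regression models no closed form exists and the integral is high-dimensional, which is why Monte Carlo estimators are needed.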

A natural estimate of the marginal likelihood is provided by the harmonic mean,

$$m(y) = \left\{ \frac{1}{n} \sum_{i=1}^{n} \frac{1}{L(y \mid \theta_i)} \right\}^{-1}$$

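where $\theta_i$, $i = 1, \ldots, n$, are draws from the posterior distribution. In practice, this estimator is known to be unstable: a few posterior draws with very small likelihood can dominate the sum, and the estimator can have infinite variance.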
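A minimal sketch of the harmonic mean estimator follows. It assumes that you already have the values $\log L(y \mid \theta_i)$ for the posterior draws; how those are obtained is outside the sketch. Working on the log scale with a log-sum-exp step avoids overflow in $1/L(y \mid \theta_i)$, which is astronomically large for realistic data sets. The function names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_harmonic_mean(log_liks):
    """Harmonic mean estimate of log m(y) from log L(y | theta_i), i = 1..n.

    m(y) = [ (1/n) * sum_i 1 / L(y | theta_i) ]^(-1), so
    log m(y) = log(n) - log(sum_i exp(-log L(y | theta_i))).
    """
    n = len(log_liks)
    return math.log(n) - log_sum_exp([-v for v in log_liks])


# Likelihood values 1 and 2 give m = 1 / ((1/1 + 1/2) / 2) = 4/3.
print(math.exp(log_harmonic_mean([math.log(1.0), math.log(2.0)])))
```

The instability shows up directly in the sum. Draws with very small likelihood contribute enormous terms $1/L(y \mid \theta_i)$, so a handful of draws can drive the estimate, and the result can change substantially from one posterior sample to the next.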

An alternative and more stable estimator can be obtained by using importance sampling. The auxiliary distribution for the importance sampler can be chosen by using the cross-entropy method (Chan and Eisenstat 2015). In particular, given a parametric family of distributions, the auxiliary density is chosen as the member of the family that is closest, in terms of the Kullback-Leibler divergence, to the density that would give a zero-variance estimate of the marginal likelihood, namely the posterior distribution itself. Minimizing this divergence is equivalent to maximizing the expected log density of the family under the posterior. In practical terms, therefore, the method reduces to the following algorithm:

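Choose a parametric family, $f(\cdot \mid \beta)$, for the parameters of the model: $f(\theta \mid \beta)$.
Evaluate the maximum likelihood estimator of $\beta$ by using the posterior samples $\theta_1, \ldots, \theta_n$ as data.
Use $f(\theta^* \mid \hat{\beta}_{\mathrm{mle}})$ to generate the importance samples: $\theta^*_1, \ldots, \theta^*_{n^*}$.
Estimate the marginal likelihood:
$$m(y) = \frac{1}{n^*} \sum_{j=1}^{n^*} \frac{L(y \mid \theta^*_j)\, \pi(\theta^*_j)}{f(\theta^*_j \mid \hat{\beta}_{\mathrm{mle}})}$$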
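The steps of this algorithm can be sketched in Python for a single unbounded parameter and a Gaussian family, the family this section adopts. To exercise the sketch, it uses a hypothetical normal-mean model ($y_i \sim N(\theta, 1)$, $\theta \sim N(0, \tau^2)$), chosen only because its marginal likelihood is known in closed form. The posterior draws are simulated directly rather than produced by an MCMC run, and all names are illustrative. This is not PROC COUNTREG's implementation.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def normal_logpdf(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def ce_log_marginal(post_draws, log_lik, log_prior, n_star, rng):
    """Cross-entropy importance sampling estimate of log m(y), Gaussian family."""
    # Fit the family by maximum likelihood, using the posterior draws as data
    # (the Gaussian MLE divides by n, not n - 1).
    n = len(post_draws)
    mu = sum(post_draws) / n
    var = sum((t - mu) ** 2 for t in post_draws) / n
    # Generate importance samples from the fitted density f(theta | beta_hat).
    draws = [rng.gauss(mu, math.sqrt(var)) for _ in range(n_star)]
    # Average L(y | theta*) * pi(theta*) / f(theta* | beta_hat), on the log scale.
    log_w = [log_lik(t) + log_prior(t) - normal_logpdf(t, mu, var) for t in draws]
    return log_sum_exp(log_w) - math.log(n_star)


# Hypothetical test model: y_i ~ N(theta, 1), theta ~ N(0, tau2).
y = [0.3, -1.2, 0.8, 1.5, 0.1]
tau2 = 4.0
n, s = len(y), sum(y)


def log_lik(t):
    return sum(normal_logpdf(yi, t, 1.0) for yi in y)


def log_prior(t):
    return normal_logpdf(t, 0.0, tau2)


# Conjugacy gives the exact posterior (used here in place of MCMC draws) ...
post_var = tau2 / (1 + n * tau2)
post_mean = post_var * s
rng = random.Random(1)
posterior_draws = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(5000)]
estimate = ce_log_marginal(posterior_draws, log_lik, log_prior, 5000, rng)

# ... and the exact log marginal likelihood, since y ~ N(0, I + tau2 * 11').
exact = (-0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n * tau2)
         - 0.5 * (sum(v * v for v in y) - tau2 * s * s / (1 + n * tau2)))
print(estimate, exact)
```

Here the posterior is itself Gaussian, so the fitted auxiliary density is almost exactly the zero-variance density and the importance weights are nearly constant. For skewed or bounded parameters the Gaussian fits less well, and it also places mass outside the parameter's support. That is the motivation for transforming bounded parameters before the fit.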

The parametric family for the auxiliary distribution is chosen to be Gaussian. Parameters that are subject to bounds are first transformed to the real line as follows:

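If $-\infty < \theta < \infty$, then $p = \theta$.
If $m \le \theta < \infty$, then $q = \log(\theta - m)$.
If $-\infty < \theta \le M$, then $r = \log(M - \theta)$.
If $m \le \theta \le M$, then $s = \log(\theta - m) - \log(M - \theta)$.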
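The following is a minimal sketch of these log and logit-type maps and their inverses. It assumes a scalar parameter with optional bounds $m$ (`lower`) and $M$ (`upper`). The inverse also returns $\log\lvert d\theta/dz \rvert$. By the change-of-variables formula, if the Gaussian density $f$ is fitted on the transformed scale $z$, its density on the original scale is $f(z)/\lvert d\theta/dz \rvert$, so this term is added to each log importance weight. How PROC COUNTREG handles this internally is not stated here, and the function names are hypothetical.

```python
import math


def to_unbounded(theta, lower=None, upper=None):
    """Map theta to the real line: p, q, r, or s depending on which bounds exist."""
    if lower is None and upper is None:
        return theta                                           # p = theta
    if upper is None:
        return math.log(theta - lower)                         # q = log(theta - m)
    if lower is None:
        return math.log(upper - theta)                         # r = log(M - theta)
    return math.log(theta - lower) - math.log(upper - theta)   # s = log(theta - m) - log(M - theta)


def from_unbounded(z, lower=None, upper=None):
    """Inverse map z -> theta, returned together with log|d theta / d z|."""
    if lower is None and upper is None:
        return z, 0.0
    if upper is None:
        return lower + math.exp(z), z              # d theta / dz = exp(z)
    if lower is None:
        return upper - math.exp(z), z              # |d theta / dz| = exp(z)
    u = 1.0 / (1.0 + math.exp(-z))                 # logistic function of z
    theta = lower + (upper - lower) * u
    return theta, math.log(upper - lower) + math.log(u) + math.log1p(-u)


for theta, lo, hi in [(1.3, None, None), (2.5, 1.0, None), (-0.5, None, 2.0), (0.7, 0.0, 1.0)]:
    z = to_unbounded(theta, lo, hi)
    back, log_jac = from_unbounded(z, lo, hi)
    print(theta, z, back, log_jac)
```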

Assuming independence for the parameters that are subject to bounds, the auxiliary distribution to generate importance samples is

$$\begin{bmatrix} \mathbf{p} \\ \mathbf{q} \\ \mathbf{r} \\ \mathbf{s} \end{bmatrix} \sim \mathbf{N}\left[ \begin{bmatrix} \mu_p \\ \mu_q \\ \mu_r \\ \mu_s \end{bmatrix}, \begin{bmatrix} \Sigma_p & 0 & 0 & 0 \\ 0 & \Sigma_q & 0 & 0 \\ 0 & 0 & \Sigma_r & 0 \\ 0 & 0 & 0 & \Sigma_s \end{bmatrix} \right]$$

where $\mathbf{p}$, $\mathbf{q}$, $\mathbf{r}$, and $\mathbf{s}$ are vectors containing the transformations of the unbounded, bounded-below, bounded-above, and bounded-above-and-below parameters, respectively. Under this independence structure, $\Sigma_p$ can be a nondiagonal matrix, whereas $\Sigma_q$, $\Sigma_r$, and $\Sigma_s$ are constrained to be diagonal.
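The next sketch fits, evaluates, and samples this block-diagonal Gaussian. It assumes that the posterior draws have already been split into the unbounded vector $\mathbf{p}$ and the transformed bounded parameters ($\mathbf{q}$, $\mathbf{r}$, $\mathbf{s}$ stacked, called `z` here). The unbounded block gets a full maximum likelihood covariance matrix and the bounded block gets only variances, matching the structure above. It is pure Python with a hand-written Cholesky factorization so that it has no dependencies. The names are hypothetical, and a numerical library would normally be used instead.

```python
import math
import random


def fit_auxiliary(p_draws, z_draws):
    """Gaussian MLE: full covariance for p, diagonal (variances only) for z."""
    n, k, kz = len(p_draws), len(p_draws[0]), len(z_draws[0])
    mu_p = [sum(d[i] for d in p_draws) / n for i in range(k)]
    sigma_p = [[sum((d[i] - mu_p[i]) * (d[j] - mu_p[j]) for d in p_draws) / n
                for j in range(k)] for i in range(k)]
    mu_z = [sum(d[i] for d in z_draws) / n for i in range(kz)]
    var_z = [sum((d[i] - mu_z[i]) ** 2 for d in z_draws) / n for i in range(kz)]
    return mu_p, sigma_p, mu_z, var_z


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][t] * low[j][t] for t in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low


def log_density(p, z, fit):
    """log f(p, z): multivariate normal block for p plus independent normals for z."""
    mu_p, sigma_p, mu_z, var_z = fit
    low = cholesky(sigma_p)
    w = []  # forward substitution: solve L w = p - mu_p, so w.w is the quadratic form
    for i in range(len(p)):
        w.append((p[i] - mu_p[i] - sum(low[i][j] * w[j] for j in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(p)))
    lp = -0.5 * (len(p) * math.log(2 * math.pi) + log_det + sum(x * x for x in w))
    lz = sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
             for x, m, v in zip(z, mu_z, var_z))
    return lp + lz


def sample(fit, rng):
    """One draw (p, z) from the fitted auxiliary distribution."""
    mu_p, sigma_p, mu_z, var_z = fit
    low = cholesky(sigma_p)
    e = [rng.gauss(0.0, 1.0) for _ in mu_p]
    p = [mu_p[i] + sum(low[i][j] * e[j] for j in range(i + 1)) for i in range(len(mu_p))]
    z = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu_z, var_z)]
    return p, z


fit = fit_auxiliary([[0.0, 0.0], [2.0, 1.0], [1.0, 2.0]], [[0.5], [1.5], [1.0]])
p, z = sample(fit, random.Random(0))
print(p, z, log_density(p, z, fit))
```

Keeping $\Sigma_q$, $\Sigma_r$, and $\Sigma_s$ diagonal means the bounded block contributes a simple sum of univariate normal log densities. Only the unbounded block needs a matrix factorization.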

Standard Distributions

Table 5 through Table 10 show all the distribution density functions that PROC COUNTREG recognizes. You specify these distribution densities in the PRIOR statement.

Table 5: Beta Distribution

PRIOR statement	BETA(SHAPE1=a, SHAPE2=b, MIN=m, MAX=M)
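	Note: Commonly $m = 0$ and $M = 1$.
Density	$\frac{(\theta - m)^{a-1} (M - \theta)^{b-1}}{B(a, b)\, (M - m)^{a+b-1}}$, where $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$
Parameter restriction	$a > 0$, $b > 0$, $-\infty < m < M < \infty$
Range	$[m, M]$
Mean	$\frac{a}{a+b}(M - m) + m$
Variance	$\frac{ab}{(a+b)^2 (a+b+1)}(M - m)^2$
Mode	$\frac{a-1}{a+b-2} M + \frac{b-1}{a+b-2} m$ if $a > 1$, $b > 1$; $m$ and $M$ if $a < 1$, $b < 1$; $m$ if $a < 1$, $b \ge 1$ or $a = 1$, $b > 1$; $M$ if $a \ge 1$, $b < 1$ or $a > 1$, $b = 1$; not unique if $a = b = 1$
Defaults	SHAPE1=SHAPE2=1, $m = 0$, $M = 1$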
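As a cross-check of the beta distribution's properties, here is a small Python sketch of the four-parameter beta log density, mean, variance, and mode, including the piecewise mode cases. The function names are hypothetical and unrelated to the PRIOR statement syntax. The usage lines confirm numerically that the density integrates to 1 and that its first moment matches the mean formula.

```python
import math


def beta_logpdf(theta, a, b, m=0.0, M=1.0):
    """Log density of the four-parameter beta distribution, for m < theta < M."""
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return ((a - 1) * math.log(theta - m) + (b - 1) * math.log(M - theta)
            - log_beta_fn - (a + b - 1) * math.log(M - m))


def beta_mean(a, b, m=0.0, M=1.0):
    return a / (a + b) * (M - m) + m


def beta_variance(a, b, m=0.0, M=1.0):
    return a * b / ((a + b) ** 2 * (a + b + 1)) * (M - m) ** 2


def beta_modes(a, b, m=0.0, M=1.0):
    """Tuple of modes; empty tuple when the mode is not unique (a = b = 1)."""
    if a > 1 and b > 1:
        return (((a - 1) * M + (b - 1) * m) / (a + b - 2),)
    if a < 1 and b < 1:
        return (m, M)            # density unbounded at both ends
    if a == 1 and b == 1:
        return ()                # uniform on [m, M]
    if a <= 1 and b >= 1:
        return (m,)              # decreasing, or unbounded at m
    return (M,)                  # remaining cases: a >= 1, b <= 1


# Midpoint-rule check of normalization and mean for a = 2, b = 3 on [-1, 3].
a, b, m, M = 2.0, 3.0, -1.0, 3.0
steps = 100_000
h = (M - m) / steps
grid = [m + (k + 0.5) * h for k in range(steps)]
dens = [math.exp(beta_logpdf(t, a, b, m, M)) for t in grid]
total = sum(dens) * h
first_moment = sum(t * d for t, d in zip(grid, dens)) * h
print(total, first_moment, beta_mean(a, b, m, M))
```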