COUNTREG Procedure

Zero-Inflated Count Regression Overview

The main motivation for zero-inflated count models is that real-life data frequently display overdispersion and excess zeros. Zero-inflated count models provide a way of modeling the excess zeros in addition to allowing for overdispersion. In particular, for each observation, there are two possible data generation processes. The result of a Bernoulli trial is used to determine which of the two processes is used. For observation $i$, Process 1 is chosen with probability $\varphi_i$ and Process 2 with probability $1 - \varphi_i$. Process 1 generates only zero counts. Process 2 generates counts from either a Poisson or a negative binomial model. In general,

$$
y_i \sim
\begin{cases}
0 & \text{with probability } \varphi_i \\
g(y_i) & \text{with probability } 1 - \varphi_i
\end{cases}
$$

Therefore, the probability of $\{Y_i = y_i\}$ can be described as

$$
\begin{aligned}
P(y_i = 0 \mid \mathbf{x}_i) &= \varphi_i + (1 - \varphi_i)\, g(0) \\
P(y_i \mid \mathbf{x}_i) &= (1 - \varphi_i)\, g(y_i), \quad y_i > 0
\end{aligned}
$$

where $g(y_i)$ follows either the Poisson or the negative binomial distribution. You can save the estimated probability $\varphi_i$ to the output data set by using the PROBZERO= option in the OUTPUT statement.
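The mixture probabilities above can be checked numerically. The following Python sketch is only an illustration of the formulas for the Poisson case and is not how PROC COUNTREG computes them. The function names `poisson_pmf` and `zip_pmf` and the values `phi = 0.3` and `lam = 2.0` are assumptions chosen for the example.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability g(y) for a count process with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zip_pmf(y, phi, lam):
    """Zero-inflated Poisson probability P(Y = y).

    phi is the probability of the always-zero process (Process 1);
    lam is the Poisson mean of the count process (Process 2).
    """
    if y < 0:
        return 0.0
    base = (1.0 - phi) * poisson_pmf(y, lam)
    # A zero can come from either process; a positive count only from Process 2.
    return phi + base if y == 0 else base


if __name__ == "__main__":
    phi, lam = 0.3, 2.0  # illustrative values, not taken from the documentation
    print("P(Y=0), zero-inflated:", zip_pmf(0, phi, lam))
    print("P(Y=0), plain Poisson:", poisson_pmf(0, lam))
    print("sum over y = 0..59   :", sum(zip_pmf(y, phi, lam) for y in range(60)))
```

With any `phi` greater than zero, the zero-inflated probability of a zero count exceeds the plain Poisson value, and the probabilities still sum to 1 (up to the truncated tail).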

When the probability $\varphi_i$ depends on the characteristics of observation $i$, $\varphi_i$ is written as a function of $\mathbf{z}_i'\boldsymbol{\gamma}$, where $\mathbf{z}_i'$ is the $1 \times (q+1)$ vector of zero-inflation covariates and $\boldsymbol{\gamma}$ is the $(q+1) \times 1$ vector of zero-inflation coefficients to be estimated. (The zero-inflation intercept is $\gamma_0$; the coefficients for the $q$ zero-inflation covariates are $\gamma_1, \ldots, \gamma_q$.) The function $F$ that relates the product $\mathbf{z}_i'\boldsymbol{\gamma}$ (which is a scalar) to the probability $\varphi_i$ is called the zero-inflation link function,

$$
\varphi_i = F_i = F(\mathbf{z}_i'\boldsymbol{\gamma})
$$

In the COUNTREG procedure, the zero-inflation covariates are specified in the ZEROMODEL statement. The zero-inflation link function $F$ can be specified as either the logistic function,

$$
F(\mathbf{z}_i'\boldsymbol{\gamma}) = \Lambda(\mathbf{z}_i'\boldsymbol{\gamma}) = \frac{\exp(\mathbf{z}_i'\boldsymbol{\gamma})}{1 + \exp(\mathbf{z}_i'\boldsymbol{\gamma})}
$$

or the standard normal cumulative distribution function (which gives the probit link),

$$
F(\mathbf{z}_i'\boldsymbol{\gamma}) = \Phi(\mathbf{z}_i'\boldsymbol{\gamma}) = \int_{-\infty}^{\mathbf{z}_i'\boldsymbol{\gamma}} \frac{1}{\sqrt{2\pi}} \exp(-u^2/2)\, du
$$

The zero-inflation link function is specified by the LINK= option in the ZEROMODEL statement. The default zero-inflation (ZI) link function is the logistic function.
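To make the two link functions concrete, the following Python sketch evaluates each one on a linear predictor and maps a covariate vector to a zero-inflation probability. It is only an illustration under stated assumptions and does not reproduce the COUNTREG implementation. The function names, the covariate vector `z`, and the coefficient vector `gamma` are all hypothetical.

```python
import math


def logistic_link(t):
    """Logistic function exp(t) / (1 + exp(t)), evaluated without overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def probit_link(t):
    """Standard normal CDF at t, computed from the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def zero_inflation_prob(z, gamma, link=logistic_link):
    """Return F(z'gamma); z must carry a leading 1 for the intercept."""
    if len(z) != len(gamma):
        raise ValueError("z and gamma must have the same length")
    t = sum(zj * gj for zj, gj in zip(z, gamma))  # scalar linear predictor
    return link(t)


if __name__ == "__main__":
    z = [1.0, 0.5, -1.2]      # hypothetical covariates (leading 1 = intercept)
    gamma = [-0.4, 0.8, 0.3]  # hypothetical coefficients
    print("logistic link:", zero_inflation_prob(z, gamma, logistic_link))
    print("probit link  :", zero_inflation_prob(z, gamma, probit_link))
```

Both links map any real-valued linear predictor into the interval from 0 to 1 and return 0.5 when the predictor is 0. They differ mainly in the tails, so coefficient estimates under the two links are not on the same scale.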

Last updated: June 19, 2025