zeroinfl | R Documentation |
Fit zero-inflated regression models for count data via maximum likelihood.
zeroinfl(formula, data, subset, na.action, weights, offset,
dist = c("poisson", "negbin", "geometric", "binomial"),
link = c("logit", "probit", "cloglog", "cauchit", "log"),
size = NULL, control = zeroinfl.control(...),
model = TRUE, y = TRUE, x = FALSE, ...)
formula |
symbolic description of the model, see details. |
data, subset, na.action |
arguments controlling formula processing
via |
weights |
optional numeric vector of weights. |
offset |
optional numeric vector with an a priori known component to be included in the linear predictor of the count model. See below for more information on offsets. |
dist |
character specification of count model family (a log link is always used). |
link |
character specification of link function in the binary zero-inflation model (a binomial family is always used). |
size |
size parameter in case the a binomial count model is used
( |
control |
a list of control arguments specified via
|
model, y, x |
logicals. If |
... |
arguments passed to |
Zero-inflated count models are two-component mixture models combining a point mass at zero with a proper count distribution. Thus, there are two sources of zeros: zeros may come from both the point mass and from the count component. Usually the count model is a Poisson or negative binomial regression (with log link). The geometric distribution is a special case of the negative binomial with size parameter equal to 1. For modeling the unobserved state (zero vs. count), a binary model is used that captures the probability of zero inflation. in the simplest case only with an intercept but potentially containing regressors. For this zero-inflation model, a binomial model with different links can be used, typically logit or probit.
The formula
can be used to specify both components of the model:
If a formula
of type y ~ x1 + x2
is supplied, then the same
regressors are employed in both components. This is equivalent to
y ~ x1 + x2 | x1 + x2
. Of course, a different set of regressors
could be specified for the count and zero-inflation component, e.g.,
y ~ x1 + x2 | z1 + z2 + z3
giving the count data model y ~ x1 + x2
conditional on (|
) the zero-inflation model y ~ z1 + z2 + z3
.
A simple inflation model where all zero counts have the same
probability of belonging to the zero component can by specified by the formula
y ~ x1 + x2 | 1
.
Offsets can be specified in both components of the model pertaining to count and
zero-inflation model: y ~ x1 + offset(x2) | z1 + z2 + offset(z3)
, where
x2
is used as an offset (i.e., with coefficient fixed to 1) in the
count component and z3
analogously in the zero-inflation component. By the rule
stated above y ~ x1 + offset(x2)
is expanded to
y ~ x1 + offset(x2) | x1 + offset(x2)
. Instead of using the
offset()
wrapper within the formula
, the offset
argument
can also be employed which sets an offset only for the count model. Thus,
formula = y ~ x1
and offset = x2
is equivalent to
formula = y ~ x1 + offset(x2) | x1
.
All parameters are estimated by maximum likelihood using optim
,
with control options set in zeroinfl.control
.
Starting values can be supplied, estimated by the EM (expectation maximization)
algorithm, or by glm.fit
(the default). Standard errors
are derived numerically using the Hessian matrix returned by optim
.
See zeroinfl.control
for details.
The returned fitted model object is of class "zeroinfl"
and is similar
to fitted "glm"
objects. For elements such as "coefficients"
or
"terms"
a list is returned with elements for the zero and count component,
respectively. For details see below.
A set of standard extractor functions for fitted model objects is available for
objects of class "zeroinfl"
, including methods to the generic functions
print
, summary
, coef
,
vcov
, logLik
, residuals
,
predict
, fitted
, terms
,
model.matrix
. See predict.zeroinfl
for more details
on all methods.
An object of class "zeroinfl"
, i.e., a list with components including
coefficients |
a list with elements |
residuals |
a vector of raw residuals (observed - fitted), |
fitted.values |
a vector of fitted means, |
optim |
a list with the output from the |
control |
the control arguments passed to the |
start |
the starting values for the parameters passed to the |
weights |
the case weights used, |
offset |
a list with elements |
n |
number of observations (with weights > 0), |
df.null |
residual degrees of freedom for the null model (= |
df.residual |
residual degrees of freedom for fitted model, |
terms |
a list with elements |
theta |
estimate of the additional |
SE.logtheta |
standard error for |
loglik |
log-likelihood of the fitted model, |
vcov |
covariance matrix of all coefficients in the model (derived from the
Hessian of the |
dist |
character string describing the count distribution used, |
link |
character string describing the link of the zero-inflation model, |
linkinv |
the inverse link function corresponding to |
converged |
logical indicating successful convergence of |
call |
the original function call, |
formula |
the original formula, |
levels |
levels of the categorical regressors, |
contrasts |
a list with elements |
model |
the full model frame (if |
y |
the response count vector (if |
x |
a list with elements |
Cameron AC, Trivedi PK (2013). Regression Analysis of Count Data, 2nd ed. New York: Cambridge University Press.
Cameron AC, Trivedi PK (2005). Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.
Lambert D (1992). “Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing”. Technometrics. 34(1), 1–14.
Zeileis A, Kleiber C, Jackman S (2008). “Regression Models for Count Data in R.” Journal of Statistical Software, 27(8), 1–25. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v027.i08")}.
zeroinfl.control
, glm
,
glm.fit
, glm.nb
,
hurdle
## data
data("CrabSatellites", package = "countreg")
cs <- CrabSatellites[, c("satellites", "width", "color")]
cs$color <- as.numeric(cs$color)
## without inflation
## ("satellites ~ ." is "satellites ~ width + color")
fm_pois <- glm(satellites ~ ., data = cs, family = poisson)
fm_qpois <- glm(satellites ~ ., data = cs, family = quasipoisson)
fm_nb <- glm.nb(satellites ~ ., data = cs)
## with simple inflation (no regressors for zero component)
fm_zip <- zeroinfl(satellites ~ . | 1, data = cs)
fm_zinb <- zeroinfl(satellites ~ . | 1, data = cs, dist = "negbin")
## inflation with regressors
## ("satellites ~ . | ." is "satellites ~ width + color | width + color")
fm_zip2 <- zeroinfl(satellites ~ . | ., data = cs)
fm_zinb2 <- zeroinfl(satellites ~ . | ., data = cs, dist = "negbin")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.