# zeroinfl: Zero-inflated Count Data Regression In pscl: Political Science Computational Laboratory

## Description

Fit zero-inflated regression models for count data via maximum likelihood.

## Usage

```
zeroinfl(formula, data, subset, na.action, weights, offset,
  dist = c("poisson", "negbin", "geometric"),
  link = c("logit", "probit", "cloglog", "cauchit", "log"),
  control = zeroinfl.control(...),
  model = TRUE, y = TRUE, x = FALSE, ...)
```

## Arguments

| Argument | Description |
| --- | --- |
| `formula` | symbolic description of the model, see details. |
| `data, subset, na.action` | arguments controlling formula processing via `model.frame`. |
| `weights` | optional numeric vector of weights. |
| `offset` | optional numeric vector with an a priori known component to be included in the linear predictor of the count model. See below for more information on offsets. |
| `dist` | character specification of count model family (a log link is always used). |
| `link` | character specification of link function in the binary zero-inflation model (a binomial family is always used). |
| `control` | a list of control arguments specified via `zeroinfl.control`. |
| `model, y, x` | logicals. If `TRUE` the corresponding components of the fit (model frame, response, model matrix) are returned. |
| `...` | arguments passed to `zeroinfl.control` in the default setup. |

## Details

Zero-inflated count models are two-component mixture models combining a point mass at zero with a proper count distribution. Thus, there are two sources of zeros: a zero may come from either the point mass or the count component. Usually the count model is a Poisson or negative binomial regression (with log link). The geometric distribution is a special case of the negative binomial with size parameter equal to 1. For modeling the unobserved state (zero vs. count), a binary model is used that captures the probability of zero inflation, in the simplest case only with an intercept but potentially containing regressors. For this zero-inflation model, a binomial model with different links can be used, typically logit or probit.
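
The mixture can be written out directly. The following is a minimal base-R sketch, assuming a Poisson count component; `dzip`, `lambda`, and `pi0` are hypothetical names introduced for illustration and are not part of pscl.

```r
# Zero-inflated Poisson probability mass function
# (illustrative helper, NOT a pscl function).
#   lambda : mean of the Poisson count component
#   pi0    : probability of the point mass at zero
dzip <- function(y, lambda, pi0) {
  # the point mass contributes only at y == 0,
  # the count component contributes at every y
  pi0 * (y == 0) + (1 - pi0) * dpois(y, lambda)
}

lambda <- 2
pi0 <- 0.3

# Two sources of zeros: the point mass and the Poisson component
p_zero_pointmass <- pi0
p_zero_count <- (1 - pi0) * dpois(0, lambda)
p_zero_total <- dzip(0, lambda, pi0)

# Mean of the mixture, computed numerically; it equals (1 - pi0) * lambda
zip_mean <- sum(0:200 * dzip(0:200, lambda, pi0))
```

In a regression setting, `lambda` and `pi0` would each depend on regressors through the log link and the chosen binary link, respectively.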

The `formula` can be used to specify both components of the model: If a `formula` of type `y ~ x1 + x2` is supplied, then the same regressors are employed in both components. This is equivalent to `y ~ x1 + x2 | x1 + x2`. Of course, a different set of regressors could be specified for the count and zero-inflation component, e.g., `y ~ x1 + x2 | z1 + z2 + z3` giving the count data model `y ~ x1 + x2` conditional on (`|`) the zero-inflation model `y ~ z1 + z2 + z3`. A simple inflation model where all zero counts have the same probability of belonging to the zero component can be specified by the formula `y ~ x1 + x2 | 1`.
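
The formula variants can be compared on the `bioChemists` data shipped with pscl. This is a sketch only; the particular choice of regressors is an assumption made for illustration, not a recommended specification.

```r
library(pscl)
data("bioChemists", package = "pscl")

# Single right-hand side: the same regressors enter both components
fm_same <- zeroinfl(art ~ fem + ment, data = bioChemists)

# Explicit two-part form of the same model
fm_explicit <- zeroinfl(art ~ fem + ment | fem + ment, data = bioChemists)

# Different regressors per component: count model | zero-inflation model
fm_diff <- zeroinfl(art ~ fem + kid5 + ment | ment, data = bioChemists)

# Intercept-only zero-inflation model
fm_const <- zeroinfl(art ~ fem + ment | 1, data = bioChemists)

# Zero-inflation coefficients (on the link scale) of each specification
coef(fm_diff, model = "zero")
coef(fm_const, model = "zero")
```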

Offsets can be specified in both components of the model pertaining to count and zero-inflation model: `y ~ x1 + offset(x2) | z1 + z2 + offset(z3)`, where `x2` is used as an offset (i.e., with coefficient fixed to 1) in the count component and `z3` analogously in the zero-inflation component. By the rule stated above `y ~ x1 + offset(x2)` is expanded to `y ~ x1 + offset(x2) | x1 + offset(x2)`. Instead of using the `offset()` wrapper within the `formula`, the `offset` argument can also be employed which sets an offset only for the count model. Thus, `formula = y ~ x1` and `offset = x2` is equivalent to `formula = y ~ x1 + offset(x2) | x1`.
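
The equivalence of the two ways of setting a count-model offset can be checked on simulated data. In this sketch the data frame `d` and its columns `x1`, `exposure`, and `y` are made up for illustration, as are the parameter values used to generate them.

```r
library(pscl)

# Simulated data (purely illustrative): counts observed over varying exposure
set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), exposure = runif(n, 0.5, 2))
structural_zero <- rbinom(n, 1, 0.3)                     # point mass at zero
counts <- rpois(n, d$exposure * exp(0.5 + 0.4 * d$x1))   # count component
d$y <- ifelse(structural_zero == 1, 0L, counts)

# Offset given inside the formula, in the count component only
fm_formula <- zeroinfl(y ~ x1 + offset(log(exposure)) | x1, data = d)

# Offset given via the argument, which applies to the count model only
fm_argument <- zeroinfl(y ~ x1, offset = log(exposure), data = d)

# Both fits should give the same coefficients
cbind(formula = coef(fm_formula), argument = coef(fm_argument))
```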

All parameters are estimated by maximum likelihood using `optim`, with control options set in `zeroinfl.control`. Starting values can be supplied, estimated by the EM (expectation maximization) algorithm, or by `glm.fit` (the default). Standard errors are derived numerically using the Hessian matrix returned by `optim`. See `zeroinfl.control` for details.
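
The three ways of obtaining starting values can be sketched as follows. The example assumes the `EM`, `maxit`, and `start` arguments of `zeroinfl.control` and reuses the coefficients of a previous fit as user-supplied starting values; the specific `maxit` value is arbitrary.

```r
library(pscl)
data("bioChemists", package = "pscl")

# Default: starting values from glm.fit
fm_default <- zeroinfl(art ~ . | 1, data = bioChemists)

# Starting values from the EM algorithm, with a larger iteration limit
fm_em <- zeroinfl(art ~ . | 1, data = bioChemists,
                  control = zeroinfl.control(EM = TRUE, maxit = 20000))

# User-supplied starting values: a list with elements "count" and "zero"
fm_start <- zeroinfl(art ~ . | 1, data = bioChemists,
                     control = zeroinfl.control(start = fm_default$coefficients))

# All three fits should reach essentially the same log-likelihood
c(default = fm_default$loglik, em = fm_em$loglik, start = fm_start$loglik)
```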

The returned fitted model object is of class `"zeroinfl"` and is similar to fitted `"glm"` objects. For elements such as `"coefficients"` or `"terms"` a list is returned with elements for the zero and count component, respectively. For details see below.

A set of standard extractor functions for fitted model objects is available for objects of class `"zeroinfl"`, including methods for the generic functions `print`, `summary`, `coef`, `vcov`, `logLik`, `residuals`, `predict`, `fitted`, `terms`, and `model.matrix`. See `predict.zeroinfl` for more details on all methods.
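
A few of these extractors in use, as a sketch on the `bioChemists` data. The `model` argument of `coef` and the `type` values passed to `predict` are those documented in `predict.zeroinfl`; the object names are arbitrary.

```r
library(pscl)
data("bioChemists", package = "pscl")
fm_zip <- zeroinfl(art ~ . | 1, data = bioChemists)

summary(fm_zip)                 # coefficient tables for both components
coef(fm_zip, model = "count")   # count-component coefficients only
logLik(fm_zip)

mu    <- predict(fm_zip, type = "count")     # mean of the count component
phi   <- predict(fm_zip, type = "zero")      # probability of the point mass
yhat  <- predict(fm_zip, type = "response")  # overall mean, (1 - phi) * mu
probs <- predict(fm_zip, type = "prob")      # predicted probabilities per count
```

The `"response"` prediction combines the two components, which is why it is smaller than the count-component mean whenever the zero-inflation probability is positive.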

## Value

An object of class `"zeroinfl"`, i.e., a list with components including

| Component | Description |
| --- | --- |
| `coefficients` | a list with elements `"count"` and `"zero"` containing the coefficients from the respective models |
| `residuals` | a vector of raw residuals (observed - fitted) |
| `fitted.values` | a vector of fitted means |
| `optim` | a list with the output from the `optim` call for minimizing the negative log-likelihood |
| `control` | the control arguments passed to the `optim` call |
| `start` | the starting values for the parameters passed to the `optim` call |
| `weights` | the case weights used |
| `offset` | a list with elements `"count"` and `"zero"` containing the offset vectors (if any) from the respective models |
| `n` | number of observations (with weights > 0) |
| `df.null` | residual degrees of freedom for the null model (= `n - 2`) |
| `df.residual` | residual degrees of freedom for fitted model |
| `terms` | a list with elements `"count"`, `"zero"` and `"full"` containing the terms objects for the respective models |
| `theta` | estimate of the additional theta parameter of the negative binomial model (if a negative binomial regression is used) |
| `SE.logtheta` | standard error for log(theta) |
| `loglik` | log-likelihood of the fitted model |
| `vcov` | covariance matrix of all coefficients in the model (derived from the Hessian of the `optim` output) |
| `dist` | character string describing the count distribution used |
| `link` | character string describing the link of the zero-inflation model |
| `linkinv` | the inverse link function corresponding to `link` |
| `converged` | logical indicating successful convergence of `optim` |
| `call` | the original function call |
| `formula` | the original formula |
| `levels` | levels of the categorical regressors |
| `contrasts` | a list with elements `"count"` and `"zero"` containing the contrasts corresponding to `levels` from the respective models |
| `model` | the full model frame (if `model = TRUE`) |
| `y` | the response count vector (if `y = TRUE`) |
| `x` | a list with elements `"count"` and `"zero"` containing the model matrices from the respective models (if `x = TRUE`) |
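
Components of the returned list can also be accessed directly, although the extractor functions are usually preferable. A brief sketch on a zero-inflated Poisson fit; the object names are arbitrary.

```r
library(pscl)
data("bioChemists", package = "pscl")
fm_zip <- zeroinfl(art ~ . | 1, data = bioChemists)

fm_zip$coefficients$count   # count-component coefficients
fm_zip$coefficients$zero    # zero-inflation coefficients
fm_zip$loglik               # log-likelihood at the optimum
fm_zip$converged            # did optim report convergence?
fm_zip$dist                 # "poisson"

# Standard errors from the stored covariance matrix
se <- sqrt(diag(fm_zip$vcov))

# Degrees of freedom: n minus the number of estimated coefficients
c(n = fm_zip$n, df.null = fm_zip$df.null, df.residual = fm_zip$df.residual)
```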

## Author(s)

Achim Zeileis <Achim.Zeileis@R-project.org>

## References

Cameron, A. Colin and Pravin K. Trivedi. 1998. Regression Analysis of Count Data. New York: Cambridge University Press.

Cameron, A. Colin and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Lambert, Diane. 1992. “Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing.” Technometrics. 34(1):1-14.

Zeileis, Achim, Christian Kleiber and Simon Jackman. 2008. “Regression Models for Count Data in R.” Journal of Statistical Software, 27(8). URL http://www.jstatsoft.org/v27/i08/.

## See Also

`zeroinfl.control`, `glm`, `glm.fit`, `glm.nb`, `hurdle`

## Examples

```
library("pscl")

## data
data("bioChemists", package = "pscl")

## without inflation
## ("art ~ ." is "art ~ fem + mar + kid5 + phd + ment")
fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
fm_qpois <- glm(art ~ ., data = bioChemists, family = quasipoisson)
fm_nb <- MASS::glm.nb(art ~ ., data = bioChemists)

## with simple inflation (no regressors for zero component)
fm_zip <- zeroinfl(art ~ . | 1, data = bioChemists)
fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

## inflation with regressors
## ("art ~ . | ." is "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment")
fm_zip2 <- zeroinfl(art ~ . | ., data = bioChemists)
fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")
```

### Example output

```Classes and Methods for R developed in the
Political Science Computational Laboratory
Department of Political Science
Stanford University
Simon Jackman
hurdle and zeroinfl functions by Achim Zeileis
```

pscl documentation built on March 26, 2020, 7:36 p.m.