hurdle  R Documentation 
Fit hurdle regression models for count data via maximum likelihood.
hurdle(formula, data, subset, na.action, weights, offset,
dist = c("poisson", "negbin", "geometric"),
zero.dist = c("binomial", "poisson", "negbin", "geometric"),
link = c("logit", "probit", "cloglog", "cauchit", "log"),
control = hurdle.control(...),
model = TRUE, y = TRUE, x = FALSE, ...)
formula 
symbolic description of the model, see details. 
data, subset, na.action 
arguments controlling formula processing via model.frame. 
weights 
optional numeric vector of weights. 
offset 
optional numeric vector with an a priori known component to be included in the linear predictor of the count model. See below for more information on offsets. 
dist 
character specification of count model family. 
zero.dist 
character specification of the zero hurdle model family. 
link 
character specification of link function in the binomial zero hurdle (only used if zero.dist = "binomial"). 
control 
a list of control arguments specified via hurdle.control. 
model, y, x 
logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned. 
... 
arguments passed to hurdle.control in the default setup. 
Hurdle count models are two-component models with a truncated count component for positive counts and a hurdle component that models the zero counts. Thus, unlike zero-inflation models, there are not two sources of zeros: the count model is only employed if the hurdle for modeling the occurrence of zeros is exceeded. The count model is typically a truncated Poisson or negative binomial regression (with log link). The geometric distribution is a special case of the negative binomial with size parameter equal to 1.
For modeling the hurdle, either a binomial model or a censored count distribution can be employed. The outcome of the hurdle component of the model is the occurrence of a non-zero (positive) count. Thus, for most models, positive coefficients in the hurdle component indicate that an increase in the regressor increases the probability of a non-zero count.
Binomial logit and censored geometric models as the hurdle part both lead to the same likelihood function and thus to the same coefficient estimates. A censored negative binomial model for the zero hurdle is only identified if there is at least one non-constant regressor with (true) coefficient different from zero (and if all coefficients are close to zero the model can be poorly conditioned).
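The equivalence of the binomial logit and censored geometric hurdles can be checked numerically in base R. The following is only a sketch: it assumes the geometric distribution is parameterized by its mean mu = exp(eta) under a log link, and the names eta, mu, p_geom, p_logit and dtpois are hypothetical helpers introduced for this illustration, not part of the package.

```r
## Hypothetical linear predictor values, chosen only for illustration
eta <- c(-2, -0.5, 0, 1, 3)
mu <- exp(eta)                    # geometric mean under a log link

## Censored geometric hurdle: P(y > 0) = 1 - P(y = 0)
p_geom <- 1 - dgeom(0, prob = 1 / (1 + mu))

## Binomial logit hurdle: P(y > 0) = exp(eta) / (1 + exp(eta))
p_logit <- plogis(eta)

## Both hurdles imply the same probability of crossing the hurdle
print(all.equal(p_geom, p_logit))

## Zero-truncated Poisson for the positive counts:
## f(y | y > 0) = f(y) / (1 - f(0)), for y = 1, 2, ...
dtpois <- function(y, lambda) dpois(y, lambda) / (1 - dpois(0, lambda))

## The truncated probabilities over the positive counts sum to one
print(sum(dtpois(1:200, lambda = 2.5)))
```

Because both hurdle specifications give the same probability of a positive count for every value of the linear predictor, they yield the same likelihood contributions.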
The formula can be used to specify both components of the model: If a formula of type y ~ x1 + x2 is supplied, then the same regressors are employed in both components. This is equivalent to y ~ x1 + x2 | x1 + x2. Of course, a different set of regressors could be specified for the zero hurdle component, e.g., y ~ x1 + x2 | z1 + z2 + z3 giving the count data model y ~ x1 + x2 conditional on (|) the zero hurdle model y ~ z1 + z2 + z3.
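How a two-part formula separates into a count part and a zero hurdle part can be sketched with base R language manipulation. This is an illustration under the assumption that the right-hand side is a single call to the | operator; it is not claimed to be the exact processing done inside hurdle, and the names f, rhs, count_formula and zero_formula are hypothetical.

```r
## Two-part formula: count regressors | zero hurdle regressors
f <- y ~ x1 + x2 | z1 + z2 + z3

## The right-hand side is a call to `|` because `|` binds tighter than `~`
rhs <- f[[3]]
has_two_parts <- identical(rhs[[1]], as.name("|"))
print(has_two_parts)

## Count part: keep the response, use the terms left of `|`
count_formula <- f
count_formula[[3]] <- rhs[[2]]
print(count_formula)              # y ~ x1 + x2

## Zero hurdle part: keep the response, use the terms right of `|`
zero_formula <- f
zero_formula[[3]] <- rhs[[3]]
print(zero_formula)               # y ~ z1 + z2 + z3
```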
Offsets can be specified in both parts of the model pertaining to count and zero hurdle model: y ~ x1 + offset(x2) | z1 + z2 + offset(z3), where x2 is used as an offset (i.e., with coefficient fixed to 1) in the count part and z3 analogously in the zero hurdle part. By the rule stated above y ~ x1 + offset(x2) is expanded to y ~ x1 + offset(x2) | x1 + offset(x2). Instead of using the offset() wrapper within the formula, the offset argument can also be employed which sets an offset only for the count model. Thus, formula = y ~ x1 and offset = x2 is equivalent to formula = y ~ x1 + offset(x2) | x1.
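The stated equivalence between the offset argument and an explicit offset() term in the count part can be checked on the bioChemists data. This is a sketch only: the offset variable lment is a hypothetical construction made up for the illustration (it has no substantive meaning here), and the code runs only if the pscl package is installed.

```r
if (requireNamespace("pscl", quietly = TRUE)) {
  data("bioChemists", package = "pscl")

  ## Hypothetical offset variable, created only for illustration
  bioChemists$lment <- log(bioChemists$ment + 1)

  ## Offset via the offset argument: applies to the count model only
  fm_off1 <- pscl::hurdle(art ~ fem + kid5, data = bioChemists,
                          offset = lment)

  ## Same model with offset() written explicitly in the count part
  fm_off2 <- pscl::hurdle(art ~ fem + kid5 + offset(lment) | fem + kid5,
                          data = bioChemists)

  ## Both specifications should give the same coefficient estimates
  print(all.equal(coef(fm_off1), coef(fm_off2)))
}
```

Writing art ~ fem + kid5 + offset(lment) without the | part would instead put the offset into both components, by the expansion rule described above.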
All parameters are estimated by maximum likelihood using optim, with control options set in hurdle.control. Starting values can be supplied, otherwise they are estimated by glm.fit (the default). By default, the two components of the model are estimated separately using two optim calls. Standard errors are derived numerically using the Hessian matrix returned by optim. See hurdle.control for details.
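The general technique of maximizing a likelihood with optim and deriving standard errors from the numerical Hessian can be sketched for a zero-truncated Poisson regression on simulated data. This is an independent base R illustration, not the package's internal code; the simulated data, the function negll and all variable names are assumptions made for the example.

```r
## Simulate positive counts from a Poisson regression (illustration only)
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.3 * x))
keep <- y > 0                     # the count part only sees positive counts
y <- y[keep]
X <- cbind(1, x[keep])            # model matrix: intercept and one regressor

## Negative log-likelihood of the zero-truncated Poisson with log link:
## log f(y | y > 0) = log f(y) - log(1 - exp(-mu))
negll <- function(beta) {
  mu <- exp(drop(X %*% beta))
  -sum(dpois(y, mu, log = TRUE) - log1p(-exp(-mu)))
}

## Maximize the likelihood and request the numerical Hessian
fit <- optim(c(log(mean(y)), 0), negll, method = "BFGS", hessian = TRUE)

## Standard errors from the inverse Hessian of the negative log-likelihood
se <- sqrt(diag(solve(fit$hessian)))
print(cbind(estimate = fit$par, std.error = se))
```

The Hessian of the negative log-likelihood at the optimum estimates the observed information, so its inverse serves as the covariance matrix of the estimates.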
The returned fitted model object is of class "hurdle" and is similar to fitted "glm" objects. For elements such as "coefficients" or "terms" a list is returned with elements for the zero and count components, respectively. For details see below.
A set of standard extractor functions for fitted model objects is available for objects of class "hurdle", including methods for the generic functions print, summary, coef, vcov, logLik, residuals, predict, fitted, terms, model.matrix. See predict.hurdle for more details on all methods.
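A brief sketch of how some of these extractor functions might be used on a fitted model follows. It assumes the pscl package is installed; the object name fm_ex is hypothetical.

```r
if (requireNamespace("pscl", quietly = TRUE)) {
  data("bioChemists", package = "pscl")
  fm_ex <- pscl::hurdle(art ~ ., data = bioChemists)

  ## Coefficients of each component separately
  print(coef(fm_ex, model = "count"))
  print(coef(fm_ex, model = "zero"))

  ## Standard errors of all coefficients from the covariance matrix
  print(sqrt(diag(vcov(fm_ex))))

  ## Log-likelihood of the full model
  print(logLik(fm_ex))

  ## Fitted means and predictions on the response scale
  print(head(fitted(fm_ex)))
  print(predict(fm_ex, newdata = head(bioChemists), type = "response"))
}
```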
An object of class "hurdle", i.e., a list with components including
coefficients 
a list with elements "count" and "zero" containing the coefficients from the respective models, 
residuals 
a vector of raw residuals (observed - fitted), 
fitted.values 
a vector of fitted means, 
optim 
a list (of lists) with the output(s) from the optim call(s) minimizing the negative log-likelihood(s), 
control 
the control arguments passed to the optim call, 
start 
the starting values for the parameters passed to the optim call(s), 
weights 
the case weights used, 
offset 
a list with elements "count" and "zero" containing the offset vectors (if any) from the respective models, 
n 
number of observations (with weights > 0), 
df.null 
residual degrees of freedom for the null model (= n - 2), 
df.residual 
residual degrees of freedom for fitted model, 
terms 
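a list with elements "count", "zero" and "full" containing the terms objects for the respective models, 
theta 
estimate of the additional theta parameter of the negative binomial model(s) (if a negative binomial component is used), 
SE.logtheta 
standard error(s) for log(theta), 
loglik 
log-likelihood of the fitted model, 
vcov 
covariance matrix of all coefficients in the model (derived from the Hessian of the optim output(s)), 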
dist 
a list with elements "count" and "zero" with character strings describing the respective distributions used, 
link 
character string describing the link if a binomial zero hurdle model is used, 
linkinv 
the inverse link function corresponding to link, 
converged 
logical indicating successful convergence of optim, 
call 
the original function call, 
formula 
the original formula, 
levels 
levels of the categorical regressors, 
contrasts 
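a list with elements "count" and "zero" containing the contrasts corresponding to levels from the respective models, 
model 
the full model frame (if model = TRUE), 
y 
the response count vector (if y = TRUE), 
x 
a list with elements "count" and "zero" containing the model matrices from the respective models (if x = TRUE). 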
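As a rough illustration of the list structure described above, the components of a fitted object can be inspected directly. The sketch assumes the pscl package is installed; the object name fm_obj is hypothetical.

```r
if (requireNamespace("pscl", quietly = TRUE)) {
  data("bioChemists", package = "pscl")
  fm_obj <- pscl::hurdle(art ~ ., data = bioChemists, dist = "negbin")

  print(names(fm_obj$coefficients))  # one element per model component
  print(fm_obj$dist)                 # distributions used in each component
  print(fm_obj$link)                 # link of the binomial zero hurdle
  print(fm_obj$theta)                # negative binomial theta estimate(s)
  print(fm_obj$converged)            # convergence status reported by optim
  print(c(n = fm_obj$n, df.residual = fm_obj$df.residual))
}
```

In most cases the extractor functions are preferable to direct access, since they do not depend on the internal layout of the object.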
Achim Zeileis <Achim.Zeileis@R-project.org>
Cameron, A. Colin and Pravin K. Trivedi. 1998. Regression Analysis of Count Data. New York: Cambridge University Press.
Cameron, A. Colin and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.
Mullahy, J. 1986. Specification and Testing of Some Modified Count Data Models. Journal of Econometrics. 33:341–365.
Zeileis, Achim, Christian Kleiber and Simon Jackman. 2008. “Regression Models for Count Data in R.” Journal of Statistical Software, 27(8). URL http://www.jstatsoft.org/v27/i08/.
hurdle.control, glm, glm.fit, glm.nb, zeroinfl
## data
library("pscl")
data("bioChemists", package = "pscl")
## logit-poisson
## "art ~ ." is the same as "art ~ . | .", i.e.
## "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment"
fm_hp1 <- hurdle(art ~ ., data = bioChemists)
summary(fm_hp1)
## geometric-poisson
fm_hp2 <- hurdle(art ~ ., data = bioChemists, zero.dist = "geometric")
summary(fm_hp2)
## logit and geometric model are equivalent
coef(fm_hp1, model = "zero") - coef(fm_hp2, model = "zero")
## logit-negbin
fm_hnb1 <- hurdle(art ~ ., data = bioChemists, dist = "negbin")
summary(fm_hnb1)
## negbin-negbin
## (poorly conditioned zero hurdle, note the standard errors)
fm_hnb2 <- hurdle(art ~ ., data = bioChemists, dist = "negbin", zero.dist = "negbin")
summary(fm_hnb2)