View source: R/family.zeroinf.R
zipoisson | R Documentation |
Fits a zero-inflated or zero-deflated Poisson distribution by full maximum likelihood estimation.
zipoisson(lpstr0 = "logitlink", llambda = "loglink", type.fitted =
c("mean", "lambda", "pobs0", "pstr0", "onempstr0"),
ipstr0 = NULL, ilambda = NULL, gpstr0 = NULL, imethod = 1,
ishrinkage = 0.95, probs.y = 0.35, parallel = FALSE, zero = NULL)
zipoissonff(llambda = "loglink", lonempstr0 = "logitlink",
type.fitted = c("mean", "lambda", "pobs0", "pstr0", "onempstr0"),
ilambda = NULL, ionempstr0 = NULL, gonempstr0 = NULL,
imethod = 1, ishrinkage = 0.95, probs.y = 0.35, zero = "onempstr0")
lpstr0 , llambda |
Link function for the parameter |
ipstr0 , ilambda |
Optional initial values for |
lonempstr0 , ionempstr0 |
Corresponding arguments for the other parameterization. See details below. |
type.fitted |
Character. The type of fitted value to be returned.
The first choice (the expected value) is the default.
The estimated probability of an observed 0 is an alternative,
else
the estimated probability of a structural 0,
or one minus the estimated probability of a structural 0.
See |
imethod |
An integer with value |
ishrinkage |
How much shrinkage is used when initializing
|
zero |
Specifies which linear/additive predictors are to be
modelled as
intercept-only. If given, the value can be
either 1 or 2, and the
default is none of them. Setting |
gpstr0 , gonempstr0 , probs.y |
Details at |
parallel |
Details at |
These models are a mixture of a Poisson distribution
and the value 0;
it has value 0 with probability \phi
else is
Poisson(\lambda
) distributed.
Thus there are two sources for zero values, and \phi
is the probability of a structural zero.
The model for zipoisson()
can be written
P(Y = 0) = \phi + (1-\phi) \exp(-\lambda),
and for y=1,2,\ldots
,
P(Y = y) = (1-\phi) \exp(-\lambda) \lambda^y / y!.
Here, the parameter \phi
satisfies
0 < \phi < 1
.
The mean of Y
is (1-\phi) \lambda
and these
are returned as the fitted values,
by default.
The variance of Y
is (1-\phi) \lambda (1 + \phi \lambda)
.
By default, the two linear/additive predictors
of zipoisson()
are
(logit(\phi), \log(\lambda))^T
.
The VGAM family function zipoissonff()
has a few
changes compared to zipoisson()
.
These are:
(i) the order of the linear/additive predictors
is switched so the
Poisson mean comes first;
(ii) onempstr0
is now 1 minus the probability
of a structural 0,
i.e., the probability of the parent (Poisson) component,
i.e., onempstr0
is 1-pstr0
;
(iii) argument zero
has a new default so that
the onempstr0
is intercept-only by default.
Now zipoissonff()
is generally recommended
over zipoisson()
(and definitely recommended over yip88
).
Both functions implement Fisher scoring and can handle
multiple responses.
Both family functions
can fit the zero-modified Poisson (ZMP), which
is a combination
of the ZIP and zero-deflated Poisson (ZDP);
see Zipois
for some details and the
example below.
The key is to set the link function to be
identitylink
.
However, problems might occur when iterations get close to
or go past the boundary of the parameter space,
especially when there are covariates.
The PMF of the ZMP is best written not as above
but in terms of onempstr0
which may be greater
than unity; when using pstr0
the above PMF
is negative for non-zero values.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions
such as vglm
,
rrvglm
and vgam
.
Numerical problems can occur, e.g., when the probability of
zero is actually less than, not more than, the nominal
probability of zero.
For example, in the Angers and Biswas (2003) data below,
replacing 182 by 1 results in nonconvergence.
Half-stepping is not uncommon.
If failure to converge occurs, try using combinations of
imethod
,
ishrinkage
,
ipstr0
, and/or
zipoisson(zero = 1)
if there are explanatory variables.
The default for zipoissonff()
is to model the
structural zero probability as an intercept-only.
This family function can be used to estimate
the 0-deflated model,
hence pstr0
is not to be interpreted as a probability.
One should set, e.g., lpstr0 = "identitylink"
.
Likewise, the functions in Zipois
can handle the zero-deflated Poisson distribution too.
Although the iterations
might fall outside the parameter space,
the validparams
slot
should keep them inside.
A (somewhat) similar alternative for
zero-deflation is to try the zero-altered Poisson model
(see zapoisson
).
The use of this VGAM family function
with rrvglm
can result in a so-called COZIGAM or COZIGLM.
That is, a reduced-rank zero-inflated Poisson model (RR-ZIP)
is a constrained zero-inflated generalized linear model.
See what used to be COZIGAM on CRAN.
A RR-ZINB model can also be fitted easily;
see zinegbinomial
.
Jargon-wise, a COZIGLM might be better described as a
COZIVGLM-ZIP.
T. W. Yee
Thas, O. and Rayner, J. C. W. (2005). Smooth tests for the zero-inflated Poisson distribution. Biometrics, 61, 808–815.
Data: Angers, J-F. and Biswas, A. (2003). A Bayesian analysis of zero-inflated generalized Poisson model. Computational Statistics & Data Analysis, 42, 37–46.
Cameron, A. C. and Trivedi, P. K. (1998). Regression Analysis of Count Data. Cambridge University Press: Cambridge.
M'Kendrick, A. G. (1925). Applications of mathematics to medical problems. Proc. Edinb. Math. Soc., 44, 98–130.
Yee, T. W. (2014). Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889–902.
gaitdpoisson
,
zapoisson
,
Zipois
,
yip88
,
spikeplot
,
lpossums
,
rrvglm
,
negbinomial
,
zipebcom
,
rpois
,
simulate.vlm
,
hdeff.vglm
.
# Example 1: simulated ZIP data
zdata <- data.frame(x2 = runif(nn <- 1000))
zdata <- transform(zdata,
pstr01 = logitlink(-0.5 + 1*x2, inverse = TRUE),
pstr02 = logitlink( 0.5 - 1*x2, inverse = TRUE),
Ps01 = logitlink(-0.5 , inverse = TRUE),
Ps02 = logitlink( 0.5 , inverse = TRUE),
lambda1 = loglink(-0.5 + 2*x2, inverse = TRUE),
lambda2 = loglink( 0.5 + 2*x2, inverse = TRUE))
zdata <- transform(zdata, y1 = rzipois(nn, lambda1, pstr0 = Ps01),
y2 = rzipois(nn, lambda2, pstr0 = Ps02))
with(zdata, table(y1)) # Eyeball the data
with(zdata, table(y2))
fit1 <- vglm(y1 ~ x2, zipoisson(zero = 1), zdata, crit = "coef")
fit2 <- vglm(y2 ~ x2, zipoisson(zero = 1), zdata, crit = "coef")
coef(fit1, matrix = TRUE) # Should agree with the above values
coef(fit2, matrix = TRUE) # Should agree with the above values
# Fit all two simultaneously, using a different parameterization:
fit12 <- vglm(cbind(y1, y2) ~ x2, zipoissonff, zdata, crit = "coef")
coef(fit12, matrix = TRUE) # Should agree with the above values
# For the first observation compute the probability that y1 is
# due to a structural zero.
(fitted(fit1, type = "pstr0") / fitted(fit1, type = "pobs0"))[1]
# Example 2: McKendrick (1925). From 223 Indian village households
cholera <- data.frame(ncases = 0:4, # Number of cholera cases,
wfreq = c(168, 32, 16, 6, 1)) # Frequencies
fit <- vglm(ncases ~ 1, zipoisson, wei = wfreq, cholera)
coef(fit, matrix = TRUE)
with(cholera, cbind(actual = wfreq,
fitted = round(dzipois(ncases, Coef(fit)[2],
pstr0 = Coef(fit)[1]) *
sum(wfreq), digits = 2)))
# Example 3: data from Angers and Biswas (2003)
abdata <- data.frame(y = 0:7, w = c(182, 41, 12, 2, 2, 0, 0, 1))
abdata <- subset(abdata, w > 0)
fit3 <- vglm(y ~ 1, zipoisson(lpstr0 = probitlink, ipstr0 = 0.8),
data = abdata, weight = w, trace = TRUE)
fitted(fit3, type = "pobs0") # Estimate of P(Y = 0)
coef(fit3, matrix = TRUE)
Coef(fit3) # Estimate of pstr0 and lambda
fitted(fit3)
with(abdata, weighted.mean(y, w)) # Compare this with fitted(fit)
summary(fit3)
# Example 4: zero-deflated (ZDP) model for intercept-only data
zdata <- transform(zdata, lambda3 = loglink(0.0, inverse = TRUE))
zdata <- transform(zdata, deflat.limit=-1/expm1(lambda3)) # Bndy
# The 'pstr0' parameter is negative and in parameter space:
# Not too near the boundary:
zdata <- transform(zdata, usepstr0 = deflat.limit / 2)
zdata <- transform(zdata,
y3 = rzipois(nn, lambda3, pstr0 = usepstr0))
head(zdata)
with(zdata, table(y3)) # A lot of deflation
fit4 <- vglm(y3 ~ 1, data = zdata, trace = TRUE, crit = "coef",
zipoisson(lpstr0 = "identitylink"))
coef(fit4, matrix = TRUE)
# Check how accurate it was:
zdata[1, "usepstr0"] # Answer
coef(fit4)[1] # Estimate
Coef(fit4)
vcov(fit4) # Is positive-definite
# Example 5: RR-ZIP
set.seed(123)
rrzip <- rrvglm(Alopacce ~ sm.bs(WaterCon, df = 3),
zipoisson(zero = NULL),
data = hspider, trace = TRUE, Index.corner = 2)
coef(rrzip, matrix = TRUE)
Coef(rrzip)
summary(rrzip)
## Not run: plotvgam(rrzip, lcol = "blue")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.