The mgcv package (@Wood2006gams) implements zero-inflated Poisson Generalised Additive Models (GAMs) with smoothing parameter estimation by either restricted maximum likelihood (REML) or maximum likelihood (ML). However, it does not permit for the inclusion of an offset term in the model.
The zigam package implements zero-inflated GAMs using the EM Algorithm (@Pawitan2001likelihood) which permits the inclusion of an offset. The predictions should be reliable, but standard errors (and possibly the level of smoothing chosen) will be unreaslistic. Confidence intervals are therefore estimated using a parametric bootstrap (@Efron1993introduction).
Note this example does use a model that includes an offset.
Set a seed for repeatability
set.seed(12032016)
Simulate some data to test
n <- 5000 x1 <- runif(n) x2 <- runif(n) ilogit <- function(x) 1/(1+exp(-x)) p <- ilogit(sin(2*pi*x1)*sin(2*pi*x2)) lambda <- exp(10*x1*(1-x1)) z <- rbinom(n,1,p) y <- rpois(n,lambda*z) d <- data.frame(x1=x1,x2=x2,y=y)
Load the mgcv and zigam packages.
library(mgcv) library(zigam)
Fit a zero-inflated Poisson model.
fit <- zipgam(y ~ s(x1,x2), ~ s(x1,x2), data=d)
Show the inflation model (actually the model for this being a Poisson count)
plot(fit$fit.pi,select=1,scheme=2)
and the model for the mean of the Poisson counts (with the additional zeroes removed)
plot(fit$fit.lambda,select=1,scheme=2)
A difficulty with the EM algorithm is that the estimated standard errors from the two stages of the fit tend to be underestimated, because each fails to recognize the variability in the other.
One approach to estimating variability is the parametric bootstrap - we fit the model to the data, simulate new data from the model and then refit the model to the new data.
Simulate a small data set (for speed)
n <- 200 x1 <- runif(n) x2 <- runif(n) ilogit <- function(x) 1/(1+exp(-x)) p <- ilogit(1+sin(2*pi*x2)) lambda <- exp(10*x1*(1-x1)) z <- rbinom(n,1,p) y <- rpois(n,lambda*z) d <- data.frame(x1=x1,x2=x2,y=y)
Fit a simple model
fit <- zipgam(y ~ s(x1), ~ s(x2), data=d)
If the data we wish to predict for is
x1.pred <- seq(0,1,length.out=40) d.pred <- data.frame(x1=x1.pred,x2=rep(0,40))
Then the parametric bootstrap for the predicted responses is
d.boot <- d boot <- sapply(simulate(fit,20),function(s) { ## Update response d.boot[,"y"] <- s ## Refit model and generate predictions fit.boot <- update(fit,data=d.boot) predict(fit.boot,d.pred,type="response") })
Extract the quantiles and plot
pr.ci <- t(apply(boot,1,quantile,prob=c(0.025,0.5,0.975))) matplot(x1.pred,pr.ci,type="l",lty=1, col=c("dodgerblue1","dodgerblue3","dodgerblue1"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.