mice.impute.zip | R Documentation |
The functions impute flat file zero-inflated count data based on a Poisson or negative binomial (NB) hurdle model, using either a Bayesian regression or a bootstrap regression approach (function suffix “.boot”). Alternatively, a zero-inflated Poisson or NB model can be specified. Hurdle models are mixture models with two components: the zero model (a binomial GLM), which determines whether the observational unit has a zero or a non-zero value, and the count model (a zero-truncated Poisson or NB model), which determines what non-zero value the observational unit has.
Zero-inflation models are also mixture models. They specify a zero model (here a logit model, which determines whether the observational unit has a “certain zero” or not) and a count model (here a Poisson or NB model), which determines what count, zero or non-zero, the observational unit has. Different sets of covariates (predictors) may be used for the zero and the count models.
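The difference between the two mixtures can be made concrete with their probability mass functions. The following is a minimal sketch for the Poisson case in base R; the function names dhurdle_pois and dzi_pois are hypothetical helpers written for this illustration and are not part of countimp or pscl.

```r
# Hurdle Poisson: pi0 = P(y = 0) comes from the zero model alone; positive
# counts follow a zero-truncated Poisson with parameter lambda.
dhurdle_pois <- function(y, pi0, lambda) {
  ifelse(y == 0,
         pi0,
         (1 - pi0) * dpois(y, lambda) / (1 - dpois(0, lambda)))
}

# Zero-inflated Poisson: psi = P("certain zero"); otherwise the count,
# which may itself be zero, comes from an ordinary Poisson.
dzi_pois <- function(y, psi, lambda) {
  psi * (y == 0) + (1 - psi) * dpois(y, lambda)
}

y <- 0:50
sum(dhurdle_pois(y, pi0 = 0.4, lambda = 2))  # approximately 1
sum(dzi_pois(y, psi = 0.4, lambda = 2))      # approximately 1

dhurdle_pois(0, pi0 = 0.4, lambda = 2)  # pi0: zeros arise only from the zero model
dzi_pois(0, psi = 0.4, lambda = 2)      # psi + (1 - psi) * exp(-lambda): two sources of zeros
```

In the hurdle model all zeros are attributed to the zero model, whereas in the zero-inflation model a zero can be either a “certain zero” or an ordinary draw from the count distribution.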
mice.impute.zip(y, ry, x, type, wy = NULL)
mice.impute.zip.boot(y, ry, x, type, wy = NULL)
mice.impute.zinb(y, ry, x, type, wy = NULL)
mice.impute.zinb.boot(y, ry, x, type, wy = NULL)
mice.impute.hp(y, ry, x, type, wy = NULL)
mice.impute.hp.boot(y, ry, x, type, wy = NULL)
mice.impute.hnb(y, ry, x, type, wy = NULL)
mice.impute.hnb.boot(y, ry, x, type, wy = NULL)
y | Numeric vector with incomplete data |
ry | Response pattern of |
x | matrix with |
type | vector of length |
wy | Logical vector of length |
The functions multiply impute incomplete zero-inflated count data using either the zeroinfl() function (zero-inflation model) or the hurdle() function (hurdle model) from package pscl (Zeileis, Kleiber, & Jackman, 2008).
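Model specification details (coding of the entries in type, i.e. the row of the predictor matrix for the variable to be imputed):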
0 = variable not included in imputation model
1 = variable will be included in the zero and the count model
2 = variable will be included in the count model
3 = variable will be included in the zero model
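As an illustration of this coding, the sketch below splits a type vector into the covariate sets of the count model and the zero model and assembles a two-part formula in the `count terms | zero terms` style accepted by zeroinfl() and hurdle(). The variable names and type values are hypothetical, and whether countimp builds its formula in exactly this way internally is an assumption.

```r
# Hypothetical predictors and their type codes for one target variable
type <- c(x1 = 2, x2 = 2, x3 = 3, x4 = 1, id = 0)

count_vars <- names(type)[type %in% c(1, 2)]  # codes 1 and 2 enter the count model
zero_vars  <- names(type)[type %in% c(1, 3)]  # codes 1 and 3 enter the zero model

# Count model terms go before the bar, zero model terms after it
fml <- as.formula(paste("y ~", paste(count_vars, collapse = " + "),
                        "|", paste(zero_vars, collapse = " + ")))
print(fml)
```

Here x4 (code 1) appears on both sides of the bar, x1 and x2 only in the count model, x3 only in the zero model, and id (code 0) is left out entirely.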
The Bayesian regression variants (see Rubin, 1987, pp. 169-170) consist of the following steps:
Fit the model; find bhat, the posterior mean, and V(bhat), the posterior variance of model parameters b
Draw b* from N(bhat,V(bhat))
Compute predicted probabilities for observing each count p
Draw imputations from observed counts with selection probabilities p
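The four steps above can be sketched in base R. To keep the example free of package dependencies, an ordinary Poisson GLM stands in for the zero-inflated or hurdle model that the functions actually fit with pscl; the simulated data and all object names are assumptions made for illustration only.

```r
set.seed(1)
n  <- 200
x  <- cbind(x1 = rnorm(n))
y  <- rpois(n, exp(0.5 + 0.3 * x[, "x1"]))
ry <- runif(n) > 0.3             # TRUE = observed, FALSE = missing
y[!ry] <- NA
dat <- data.frame(y = y, x)

# Step 1: fit the model to the observed cases (plain Poisson GLM as stand-in)
fit  <- glm(y ~ x1, family = poisson, data = dat[ry, , drop = FALSE])
bhat <- coef(fit)
V    <- vcov(fit)

# Step 2: draw b* from N(bhat, V(bhat)) using the Cholesky factor of V
bstar <- as.vector(bhat + t(chol(V)) %*% rnorm(length(bhat)))

# Step 3: probability of each observed count value for every missing case
counts <- sort(unique(y[ry]))
mu_mis <- as.vector(exp(cbind(1, x[!ry, , drop = FALSE]) %*% bstar))
p <- sapply(mu_mis, function(m) dpois(counts, m))  # one column per missing case

# Step 4: draw imputations from the observed counts with probabilities p
imp <- apply(p, 2, function(pr) counts[sample.int(length(counts), 1, prob = pr)])
```

Because imputations are sampled from the observed count values, they can never fall outside the range of the observed data.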
The bootstrap regression variants consist of the following steps:
Draw a bootstrap sample from y[ry] and x[ry,]
Fit the model to the bootstrap sample
Compute predicted probabilities for observing each count p
Draw imputations from observed counts with selection probabilities p
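A corresponding sketch of the bootstrap variant follows, again with an ordinary Poisson GLM standing in for the pscl model and with simulated data and object names that are purely illustrative. Taking the candidate counts from the original observed values rather than from the bootstrap sample is an assumption of this sketch.

```r
set.seed(2)
n  <- 200
x  <- cbind(x1 = rnorm(n))
y  <- rpois(n, exp(0.5 + 0.3 * x[, "x1"]))
ry <- runif(n) > 0.3             # TRUE = observed, FALSE = missing
y[!ry] <- NA

# Step 1: draw a bootstrap sample from y[ry] and x[ry, ]
yobs <- y[ry]
xobs <- x[ry, , drop = FALSE]
idx  <- sample.int(length(yobs), replace = TRUE)
boot <- data.frame(y = yobs[idx], xobs[idx, , drop = FALSE])

# Step 2: fit the model to the bootstrap sample (plain Poisson GLM as stand-in)
fit_boot <- glm(y ~ x1, family = poisson, data = boot)

# Step 3: probability of each observed count value for every missing case
counts <- sort(unique(yobs))     # assumption: candidates are the original observed counts
mu_mis <- predict(fit_boot, newdata = data.frame(x[!ry, , drop = FALSE]),
                  type = "response")
p <- sapply(unname(mu_mis), function(m) dpois(counts, m))

# Step 4: draw imputations from the observed counts with probabilities p
imp_boot <- apply(p, 2, function(pr) counts[sample.int(length(counts), 1, prob = pr)])
```

Here the bootstrap resampling, rather than a draw from the approximate posterior of the parameters, carries the parameter uncertainty into the imputations.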
vector with imputations
mice.impute.zip: zero-inflated Poisson model; Bayesian regression variant
mice.impute.zip.boot: zero-inflated Poisson model; bootstrap regression variant
mice.impute.zinb: zero-inflated NB model; Bayesian regression variant
mice.impute.zinb.boot: zero-inflated NB model; bootstrap regression variant
mice.impute.hp: hurdle Poisson model; Bayesian regression variant
mice.impute.hp.boot: hurdle Poisson model; bootstrap regression variant
mice.impute.hnb: hurdle NB model; Bayesian regression variant
mice.impute.hnb.boot: hurdle NB model; bootstrap regression variant
Kristian Kleinke
Kleinke, K., & Reinecke, J. (2013a). Multiple Imputation of incomplete zero-inflated count data. Statistica Neerlandica, available from http://onlinelibrary.wiley.com/doi/10.1111/stan.12009/abstract.
Kleinke, K., & Reinecke, J. (2013b). countimp 1.0 – A multiple imputation package for incomplete count data [Technical Report]. University of Bielefeld, Faculty of Sociology, available from www.uni-bielefeld.de/soz/kds/pdf/countimp.pdf.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Zeileis, A., Kleiber, C., & Jackman, S. (2008). Regression models for count data in R. Journal of Statistical Software, 27(8), 1–25.
## Example 1:
data(crim4w)
ini <- countimp(crim4w, maxit=0)
meth <- ini$method
meth[6:7] <- "hp"
meth[8:9] <- "pmm"
pred <- ini$predictorMatrix
pred[,"id"] <- 0
pred["ACRIM",] <- c(0,1,3,2,0,3,3,2,1)
imp <- countimp( data = crim4w, method = meth, predictorMatrix = pred )
## Example 2:
## Simulate zero-inflated NB data
b0 <- 1
b1 <- .3
b2 <- .3
c0 <- 0
c1 <- 2
theta <- 1
require("pscl")
require("MASS") # rnegbin() is provided by MASS; recent pscl versions may not attach it
set.seed(1234)
N <- 10000
x1 <- rnorm(N)
x2 <- rnorm(N)
x3 <- rnorm(N)
mu <- exp( b0 + b1 * x1 + b2 * x2 )
yzinb <- rnegbin( N, mu, theta)
pzero <- plogis( c0 + c1 * x3 ) # zero-infl. prob. depends on x3
## Introduce zero-inflation
uni <- runif(N)
yzinb[uni < pzero] <- 0
zinbdata<-data.frame(yzinb,x1,x2,x3)
## Generate MAR missingness
generate.md <- function( data, pos = 1, Z = 2, pmis = .5, strength = c( .5, .5 ) )
{
total <- round( pmis * nrow(data) )
sm <- which( data[,Z] < mean( data[,Z] ) )
gr <- which( data[,Z] > mean( data[,Z] ) )
sel.sm <- sample( sm, round( strength[1] * total ) )
sel.gr <- sample( gr, round( strength[2] * total ) )
sel <- c( sel.sm, sel.gr )
data[sel,pos] <- NA
return(data)
}
zinbmdata <- generate.md( zinbdata, pmis = .3, strength = c( .2, .8) )
## Impute missing data
ini <- mice( zinbmdata, m = 5, maxit = 0)
pred <- ini$predictorMatrix
pred[1,] <- c(0, 2, 2, 3)
meth<-ini$method
meth[1] <- "zinb"
imp.zinb <- countimp( zinbmdata, m = 5, method = meth,
predictorMatrix = pred, seed = 1234, print = FALSE)