mice.impute.zip: Multiple Imputation of Flat File Zero-Inflated Count Data
In kkleinke/countimp: Multiple Imputation of incomplete count data

mice.impute.zip

R Documentation

Multiple Imputation of Flat File Zero-Inflated Count Data

Description

The functions impute flat file zero-inflated count data based on a Poisson or negative binomial (NB) hurdle model, using either a Bayesian regression or a bootstrap regression approach (function suffix “.boot”). Alternatively, a zero-inflated Poisson or NB model can be specified. Hurdle models are mixture models with two components: the zero model (a binomial GLM), which determines whether the observational unit has a zero or a non-zero value, and the count model (a zero-truncated Poisson or NB model), which determines which non-zero value the unit has. Zero-inflation models are also mixture models: the zero model (here a logit model) determines whether the unit has a “certain zero” or not, and the count model (here a Poisson or NB model) determines which count, zero or non-zero, the unit has. Different sets of covariates (predictors) may be used for the zero and the count models.
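
To make the difference between the two model families concrete, the following base R sketch writes down the probability mass functions for the Poisson case. The function names `dzip_sketch` and `dhp_sketch` and the parameter names `pi0` and `p0` are illustrative assumptions, not part of countimp or pscl, and the zero component is parameterised here as the probability of a zero.

```r
# Zero-inflated Poisson: a "certain zero" occurs with probability pi0;
# otherwise the count (which may also be zero) comes from Poisson(mu).
dzip_sketch <- function(k, mu, pi0) {
  ifelse(k == 0, pi0, 0) + (1 - pi0) * dpois(k, mu)
}

# Hurdle Poisson: a zero occurs with probability p0; all non-zero counts
# follow a zero-truncated Poisson(mu).
dhp_sketch <- function(k, mu, p0) {
  ifelse(k == 0, p0, (1 - p0) * dpois(k, mu) / (1 - dpois(0, mu)))
}

k <- 0:200
zip_probs <- dzip_sketch(k, mu = 2, pi0 = 0.3)
hp_probs  <- dhp_sketch(k, mu = 2, p0 = 0.3)

# Both are proper distributions; under the hurdle model P(0) equals p0
# exactly, whereas the ZIP model adds Poisson zeros on top of pi0.
c(sum(zip_probs), sum(hp_probs))
c(zip_probs[1], hp_probs[1])
```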

Usage

mice.impute.zip(y, ry, x, type, wy = NULL)

mice.impute.zip.boot(y, ry, x, type, wy = NULL)

mice.impute.zinb(y, ry, x, type, wy = NULL)

mice.impute.zinb.boot(y, ry, x, type, wy = NULL)

mice.impute.hp(y, ry, x, type, wy = NULL)

mice.impute.hp.boot(y, ry, x, type, wy = NULL)

mice.impute.hnb(y, ry, x, type, wy = NULL)

mice.impute.hnb.boot(y, ry, x, type, wy = NULL)

Arguments

`y`	Numeric vector with incomplete data
`ry`	Response pattern of `y` (`TRUE`=observed, `FALSE`=missing)
`x`	Matrix with `length(y)` rows containing complete covariates
`type`	Vector of length `ncol(x)` determining the imputation model; `type` is automatically extracted from the `predictorMatrix` argument of `mice()`.
`wy`	Logical vector of length `length(y)`. A `TRUE` value indicates locations in `y` for which imputations are created. Default is `!ry`.

Details

The functions multiply impute incomplete zero-inflated count data using either the zeroinfl() function (zero-inflation model) or the hurdle() function (hurdle model) from package pscl (Zeileis, Kleiber, & Jackman, 2008). The imputation model is specified through the entries of type, i.e. the row of the predictorMatrix that belongs to the variable being imputed:

0 = variable not included in imputation model
1 = variable will be included in the zero and the count model
2 = variable will be included in the count model
3 = variable will be included in the zero model
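
As an illustration of how these codes split the predictors between the two model parts, the sketch below turns a hypothetical `type` vector into a two-part formula. The variable names are made up, and how countimp builds its model formula internally is an assumption here; only the `count | zero` formula syntax is taken from pscl.

```r
# Hypothetical `type` vector for five candidate predictors (names are illustrative).
type <- c(x1 = 2, x2 = 2, x3 = 3, x4 = 1, id = 0)

count_vars <- names(type)[type %in% c(1, 2)]  # codes 1 and 2 enter the count model
zero_vars  <- names(type)[type %in% c(1, 3)]  # codes 1 and 3 enter the zero model

# pscl separates the count part and the zero part of a formula with `|`.
form <- as.formula(paste("y ~", paste(count_vars, collapse = " + "),
                         "|", paste(zero_vars, collapse = " + ")))
print(form)
```

In this sketch `x4` appears in both parts, `x1` and `x2` only in the count part, `x3` only in the zero part, and `id` is dropped.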

The Bayesian regression variants (see Rubin, 1987, pp. 169-170) consist of the following steps:

1. Fit the model; find bhat, the posterior mean, and V(bhat), the posterior variance of the model parameters b
2. Draw b* from N(bhat, V(bhat))
3. Compute the predicted probabilities p of observing each count, given b*
4. Draw imputations from the observed counts with selection probabilities p
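
The Bayesian regression steps can be sketched in base R. To stay free of package dependencies, this sketch swaps in an ordinary Poisson GLM for the zero-inflated and hurdle models fitted by pscl, so it illustrates the sequence of steps rather than the actual countimp code; the simulated data and all object names are assumptions.

```r
set.seed(1)
n  <- 200
x  <- rnorm(n)
y  <- rpois(n, exp(0.5 + 0.3 * x))
ry <- runif(n) > 0.3          # TRUE = observed
y[!ry] <- NA

# Step 1: fit the model to the observed cases (plain Poisson GLM as a stand-in)
fit  <- glm(y ~ x, family = poisson, data = data.frame(y = y[ry], x = x[ry]))
bhat <- coef(fit)
V    <- vcov(fit)

# Step 2: draw b* from N(bhat, V) using the Cholesky factor of V
bstar <- as.vector(bhat + t(chol(V)) %*% rnorm(length(bhat)))

# Step 3: probabilities of each observed count value for every missing case
mu_mis <- drop(exp(cbind(1, x[!ry]) %*% bstar))
counts <- sort(unique(y[ry]))

# Step 4: draw one imputation per missing case from the observed counts
imp <- vapply(mu_mis, function(m) {
  p <- dpois(counts, m)                       # selection probabilities
  counts[sample.int(length(counts), 1, prob = p)]
}, numeric(1))

table(imp)
```

Because imputations are drawn from the set of observed counts, no imputed value can fall outside the observed range.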

The bootstrap variants consist of the following steps:

1. Draw a bootstrap sample from y[ry] and x[ry,]
2. Fit the model to the bootstrap sample
3. Compute the predicted probabilities p of observing each count
4. Draw imputations from the observed counts with selection probabilities p
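
A minimal base R sketch of the bootstrap steps follows. As a simplifying assumption it again uses an ordinary Poisson GLM in place of the pscl models, and the simulated data and object names are illustrative only.

```r
set.seed(2)
n  <- 200
x  <- rnorm(n)
y  <- rpois(n, exp(0.5 + 0.3 * x))
ry <- runif(n) > 0.3          # TRUE = observed
y[!ry] <- NA

# Draw a bootstrap sample (with replacement) from the observed cases
obs  <- which(ry)
boot <- sample(obs, length(obs), replace = TRUE)

# Fit the model to the bootstrap sample; parameter uncertainty enters
# through the resampling instead of a draw from N(bhat, V(bhat))
fit <- glm(y ~ x, family = poisson,
           data = data.frame(y = y[boot], x = x[boot]))

# Predicted probabilities of each observed count value for the missing cases
mu_mis <- drop(exp(cbind(1, x[!ry]) %*% coef(fit)))
counts <- sort(unique(y[ry]))

# Draw one imputation per missing case from the observed counts
imp <- vapply(mu_mis, function(m) {
  p <- dpois(counts, m)
  counts[sample.int(length(counts), 1, prob = p)]
}, numeric(1))

summary(imp)
```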

Value

vector with imputations

Functions

mice.impute.zip: zero-inflated Poisson model; Bayesian regression variant
mice.impute.zip.boot: zero-inflated Poisson model; Bootstrap regression variant
mice.impute.zinb: zero-inflated NB model; Bayesian regression variant
mice.impute.zinb.boot: zero-inflated NB model; Bootstrap regression variant
mice.impute.hp: hurdle Poisson model; Bayesian regression variant
mice.impute.hp.boot: hurdle Poisson model; Bootstrap regression variant
mice.impute.hnb: hurdle NB model; Bayesian regression variant
mice.impute.hnb.boot: hurdle NB model; Bootstrap regression variant

Author(s)

Kristian Kleinke

References

Kleinke, K., & Reinecke, J. (2013a). Multiple Imputation of incomplete zero-inflated count data. Statistica Neerlandica, available from http://onlinelibrary.wiley.com/doi/10.1111/stan.12009/abstract.
Kleinke, K., & Reinecke, J. (2013b). countimp 1.0 – A multiple imputation package for incomplete count data [Technical Report]. University of Bielefeld, Faculty of Sociology, available from www.uni-bielefeld.de/soz/kds/pdf/countimp.pdf.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Zeileis, A., Kleiber, C., & Jackman, S. (2008). Regression models for count data in R. Journal of Statistical Software, 27(8), 1–25.

Examples

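## Example 1:
data(crim4w)
## Dry run (maxit = 0) to obtain the default methods and predictor matrix
ini <- countimp(crim4w, maxit = 0)
meth <- ini$method
meth[6:7] <- "hp"     # hurdle Poisson model for variables 6 and 7
meth[8:9] <- "pmm"    # predictive mean matching for variables 8 and 9
pred <- ini$predictorMatrix
pred[,"id"] <- 0      # do not use id as a predictor
## Codes: 0 = not included, 1 = zero and count model, 2 = count model, 3 = zero model
pred["ACRIM",] <- c(0,1,3,2,0,3,3,2,1)
imp <- countimp( data = crim4w, method = meth, predictorMatrix = pred )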

## Example 2:
## Simulate zero-inflated NB data
b0 <- 1
b1 <- .3
b2 <- .3
c0 <- 0
c1 <- 2
theta <- 1
require("pscl")
library("MASS")   # provides rnegbin()
set.seed(1234)
N <- 10000
x1 <- rnorm(N)
x2 <- rnorm(N)
x3 <- rnorm(N)
mu <- exp( b0 + b1 * x1 + b2 * x2 )
yzinb <- rnegbin( N, mu, theta)
pzero <- plogis( c0 + c1 * x3 )   # zero-infl. prob. depends on x3
## Introduce zero-inflation
uni <- runif(N)
yzinb[uni < pzero] <- 0
zinbdata<-data.frame(yzinb,x1,x2,x3)

## Generate MAR missingness
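## pos: column that receives the NAs; Z: column that drives the missingness;
## pmis: overall proportion of missing values; strength: shares of the missing
## values assigned to cases below and above the mean of Z
generate.md <- function( data, pos = 1, Z = 2, pmis = .5, strength = c( .5, .5 ) )
{
  total <- round( pmis * nrow(data) )            # number of values to delete
  sm <- which( data[,Z] < mean( data[,Z] ) )     # cases below the mean of Z
  gr <- which( data[,Z] > mean( data[,Z] ) )     # cases above the mean of Z
  sel.sm <- sample( sm, round( strength[1] * total ) )
  sel.gr <- sample( gr, round( strength[2] * total ) )
  sel <- c( sel.sm, sel.gr )
  data[sel,pos] <- NA
  return(data)
}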
zinbmdata <- generate.md( zinbdata, pmis = .3, strength = c( .2, .8) )

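## Impute missing data
ini <- mice( zinbmdata, m = 5, maxit = 0)   # dry run to obtain default settings
pred <- ini$predictorMatrix
## Row 1 is the imputation model for yzinb:
## x1 and x2 enter the count model (2), x3 enters the zero model (3)
pred[1,] <- c(0, 2, 2, 3)
meth <- ini$method
meth[1] <- "zinb"
imp.zinb <- countimp( zinbmdata, m = 5, method = meth,
            predictorMatrix = pred, seed = 1234, print = FALSE)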

kkleinke/countimp documentation built on Nov. 5, 2024, 11:51 a.m.