mice.impute.2l.zip: Multiple Imputation of Flat File Zero-Inflated Count Data

mice.impute.zip    R Documentation

Multiple Imputation of Flat File Zero-Inflated Count Data

Description

These functions impute flat file zero-inflated count data based on a Poisson or negative binomial (NB) hurdle model, using either a Bayesian regression approach or a bootstrap regression approach (function suffix “.boot”). Alternatively, a zero-inflated Poisson or NB model can be specified. Hurdle models are mixture models with two components: the zero model (a binomial GLM) determines whether the observational unit has a zero or a non-zero value, and the count model (a zero-truncated Poisson or NB model) determines which non-zero value the unit has. Zero-inflation models are also mixture models. They combine a zero model (here a logit model, determining whether the observational unit is a “certain zero” or not) with a count model (here a Poisson or NB model) that determines which count, zero or non-zero, the unit has. Different sets of covariates (predictors) may be used for the zero and the count models.
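The difference between the two model families can be made concrete by writing out their probability mass functions. The following is a minimal base R sketch for the Poisson case only; the helper names dzip() and dhurdle() and the parameter values are made up for illustration and are not part of the package.

```r
# Zero-inflated Poisson: a "certain zero" occurs with probability pi0;
# otherwise the count (which may still be zero) comes from Poisson(lambda).
dzip <- function(k, lambda, pi0) {
  pi0 * (k == 0) + (1 - pi0) * dpois(k, lambda)
}

# Hurdle Poisson: a zero occurs with probability p0; otherwise the count
# comes from a zero-truncated Poisson(lambda), so it is always >= 1.
dhurdle <- function(k, lambda, p0) {
  ifelse(k == 0, p0, (1 - p0) * dpois(k, lambda) / (1 - dpois(0, lambda)))
}

k <- 0:50
lambda <- 2
zip_probs    <- dzip(k, lambda, pi0 = 0.3)
hurdle_probs <- dhurdle(k, lambda, p0 = 0.3)

# In the hurdle model all zeros come from the zero model, whereas in the
# zero-inflated model the count model contributes additional zeros.
round(rbind(zip = zip_probs[1:4], hurdle = hurdle_probs[1:4]), 3)
```

With the same zero-model probability, the zero-inflated model therefore assigns more mass to zero than the hurdle model does.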

Usage

mice.impute.zip(y, ry, x, type, wy = NULL)

mice.impute.zip.boot(y, ry, x, type, wy = NULL)

mice.impute.zinb(y, ry, x, type, wy = NULL)

mice.impute.zinb.boot(y, ry, x, type, wy = NULL)

mice.impute.hp(y, ry, x, type, wy = NULL)

mice.impute.hp.boot(y, ry, x, type, wy = NULL)

mice.impute.hnb(y, ry, x, type, wy = NULL)

mice.impute.hnb.boot(y, ry, x, type, wy = NULL)

Arguments

y

Numeric vector with incomplete data

ry

Response pattern of y (TRUE=observed, FALSE=missing)

x

Matrix with length(y) rows containing complete covariates

type

Vector of length ncol(x) determining the imputation model; type is automatically extracted from the row of the predictorMatrix argument of mice() that belongs to the variable being imputed.

wy

Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created. The default is !ry.

Details

The functions multiply impute incomplete zero-inflated count data using either the zeroinfl() function (zero-inflation model) or the hurdle() function (hurdle model) from package pscl (Zeileis, Kleiber, & Jackman, 2008). The entries of type specify the role of each covariate in the imputation model:

  • 0 = variable not included in imputation model

  • 1 = variable will be included in the zero and the count model

  • 2 = variable will be included in the count model

  • 3 = variable will be included in the zero model
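As an illustration of this coding scheme, the sketch below splits a type vector into the covariate sets of the count and the zero model and assembles a two-part formula of the form count model | zero model, which is the notation accepted by zeroinfl() and hurdle() in pscl. The variable names and the type vector are hypothetical, and the way the imputation functions build their formulas internally may differ.

```r
# Hypothetical type vector (one row of the predictorMatrix)
type <- c(x1 = 2, x2 = 1, x3 = 3, id = 0)

# Codes 1 and 2 enter the count model; codes 1 and 3 enter the zero model
count_vars <- names(type)[type %in% c(1, 2)]
zero_vars  <- names(type)[type %in% c(1, 3)]

# Two-part formula: count model | zero model
form <- as.formula(paste("y ~", paste(count_vars, collapse = " + "),
                         "|", paste(zero_vars, collapse = " + ")))
print(form)
```

Here x1 only predicts the count, x3 only predicts the zero part, x2 is used in both, and id is excluded.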

The Bayesian regression variants (see Rubin 1987, pp. 169-170) consist of the following steps:

  1. Fit the model; find bhat, the posterior mean, and V(bhat), the posterior variance of model parameters b

  2. Draw b* from N(bhat,V(bhat))

  3. Compute predicted probabilities for observing each count p

  4. Draw imputations from observed counts with selection probabilities p
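The four steps above can be sketched in base R for a zero-inflated Poisson model. This is only an illustration under simplifying assumptions, not the package's implementation: the data are simulated, the model is fitted with optim() rather than zeroinfl(), the same covariate is used in both model parts, and the posterior mean and variance are approximated by the maximum likelihood estimate and the inverse Hessian.

```r
set.seed(1)
n  <- 500
x  <- rnorm(n)
# Simulated data (assumption): certain zeros with probability plogis(-1 + x),
# otherwise Poisson counts with mean exp(0.5 + 0.3 * x)
y  <- ifelse(runif(n) < plogis(-1 + x), 0L, rpois(n, exp(0.5 + 0.3 * x)))
ry <- runif(n) > 0.2                 # TRUE = observed, FALSE = missing
X  <- cbind(1, x)                    # design matrix for both model parts

# Negative log-likelihood; par = c(count coefficients, zero coefficients)
nll <- function(par, y, X) {
  lambda <- exp(drop(X %*% par[1:2]))
  pi0    <- plogis(drop(X %*% par[3:4]))
  -sum(log(pi0 * (y == 0) + (1 - pi0) * dpois(y, lambda)))
}

# Step 1: fit the model to the observed cases; approximate bhat and V(bhat)
fit  <- optim(rep(0, 4), nll, y = y[ry], X = X[ry, , drop = FALSE],
              method = "BFGS", hessian = TRUE)
bhat <- fit$par
V    <- solve(fit$hessian)

# Step 2: draw b* from N(bhat, V(bhat)) via the Cholesky factor of V
bstar <- bhat + drop(t(chol(V)) %*% rnorm(length(bhat)))

# Step 3: probability of each observed count value for every missing case
counts <- sort(unique(y[ry]))
Xmis   <- X[!ry, , drop = FALSE]
lambda <- exp(drop(Xmis %*% bstar[1:2]))
pi0    <- plogis(drop(Xmis %*% bstar[3:4]))
p <- sapply(counts, function(k) pi0 * (k == 0) + (1 - pi0) * dpois(k, lambda))

# Step 4: draw one imputation per missing case from the observed counts
imp <- apply(p, 1, function(pr) sample(counts, 1, prob = pr))
```

Because imputations are sampled from the set of observed counts, imputed values never fall outside the observed range of y.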

The bootstrap variants first draw a bootstrap sample from y[ry] and x[ry,] and then proceed as follows:

  1. Fit the model to the bootstrap sample

  2. Compute predicted probabilities for observing each count p

  3. Draw imputations from observed counts with selection probabilities p
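The bootstrap procedure can be sketched in the same way, here for a hurdle Poisson model. Again this is a hedged base R illustration rather than the package's code: the data are simulated, the zero part is fitted with glm(), and the zero-truncated Poisson count part is fitted with optim() instead of hurdle().

```r
set.seed(2)
n   <- 500
x   <- rnorm(n)
y   <- ifelse(runif(n) < plogis(-1 + x), 0, rpois(n, exp(0.5 + 0.3 * x)))
ry  <- runif(n) > 0.2                # TRUE = observed, FALSE = missing
dat <- data.frame(y = y, x = x)

# Bootstrap sample from the observed cases
obs  <- which(ry)
boot <- dat[sample(obs, length(obs), replace = TRUE), ]

# Step 1: fit the hurdle model to the bootstrap sample
# Zero part: binomial GLM for zero vs. non-zero
zero_fit <- glm(as.integer(y > 0) ~ x, family = binomial, data = boot)
# Count part: zero-truncated Poisson fitted to the positive counts
pos <- boot[boot$y > 0, ]
ztp_nll <- function(b) {
  lambda <- exp(b[1] + b[2] * pos$x)
  -sum(dpois(pos$y, lambda, log = TRUE) - log1p(-exp(-lambda)))
}
count_b <- optim(c(0, 0), ztp_nll, method = "BFGS")$par

# Step 2: probability of each observed count value for every missing case
counts <- sort(unique(dat$y[ry]))
mis    <- dat[!ry, ]
p_pos  <- predict(zero_fit, newdata = mis, type = "response")
lambda <- exp(count_b[1] + count_b[2] * mis$x)
p <- sapply(counts, function(k) {
  if (k == 0) 1 - p_pos
  else p_pos * dpois(k, lambda) / (1 - exp(-lambda))
})

# Step 3: draw one imputation per missing case from the observed counts
imp <- apply(p, 1, function(pr) sample(counts, 1, prob = pr))
```

No parameter draw is needed here, because refitting the model to a new bootstrap sample for each imputation is what introduces the parameter uncertainty.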

Value

Vector with imputations, one for each TRUE element of wy

Functions

  • mice.impute.zip: zero-inflated Poisson model; Bayesian regression variant

  • mice.impute.zip.boot: zero-inflated Poisson model; bootstrap regression variant

  • mice.impute.zinb: zero-inflated NB model; Bayesian regression variant

  • mice.impute.zinb.boot: zero-inflated NB model; bootstrap regression variant

  • mice.impute.hp: hurdle Poisson model; Bayesian regression variant

  • mice.impute.hp.boot: hurdle Poisson model; bootstrap regression variant

  • mice.impute.hnb: hurdle NB model; Bayesian regression variant

  • mice.impute.hnb.boot: hurdle NB model; bootstrap regression variant

Author(s)

Kristian Kleinke

References

  • Kleinke, K., & Reinecke, J. (2013a). Multiple imputation of incomplete zero-inflated count data. Statistica Neerlandica. Available from http://onlinelibrary.wiley.com/doi/10.1111/stan.12009/abstract.

  • Kleinke, K., & Reinecke, J. (2013b). countimp 1.0 – A multiple imputation package for incomplete count data [Technical Report]. University of Bielefeld, Faculty of Sociology, available from www.uni-bielefeld.de/soz/kds/pdf/countimp.pdf.

  • Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

  • Zeileis, A., Kleiber, C., & Jackman, S. (2008). Regression models for count data in R. Journal of Statistical Software, 27(8), 1–25.

Examples

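## Example 1:
library(countimp)
data(crim4w)
## Dry run (maxit = 0) to obtain the default method vector and predictor matrix
ini <- countimp(crim4w, maxit = 0)
meth <- ini$method
meth[6:7] <- "hp"     # hurdle Poisson model for variables 6 and 7
meth[8:9] <- "pmm"    # predictive mean matching for variables 8 and 9
pred <- ini$predictorMatrix
pred[, "id"] <- 0     # never use the id variable as a predictor
## Type codes for ACRIM: 0 = excluded, 1 = zero and count model,
## 2 = count model only, 3 = zero model only
pred["ACRIM", ] <- c(0, 1, 3, 2, 0, 3, 3, 2, 1)
imp <- countimp(data = crim4w, method = meth, predictorMatrix = pred)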

## Example 2:
## Simulate zero-inflated NB data
b0 <- 1
b1 <- .3
b2 <- .3
c0 <- 0
c1 <- 2
theta <- 1
library(MASS)    # provides rnegbin()
require("pscl")
set.seed(1234)
N <- 10000
x1 <- rnorm(N)
x2 <- rnorm(N)
x3 <- rnorm(N)
mu <- exp( b0 + b1 * x1 + b2 * x2 )
yzinb <- rnegbin( N, mu, theta)
pzero <- plogis( c0 + c1 * x3 )   # zero-infl. prob. depends on x3
## Introduce zero-inflation
uni <- runif(N)
yzinb[uni < pzero] <- 0
zinbdata <- data.frame(yzinb, x1, x2, x3)

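## Generate MAR missingness
## pos: column that receives missing values; Z: column that drives missingness;
## pmis: proportion of missing values; strength: shares of the missing values
## taken from cases below and above the mean of Z
generate.md <- function( data, pos = 1, Z = 2, pmis = .5, strength = c( .5, .5 ) )
{
  total <- round( pmis * nrow(data) )          # number of values to delete
  sm <- which( data[,Z] < mean( data[,Z] ) )   # cases below the mean of Z
  gr <- which( data[,Z] > mean( data[,Z] ) )   # cases above the mean of Z
  sel.sm <- sample( sm, round( strength[1] * total ) )
  sel.gr <- sample( gr, round( strength[2] * total ) )
  sel <- c( sel.sm, sel.gr )
  data[sel,pos] <- NA
  return(data)
}
## 30% missing values in yzinb, 80% of them in cases with above-average x1
zinbmdata <- generate.md( zinbdata, pmis = .3, strength = c( .2, .8) )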

## Impute missing data
ini <- mice( zinbmdata, m = 5, maxit = 0)
pred <- ini$predictorMatrix
## x1 and x2 enter the count model (2), x3 enters the zero model (3)
pred[1,] <- c(0, 2, 2, 3)
meth <- ini$method
meth[1] <- "zinb"
imp.zinb <- countimp( zinbmdata, m = 5, method = meth,
            predictorMatrix = pred, seed = 1234, print = FALSE)

kkleinke/countimp documentation built on Nov. 5, 2024, 11:51 a.m.