mice.impute.quasipoisson: Multiple Imputation of Overdispersed Count Data based on a...

mice.impute.quasipoisson    R Documentation

Multiple Imputation of Overdispersed Count Data based on a quasipoisson GLM

Description

Imputes univariate missing data based on a quasipoisson GLM, following either the Bayesian regression or the bootstrap regression (function suffix .boot) MI approach.

Usage

mice.impute.quasipoisson(y, ry, x, wy = NULL, EV = TRUE, ...)

mice.impute.qpois(y, ry, x, wy = NULL, EV = TRUE, ...)

mice.impute.quasipoisson.boot(y, ry, x, wy = NULL, EV = TRUE, ...)

mice.impute.qpois.boot(y, ry, x, wy = NULL, EV = TRUE, ...)

Arguments

y

Numeric vector with incomplete data

ry

Response pattern of y (TRUE=observed, FALSE=missing)

x

matrix with length(y) rows containing complete covariates

wy

Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created. Default is !ry

EV

Should automatic outlier handling of imputed values be enabled? Default is TRUE: extreme imputations are identified and replaced by imputations obtained by predictive mean matching (function mice.impute.midastouch()).

...

Other named arguments.

Details

Overdispersed count data (meaning that the variance of the count variable is larger than its mean) are typically analyzed with a negative binomial (NB) or a quasipoisson model. The quasipoisson model is identical to an ordinary Poisson model, except that it estimates an additional dispersion parameter. For details, see Zeileis, Kleiber, & Jackman (2008), or Hilbe (2007). The Bayesian method consists of the following steps:

  1. Fit the model, and find bhat, the posterior mean, and V(bhat), the posterior variance of model parameters b.

  2. Draw b.star from N(bhat,V(bhat)).

  3. Compute fitted values using exp(x[!ry, ] %*% b.star)

  4. Simulate imputations from a negative binomial distribution to ensure an adequate dispersion of imputed values.
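The four steps above can be sketched in base R as follows. This is a minimal illustration, not the package's code: the toy data and all object names are hypothetical, glm() is used for convenience instead of glm.fit, and the rule for choosing the negative binomial size parameter (so that the variance of the imputations equals the estimated dispersion times the mean) is an assumption made for this sketch.

```r
# Sketch of the Bayesian regression steps; all names are illustrative.
set.seed(2024)

# Toy overdispersed counts with 100 of 500 values of y set to missing
n  <- 500
x1 <- rnorm(n)
y  <- rnbinom(n, size = 2, mu = exp(1 + 0.5 * x1))
ry <- rep(TRUE, n)
ry[sample(n, 100)] <- FALSE
y[!ry] <- NA
dat <- data.frame(y = y, x1 = x1)

# Step 1: fit the quasipoisson GLM to the observed cases
fit  <- glm(y ~ x1, family = quasipoisson, data = dat[ry, ])
bhat <- coef(fit)
V    <- vcov(fit)                  # covariance scaled by the dispersion
phi  <- summary(fit)$dispersion    # estimated dispersion parameter

# Step 2: draw b.star from N(bhat, V(bhat))
b.star <- MASS::mvrnorm(1, mu = bhat, Sigma = V)

# Step 3: fitted values for the cases to be imputed
X.mis  <- cbind(1, x1[!ry])
mu.mis <- as.vector(exp(X.mis %*% b.star))

# Step 4 (assumption): pick the NB size so that Var = phi * mu,
# since mu + mu^2 / size = phi * mu when size = mu / (phi - 1).
# Fall back to Poisson draws if no overdispersion is estimated.
imp <- if (phi > 1) {
  rnbinom(length(mu.mis), size = mu.mis / (phi - 1), mu = mu.mis)
} else {
  rpois(length(mu.mis), lambda = mu.mis)
}
```

Here imp holds one set of imputations for the missing entries of y; repeating steps 2 to 4 would give further sets.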

Quasipoisson imputation relies on the standard glm.fit function, using the quasipoisson family. The bootstrap method draws a bootstrap sample from y[ry] and x[ry, ] and then proceeds as follows:

  1. Fit the model to the bootstrap sample and get model parameters b.star

  2. Compute fitted values using exp(x[!ry, ] %*% b.star)

  3. Simulate imputations from a negative binomial distribution to ensure an adequate dispersion of imputed values.
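The bootstrap variant can be sketched in the same spirit. Again, this is only an illustration under stated assumptions: the toy data and object names are hypothetical, glm() stands in for glm.fit, and the negative binomial size parameter is chosen so that the variance equals the estimated dispersion times the mean, which is an assumption of this sketch rather than a documented detail.

```r
# Sketch of the bootstrap regression steps; all names are illustrative.
set.seed(2024)

# Toy overdispersed counts with 100 of 500 values of y set to missing
n  <- 500
x1 <- rnorm(n)
y  <- rnbinom(n, size = 2, mu = exp(1 + 0.5 * x1))
ry <- rep(TRUE, n)
ry[sample(n, 100)] <- FALSE
y[!ry] <- NA

# Bootstrap sample drawn from the observed cases y[ry], x[ry, ]
obs  <- data.frame(y = y[ry], x1 = x1[ry])
boot <- obs[sample(nrow(obs), replace = TRUE), ]

# Step 1: fit the model to the bootstrap sample and keep its parameters
fit.b  <- glm(y ~ x1, family = quasipoisson, data = boot)
b.star <- coef(fit.b)
phi.b  <- summary(fit.b)$dispersion

# Step 2: fitted values for the cases to be imputed
mu.mis <- as.vector(exp(cbind(1, x1[!ry]) %*% b.star))

# Step 3 (assumption): NB draws with Var = phi.b * mu,
# falling back to Poisson draws if no overdispersion is estimated
imp.boot <- if (phi.b > 1) {
  rnbinom(length(mu.mis), size = mu.mis / (phi.b - 1), mu = mu.mis)
} else {
  rpois(length(mu.mis), lambda = mu.mis)
}
```

In this sketch, parameter uncertainty enters through the resampling of observed cases rather than through a draw from N(bhat, V(bhat)).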

Value

Numeric vector of length sum(!ry) with imputations

Functions

  • mice.impute.quasipoisson: Bayesian regression variant

  • mice.impute.qpois: identical to mice.impute.quasipoisson(); included for backward compatibility

  • mice.impute.quasipoisson.boot: Bootstrap regression variant

  • mice.impute.qpois.boot: identical to mice.impute.quasipoisson.boot(); included for backward compatibility

Author(s)

Kristian Kleinke

References

  • Hilbe, J. M. (2007). Negative binomial regression. Cambridge: Cambridge University Press.

  • Kleinke, K., & Reinecke, J. (2013). countimp 1.0 – A multiple imputation package for incomplete count data [Technical Report]. University of Bielefeld, Faculty of Sociology, available from www.uni-bielefeld.de/soz/kds/pdf/countimp.pdf.

  • Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

  • Zeileis, A., Kleiber, C., & Jackman, S. (2008). Regression models for count data in R. Journal of Statistical Software, 27(8), 1–25.

Examples

## simulate overdispersed count data
set.seed( 1234 )
b0 <- 1
b1 <- .75
b2 <- -.25
b3 <- .5
N <- 5000
x1 <- rnorm(N)
x2 <- rnorm(N)
x3 <- rnorm(N)
mu <- exp( b0 + b1 * x1 + b2 * x2 + b3 * x3 )
y <- MASS::rnegbin( N, theta = 2, mu )
NB <- data.frame( y, x1, x2, x3 )

## introduce MAR missingness to simulated data
total <- round( .2 * N )  ## number of missing values in y
sm <- which( NB[,2] < mean( NB[,2] ) )  ## subset: cases with x1 < mean(x1) (column 2 of NB is x1)
gr <- which( NB[,2] > mean( NB[,2] ) )  ## subset: cases with x1 > mean(x1)
sel.sm <- sample( sm, round( .2 * total ) )  ## select cases to set as missing
sel.gr <- sample( gr, round( .8 * total ) )  ## select cases to set as missing
sel <- c( sel.sm,sel.gr )
MNB <- NB
MNB[sel,1] <- NA	##delete selected data

## impute missing data
imp <- countimp( MNB, method = c( "quasipoisson", "", "", "" )) 

kkleinke/countimp documentation built on Nov. 5, 2024, 11:51 a.m.