mixImp: Imputation for a mixture of continuous and categorical...
In jwb133/mlmi: Maximum Likelihood Multiple Imputation


mixImp

R Documentation

Imputation for a mixture of continuous and categorical variables using the general location model.

Description

This function performs multiple imputation under a general location model as described by Schafer (1997), using the mix package. Imputation can be performed either using posterior draws (pd=TRUE) or conditional on the maximum likelihood estimate of the model parameters (pd=FALSE); the latter is referred to as maximum likelihood multiple imputation by von Hippel and Bartlett (2021).

Usage

mixImp(
  obsData,
  nCat,
  M = 10,
  pd = FALSE,
  marginsType = 1,
  margins = NULL,
  designType = 1,
  design = NULL,
  steps = 100,
  rseed
)

Arguments

`obsData`	The data frame to be imputed. The categorical variables must be in the first `nCat` columns, and they must be coded using consecutive positive integers.
`nCat`	The number of categorical variables in `obsData`.
`M`	Number of imputations to generate.
`pd`	Specify whether to use posterior draws (`TRUE`) or not (`FALSE`).
`marginsType`	An integer specifying what type of log-linear model to use for the categorical variables. `marginsType=1`, the default, allows for all two-way associations in the log-linear model. `marginsType=2` allows for all three-way associations (plus lower). `marginsType=3` assumes a saturated log-linear model for the categorical variables.
`margins`	If `marginsType` is not specified, `margins` must be supplied to specify the margins of the log-linear model for the categorical variables. See the help for `ecm.mix` for details on specifying `margins`.
`designType`	An integer specifying how the continuous variables' means should depend on the categorical variables. `designType=1`, the default, assumes the mean of each continuous variable is a linear function with main effects of the categorical variables. `designType=2` assumes each continuous variable has a separate mean for each combination of the categorical variables.
`design`	If `designType` is not specified, `design` must be supplied to specify how the mean of the continuous variables depends on the categorical variables. See the help for `ecm.mix` for details on specifying `design`.
`steps`	If `pd` is `TRUE`, the `steps` argument specifies how many MCMC iterations to perform.
`rseed`	The value to which the `mix` package's random number seed is set, using the `rngseed` function of `mix`. `rngseed` must be called at least once before imputing using `mix`. If the user wishes to set the seed using `rngseed` before calling `mixImp`, set `rseed=NULL`.
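
The layout and coding requirements on `obsData` (categorical variables in the first `nCat` columns, coded as consecutive positive integers) can be met with a small amount of base R preparation. The following is a minimal sketch only: `prepForMixImp` is a hypothetical helper that is not part of mlmi or mix, and the toy data frame `raw` is invented for illustration.

```r
# Hypothetical helper (not part of mlmi): move the categorical columns to the
# front and recode each one as consecutive integers 1, 2, ..., K, keeping NAs.
prepForMixImp <- function(data, catVars) {
  stopifnot(is.data.frame(data), all(catVars %in% names(data)))
  contVars <- setdiff(names(data), catVars)
  out <- data[, c(catVars, contVars), drop = FALSE]
  for (v in catVars) {
    # factor() keeps only the observed levels (in sorted order);
    # as.integer() then maps them to 1..K and leaves NA as NA
    out[[v]] <- as.integer(factor(out[[v]]))
  }
  out
}

# invented toy data: one continuous and two categorical variables
raw <- data.frame(
  bmi   = c(22.1, NA, 30.4, 27.8),
  sex   = c("f", "m", NA, "f"),
  grade = c(0, 2, 2, 5)
)

prepped <- prepForMixImp(raw, catVars = c("sex", "grade"))
nCat <- 2  # number of leading categorical columns in prepped
```

The recoding follows the sorted order of the observed values, so keep a record of the original levels if the imputed categories need to be mapped back afterwards.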

Details

See the descriptions for marginsType, margins, designType, design and the documentation in ecm.mix for details about how to specify the model.

Imputed datasets can be analysed using withinBetween or scoreBased, or, for example, with the bootImpute package.

Value

A list of imputed datasets, or if M=1, just the imputed data frame.
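
Because the return type differs between M=1 and M>1, downstream code that loops over imputations may find it convenient to normalise the result first. The following is a minimal sketch: `asImpList` is a hypothetical helper that is not part of mlmi, and `single` and `multi` are stand-ins for real `mixImp` output.

```r
# Hypothetical helper (not part of mlmi): always return a list of data frames.
# A data frame is itself a list, so test with is.data.frame(), not is.list().
asImpList <- function(imps) {
  if (is.data.frame(imps)) list(imps) else imps
}

single <- data.frame(x1 = c(1L, 2L), y = c(0.3, 1.2))  # stands in for an M=1 result
multi  <- list(single, single)                         # stands in for an M=2 result

length(asImpList(single))
length(asImpList(multi))
```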

References

Schafer J.L. (1997). Analysis of incomplete multivariate data. Chapman & Hall, Boca Raton, Florida, USA.

von Hippel P.T. and Bartlett J.W. (2021). Maximum likelihood multiple imputation: faster, more efficient imputation without posterior draws. Statistical Science 36(3), 400-420. doi:10.1214/20-STS793.

Examples

#simulate a partially observed dataset with a mixture of categorical and continuous variables
set.seed(1234)

n <- 100

#for simplicity we simulate completely independent categorical variables
x1 <- ceiling(3*runif(n))
x2 <- ceiling(2*runif(n))
x3 <- ceiling(2*runif(n))
y <- 1+0.5*(x1==2)+1.5*(x1==3)+x2+x3+rnorm(n)

temp <- data.frame(x1=x1,x2=x2,x3=x3,y=y)

#make some data missing in all variables
for (i in 1:4) {
  temp[(runif(n)<0.25),i] <- NA
}

#impute conditional on MLE, assuming two-way associations in the log-linear model
#and main effects of categorical variables on continuous one (the default)
imps <- mixImp(temp, nCat=3, M=10, pd=FALSE, rseed=4423)
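
For comparison, a posterior-draw variant might look like the following sketch. It re-simulates the same kind of data so that it is self-contained, and it only calls `mixImp` if the mlmi package is installed. The object names `temp2` and `impsPD` and the choice of `steps=200` are illustrative assumptions, not recommendations from the package authors.

```r
# simulate a partially observed dataset, as in the example above
set.seed(1234)
n <- 100
x1 <- ceiling(3*runif(n))
x2 <- ceiling(2*runif(n))
x3 <- ceiling(2*runif(n))
y <- 1+0.5*(x1==2)+1.5*(x1==3)+x2+x3+rnorm(n)
temp2 <- data.frame(x1=x1,x2=x2,x3=x3,y=y)

# make some data missing in all variables
for (i in 1:4) {
  temp2[(runif(n)<0.25),i] <- NA
}

# impute using posterior draws (pd=TRUE) with the default model;
# steps sets the number of MCMC iterations (200 is an arbitrary choice here)
if (requireNamespace("mlmi", quietly = TRUE)) {
  impsPD <- mlmi::mixImp(temp2, nCat=3, M=10, pd=TRUE, steps=200, rseed=4423)
}
```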

jwb133/mlmi documentation built on June 4, 2023, 9:39 a.m.