hcmm_impute: Generate multiply imputed datasets

Description

Imputations are generated using a nonparametric Bayesian joint model, specifically the hierarchically coupled mixture model with local dependence described in Murray and Reiter (2015); see citation("MixedDataImpute") or http://arxiv.org/abs/1410.0438.

Usage

hcmm_impute(X, Y, kz, kx, ky, hyperpar = NULL, num.impute, num.burnin,
  num.skip, thin.trace = -1, status = 50)

Arguments

X

A data frame of categorical variables (as factors)

Y

A matrix or data frame of continuous variables

kz

Number of top-level clusters

kx

Number of X-model clusters

ky

Number of Y-model clusters

hyperpar

A list of hyperparameter values (see hcmm_hyperpar)

num.impute

Number of imputations

num.burnin

Number of MCMC burn-in iterations

num.skip

Number of MCMC iterations between saved imputations

thin.trace

If negative (the default is -1), only the num.impute imputed datasets are saved. If positive, summaries of the model state are saved every thin.trace iterations for diagnostic purposes.

status

Interval at which to print status messages
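
The run-length arguments together determine how long the sampler runs. As a rough sketch, assuming one imputation is saved every num.skip iterations once burn-in has finished (the exact bookkeeping inside hcmm_impute may differ), the schedule for the settings used in the Examples section would look like this. The names save_at and total_iterations are illustrative and not part of the package.

```r
# Assumed schedule: burn-in first, then one saved imputation every
# num.skip iterations. Illustrative arithmetic only; this does not call
# into MixedDataImpute.
num.impute <- 5
num.burnin <- 10000
num.skip   <- 1000

# Iterations at which an imputed dataset would be saved under this assumption
save_at <- num.burnin + num.skip * seq_len(num.impute)

# Total number of MCMC iterations implied by the schedule
total_iterations <- max(save_at)

save_at
total_iterations
```

Under this assumption, a larger num.skip spaces the saved imputations further apart in the chain, at the cost of a proportionally longer run.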

Value

A list with three elements:

imputations

A list of length num.impute. Each element is an imputed dataset.

trace

MCMC output (currently the component sizes for the three mixture indices)

model

An interface to the C++ object containing the current state
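
A quick sanity check on the returned imputations element can be sketched as follows. The object imp here is a small hypothetical stand-in built by hand so the snippet runs on its own; with a real fit it would be the value returned by hcmm_impute, and one would expect each completed dataset to contain no remaining missing values.

```r
# Hypothetical stand-in for the list returned by hcmm_impute(); only the
# `imputations` element is mocked, with made-up toy values.
imp <- list(
  imputations = list(
    data.frame(y = c(1.2, 0.7, 3.1), x = factor(c("a", "b", "a"))),
    data.frame(y = c(1.2, 0.9, 3.1), x = factor(c("a", "b", "b")))
  )
)

# Number of completed datasets (num.impute for a real run)
n_imputations <- length(imp$imputations)

# Count of remaining missing values in each completed dataset
n_missing <- vapply(imp$imputations,
                    function(dat) sum(is.na(dat)),
                    integer(1))

n_imputations
n_missing
```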

Examples

## Not run: 
library(MixedDataImpute)
library(mice) # For the functions implementing combining rules

data(sipp08)

set.seed(1)
n = 1000
s = sample(1:nrow(sipp08), n)

Y = sipp08[s,1:2]
Y[,1] = log(Y[,1]+10) # Same transform as the comparison fits below
X = sipp08[s,-c(1:2,9)] # Also removes occ code, which has ~23 levels

# MCAR with probability 0.2, for illustration purposes (not matching the paper)

Y[runif(n)<0.2,1] = NA
Y[runif(n)<0.2,2] = NA
for(j in 1:ncol(X)) X[runif(n)<0.2,j] = NA

kz = 15
ky = 60
kx = 90

num.impute = 5
num.burnin = 10000
num.skip = 1000
thin.trace = 10

imp = hcmm_impute(X, Y, kz=kz, kx=kx, ky=ky,
                  num.impute=num.impute, num.burnin=num.burnin,
                  num.skip=num.skip, thin.trace=thin.trace)

# Example of getting MI estimates for a regression, using the
# pooling functions in mice
form = total_earnings~age+I(age^2) + sex*I(own_kid!=0)

fits = lapply(imp$imputations, function(dat) lm(form, data=dat))
pooled_ests = pool(as.mira(fits))
summary(pooled_ests)

# original, complete data estimates for comparison
comdat = sipp08[s,]
comdat[,1] = log(comdat[,1]+10)
summary(lm(form, data=comdat))

# true population values for comparison
pop = sipp08
pop[,1] = log(pop[,1]+10)
summary(lm(form, data=pop))


## End(Not run)
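
The pooling step in the example relies on the combining rules implemented in mice. To make that step less opaque, here is a minimal base-R sketch of Rubin's rules for a single coefficient. It uses simulated toy datasets (toy_imputations, a hypothetical stand-in for imp$imputations) rather than real imputations, so the numbers only illustrate the arithmetic; degrees-of-freedom adjustments are omitted.

```r
# Toy stand-in for imp$imputations: five small simulated datasets.
# These are independent draws, not true imputations of one dataset.
set.seed(1)
toy_imputations <- lapply(1:5, function(i) {
  x <- rnorm(50)
  data.frame(x = x, y = 1 + 2 * x + rnorm(50))
})

# Fit the same model to every completed dataset
fits <- lapply(toy_imputations, function(dat) lm(y ~ x, data = dat))
m <- length(fits)

# Per-dataset point estimates and sampling variances for the slope
q <- vapply(fits, function(f) coef(f)[["x"]], numeric(1))
u <- vapply(fits, function(f) vcov(f)["x", "x"], numeric(1))

q_bar <- mean(q)               # pooled point estimate
w     <- mean(u)               # within-imputation variance
b     <- var(q)                # between-imputation variance
t_var <- w + (1 + 1 / m) * b   # total variance

c(estimate = q_bar, std.error = sqrt(t_var))
```

The total variance adds a between-imputation term to the average within-imputation variance, which is why pooled standard errors are typically wider than those from any single completed dataset.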

MixedDataImpute documentation built on May 1, 2019, 9:29 p.m.