Multiple Iterative Regression Imputation

Description

Generates a multiply imputed matrix by applying the elementary imputation functions iteratively to the variables with missingness in the data, randomly imputing each variable and looping through until approximate convergence.

Usage

## S4 method for signature 'data.frame'
mi(object, info,  n.imp = 3, n.iter = 30,
    R.hat = 1.1, max.minutes = 20, rand.imp.method = "bootstrap",
    run.past.convergence = FALSE,
    seed = NA, check.coef.convergence = FALSE,
    add.noise = noise.control())

## S4 method for signature 'mi.preprocessed'
mi(object, n.imp = 3, n.iter = 30,
    R.hat = 1.1, max.minutes = 20, rand.imp.method = "bootstrap",
    run.past.convergence = FALSE,
    seed = NA, check.coef.convergence = FALSE,
    add.noise = noise.control())


## S4 method for signature 'mi'
mi(object, n.iter = 30,
    R.hat = 1.1, max.minutes = 20, rand.imp.method = "bootstrap",
    run.past.convergence = FALSE,  seed = NA)

Arguments

object

A data frame, an mi.preprocessed object, or an mi object that contains incomplete data. mi treats NA values as the missing data.

info

The mi.info object.

n.imp

The number of multiple imputations (chains). Default is 3.

n.iter

The maximum number of imputation iterations. Default is 30 iterations.

R.hat

The value of the R.hat statistic used as a convergence criterion. Default is 1.1.

max.minutes

The maximum number of minutes the whole imputation process may run. Default is 20 minutes.

rand.imp.method

The method for random imputation. Currently, mi implements only the bootstrap method.

run.past.convergence

Default is FALSE. If set to TRUE, mi keeps running until either n.iter or max.minutes is reached, even if the imputation has already converged.

seed

The random number seed.

check.coef.convergence

Default is FALSE. If set to TRUE, mi also checks the convergence of the coefficients of the imputation models.

add.noise

A list of parameters, created by noise.control, that controls how noise is added during the imputation process.

Details

Generates multiple imputations for incomplete data using iterative regression imputation. If the variables with missingness are a matrix Y with columns Y(1), ..., Y(K) and the fully observed predictors are X, this entails first imputing all the missing Y values using some crude approach (for example, choosing imputed values for each variable by randomly selecting from the observed outcomes of that variable); then imputing Y(1) given Y(2), ..., Y(K) and X; then imputing Y(2) given Y(1), Y(3), ..., Y(K) and X (using the newly imputed values for Y(1)); and so forth, randomly imputing each variable and looping through until approximate convergence.
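
The procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code that mi runs: the variable names y1, y2 and x, the linear models fitted with lm, and the fixed 20 iterations are all made up for this example, and each draw adds only residual noise rather than also reflecting uncertainty in the regression coefficients.

```r
# Sketch of iterative regression imputation with two incomplete
# continuous variables (y1, y2) and one fully observed predictor (x).
# All names and settings here are illustrative assumptions.
set.seed(1)
n  <- 200
x  <- rnorm(n)
y1 <- 1 + 2 * x + rnorm(n)
y2 <- -1 + 0.5 * y1 + x + rnorm(n)
dat <- data.frame(y1 = y1, y2 = y2, x = x)
dat$y1[sample(n, 40)] <- NA
dat$y2[sample(n, 40)] <- NA

miss <- lapply(dat, is.na)   # remember which cells were originally missing
incomplete <- c("y1", "y2")

# Step 1: crude start, draw each missing value from the observed values
imputed <- dat
for (v in incomplete) {
  obs <- dat[[v]][!miss[[v]]]
  imputed[[v]][miss[[v]]] <- sample(obs, sum(miss[[v]]), replace = TRUE)
}

# Step 2: cycle through the incomplete variables, regress each one on all
# the others (using their current imputations), and replace its missing
# cells with random draws around the fitted values
for (iter in 1:20) {
  for (v in incomplete) {
    form <- reformulate(setdiff(names(imputed), v), response = v)
    fit  <- lm(form, data = imputed[!miss[[v]], ])
    mu   <- predict(fit, newdata = imputed[miss[[v]], ])
    imputed[[v]][miss[[v]]] <- mu + rnorm(length(mu), sd = sigma(fit))
  }
}
```

The sketch runs a single chain. Repeating it from several different crude starts gives multiple chains, which is what makes a between-chain convergence check possible.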

Value

A list of objects of class mi, which stands for “multiple imputation”.

Each object is itself a list with the following elements.

call

The imputation model.

data

The original data frame.

m

The number of imputations.

mi.info

Information matrix of the mi.

imp

A list of length m containing the imputations.

mcmc

An mcmc list that stores the means and sds of the imputed data.

converged

Binary variable to indicate if the mi has converged.

coef.mcmc

An mcmc list that stores the means of the regression coefficients of the conditional models.

coef.converged

Binary variable to indicate if the coefficients of the mi models have converged; NULL if check.coef.convergence = FALSE.

preprocess

Binary variable to indicate if preprocess=TRUE in the mi process.

mi.info.preprocessed

The information matrix actually used in the mi if preprocess=TRUE.

Each imp[[m]] is itself a list containing k variable lists of 3 objects:

imp[[m]][[k]]@model

the specified models used for imputing missing values

imp[[m]][[k]]@expected

a list of vectors of length n - n.mis (the number of observed values), specifying the estimated values of the models

imp[[m]][[k]]@random

a list of vectors of length n.mis (number of NAs), specifying the random predicted values for imputing missing data
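
The converged element and the R.hat argument refer to a potential scale reduction statistic computed across the imputation chains. As a rough illustration, the classic Gelman-Rubin form of that statistic can be written in a few lines of base R. The function name rhat and the simulated draws below are hypothetical, and the exact computation inside mi may differ.

```r
# Potential scale reduction factor (R-hat) for one scalar summary,
# such as the mean of an imputed variable, tracked over iterations.
# draws: numeric matrix with one row per iteration, one column per chain.
rhat <- function(draws) {
  n <- nrow(draws)
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)             # between-chain variance
  W <- mean(apply(draws, 2, var))       # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(2)
# Three chains of 30 iterations sampling the same distribution
mixed <- matrix(rnorm(30 * 3), nrow = 30)
# The same chains shifted to different levels, i.e. not yet mixed
unmixed <- mixed + rep(c(0, 3, 6), each = 30)

rhat(mixed)     # close to 1 when the chains agree
rhat(unmixed)   # well above 1 when the chains disagree
```

Under this reading, a chain set counts as approximately converged when the statistic falls below the R.hat threshold (1.1 by default) for the summaries being monitored.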

Author(s)

Masanao Yajima yajima@stat.columbia.edu, Yu-Sung Su suyusung@tsinghua.edu.cn, M. Grazia Pittau grazia@stat.columbia.edu, Andrew Gelman gelman@stat.columbia.edu

References

Yu-Sung Su, Andrew Gelman, Jennifer Hill, Masanao Yajima. (2011). “Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box”. Journal of Statistical Software 45(2).

Kobi Abayomi, Andrew Gelman and Marc Levy. (2008). “Diagnostics for multivariate imputations”. Applied Statistics 57, Part 3: 273–291.

Andrew Gelman and Jennifer Hill. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

See Also

mi.completed, mi.data.frame, mi.continuous, mi.binary, mi.count, mi.categorical, mi.polr, typecast, mi.info, mi.preprocess

Examples

# simulate fake data
set.seed(100)
n <- 100
u1 <- rbinom(n, 1, .5)
v1 <- log(rnorm(n, 5, 1))
x1 <- u1*exp(v1)
u2 <- rbinom(n, 1, .5)
v2 <- log(rnorm(n, 5, 1))
x2 <- u2*exp(v2)
x3 <- rbinom(n, 1, prob=0.45)
x4 <- ordered(rep(seq(1, 5),100)[sample(1:n, n)])
x5 <- rep(letters[1:10],10)[sample(1:n, n)]
x6 <- trunc(runif(n, 1, 10))
x7 <- rnorm(n)
x8 <- factor(rep(seq(1,10),10)[sample(1:n, n)])
x9 <- runif(n, 0.1, .99)
x10 <- rpois(n, 4)
y <- x1 + x2 + x7 + x9 + rnorm(n)
fakedata <- cbind.data.frame(y, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)

# randomly create missing values
dat <- mi:::.create.missing(fakedata, pct.mis=30)

# get information matrix of the data
inf <- mi.info(dat)

# update the variable type of a specific variable to mi.info
inf <- update(inf, "type", list(x10="count"))

# run the imputation without data transformation
#IMP <- mi(dat, info=inf, check.coef.convergence=TRUE,
#  add.noise=noise.control(post.run.iter=10))

# run the imputation with data transformation
dat.transformed <- mi.preprocess(dat, inf)
#IMP <- mi(dat.transformed, n.iter=6, check.coef.convergence=TRUE,
#  add.noise=noise.control(post.run.iter=6))

IMP <- mi(dat.transformed, n.iter=6, add.noise=FALSE)


# no noise
# IMP <- mi(dat, info=inf, n.iter=6, add.noise=FALSE) ## NOT RUN

# pick up where you left off
# IMP <- mi(IMP, n.iter = 6)

## this is the suggested (default) way of running mi
# IMP <- mi(dat, info=inf) ## NOT RUN

# convergence checking
converged(IMP, check = "data")  ## You should get FALSE here because n.iter is small
#converged(IMP, check = "coefs")
IMP.bugs1 <- bugs.mi(IMP, check = "data")    ## BUGS object to look at the R hat statistics
#IMP.bugs2 <- bugs.mi(IMP, check = "coefs")   ## BUGS object to look at the R hat statistics
plot(IMP.bugs1)  ## visually check R.hat

# visually check the imputation
plot(IMP)
