mice.impute.rfemp: Univariate sampler function for mixed types of variables for...

mice.impute.rfemp    R Documentation

Univariate sampler function for mixed types of variables for prediction-based imputation, using empirical distribution of out-of-bag prediction errors and predicted probabilities of random forests

Description

Please note that functions with names starting with "mice.impute" are exported to be visible for the mice sampler functions. Please do not call these functions directly unless you know exactly what you are doing.

RfEmpImp multiple imputation method, adapter for mice samplers. This function can be called by the mice sampler function: in the call to mice(), set method = "rfemp" to use the RfEmp method.

mice.impute.rfemp handles mixed types of variables and calls the corresponding function according to the type of each variable. Categorical variables should be stored as a non-numeric type such as factor or logical.

For continuous variables, mice.impute.rfpred.emp is called, performing imputation based on the empirical distribution of out-of-bag prediction errors of random forests.
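The idea can be illustrated with a minimal sketch. It assumes the ranger package as the random forest backend and uses simulated data; the variable names and the number of trees are illustrative only, and this is not the package's actual implementation.

```r
library(ranger)  # assumption: ranger is used here purely for illustration

set.seed(1)
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 2 * x$x1 - x$x2 + rnorm(n)

ry <- rep(TRUE, n)
ry[sample(n, 40)] <- FALSE   # treat 40 values of y as missing
wy <- !ry                    # locations to impute

# Fit the forest on the observed cases only
fit <- ranger(x = x[ry, , drop = FALSE], y = y[ry], num.trees = 100)

# Out-of-bag prediction errors for the observed cases
oob_err <- y[ry] - fit$predictions
oob_err <- oob_err[is.finite(oob_err)]  # drop cases that were never out-of-bag

# Predict the missing cases, then add errors drawn from the empirical distribution
pred_mis <- predict(fit, data = x[wy, , drop = FALSE])$predictions
imputed <- pred_mis + sample(oob_err, size = sum(wy), replace = TRUE)
```

Adding a resampled error to each prediction, rather than imputing the prediction itself, keeps the imputed values from being less variable than the observed ones.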

For categorical variables, mice.impute.rfpred.cate is called, performing imputation based on predicted probabilities.
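A minimal sketch of imputation from predicted probabilities follows. It assumes the ranger package with a probability forest and simulated data; names and settings are illustrative, not the package's actual code.

```r
library(ranger)  # assumption: ranger is used here purely for illustration

set.seed(2)
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- factor(ifelse(x$x1 + rnorm(n) > 0, "yes", "no"))

ry <- rep(TRUE, n)
ry[sample(n, 40)] <- FALSE   # treat 40 values of y as missing
wy <- !ry

# Probability forest fitted on the observed cases
fit <- ranger(x = x[ry, , drop = FALSE], y = y[ry],
              num.trees = 100, probability = TRUE)

# Matrix of predicted class probabilities, one row per case to impute
prob_mis <- predict(fit, data = x[wy, , drop = FALSE])$predictions

# Draw one class per row according to its predicted probabilities
draw <- apply(prob_mis, 1, function(p) {
  sample(colnames(prob_mis), size = 1, prob = p)
})
imputed <- factor(draw, levels = levels(y))
```

Drawing from the predicted probabilities, instead of always taking the most likely class, lets the imputations reflect the uncertainty of the prediction.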

Usage

mice.impute.rfemp(
  y,
  ry,
  x,
  wy = NULL,
  num.trees = 10,
  alpha.emp = 0,
  sym.dist = TRUE,
  pre.boot = TRUE,
  num.trees.cont = NULL,
  num.trees.cate = NULL,
  ...
)

Arguments

y

Vector to be imputed.

ry

Logical vector of length length(y) indicating the subset y[ry] of elements in y to which the imputation model is fitted. The ry generally distinguishes the observed (TRUE) and missing values (FALSE) in y.

x

Numeric design matrix with length(y) rows with predictors for y. Matrix x must not contain missing values.

wy

Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.

num.trees

Number of trees to build, defaults to 10.

alpha.emp

The "significance level" for the empirical distribution of prediction errors, which can be used to guard against outliers (useful for highly skewed variables). For example, set alpha.emp = 0.05 to use the 95% confidence level for the empirical distribution of prediction errors. The default is 0.0, which keeps the empirical error distribution intact.
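One way to picture the effect of this argument is the sketch below. The helper trim_errors is hypothetical, and the two-sided quantile trimming it uses is an assumption about how such a level could be applied, not a description of the package internals.

```r
# Hypothetical helper, not part of RfEmpImp: keep only the errors inside the
# central (1 - alpha) share of the empirical distribution.
trim_errors <- function(err, alpha = 0) {
  if (alpha <= 0) return(err)  # alpha = 0 keeps the distribution intact
  bounds <- quantile(err, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  err[err >= bounds[1] & err <= bounds[2]]
}

set.seed(3)
err <- c(rnorm(200), 25, -30)   # simulated errors with two gross outliers
kept <- trim_errors(err, alpha = 0.05)

range(err)    # includes the outliers
range(kept)   # outliers removed
```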

sym.dist

If TRUE, the empirical distribution of out-of-bag prediction errors is assumed to be symmetric; if FALSE, the empirical distribution is kept intact. The default is sym.dist = TRUE. This option has no effect when emp.err.cont is set to FALSE.
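As an illustration only, a symmetric error distribution can be obtained by mirroring the absolute errors around zero. The helper symmetrize_errors is hypothetical, and mirroring is an assumed approach that may differ from what the package does.

```r
# Hypothetical helper, not part of RfEmpImp: mirror the absolute errors
# around zero so that positive and negative draws are equally likely.
symmetrize_errors <- function(err) {
  abs_err <- abs(err)
  c(abs_err, -abs_err)
}

set.seed(4)
err <- rexp(100) - 0.3             # simulated, right-skewed errors
sym_err <- symmetrize_errors(err)

mean(sym_err)                      # zero up to floating-point error
```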

pre.boot

If TRUE, perform a bootstrap prior to imputation to obtain 'proper' multiple imputation, i.e. to accommodate sampling variation in estimating the population regression parameters (see Shah et al. 2014). Note that when TRUE, this option takes effect even if the number of trees is set to one.
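The bootstrap step can be sketched in base R as follows. The data are simulated and the object names are illustrative; the sketch only shows resampling the observed cases with replacement before a model would be fitted.

```r
set.seed(5)
n <- 20
y <- rnorm(n)
x <- matrix(rnorm(n * 2), ncol = 2)
ry <- rep(c(TRUE, FALSE), times = c(15, 5))  # 15 observed, 5 missing

obs_idx <- which(ry)

# Resample the observed rows with replacement; the imputation model would
# then be fitted to (x_boot, y_boot) instead of the original observed data.
boot_idx <- sample(obs_idx, size = length(obs_idx), replace = TRUE)
y_boot <- y[boot_idx]
x_boot <- x[boot_idx, , drop = FALSE]
```

Because each imputation uses a different bootstrap sample, the fitted models differ between imputations, which is the source of the extra between-imputation variation.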

num.trees.cont

Number of trees to build for continuous variables. Defaults to NULL, in which case the value of num.trees is used.

num.trees.cate

Number of trees to build for categorical variables. Defaults to NULL, in which case the value of num.trees is used.

...

Other arguments to pass down.

Details

mice.impute.rfemp is the RfEmpImp imputation sampler. It calls mice.impute.rfpred.emp if is.numeric is TRUE for the variable; otherwise it calls mice.impute.rfpred.cate.
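The dispatch logic can be sketched as below. The function dispatch_sketch and the two stub samplers are hypothetical stand-ins that only return a label; the wy <- !ry default follows the usual mice convention and is an assumption here.

```r
# Hypothetical stand-ins for the real samplers; they only report which
# branch was taken instead of fitting random forests.
impute_cont_stub <- function(y, ry, x, wy, num.trees) "continuous"
impute_cate_stub <- function(y, ry, x, wy, num.trees) "categorical"

dispatch_sketch <- function(y, ry, x, wy = NULL, num.trees = 10,
                            num.trees.cont = NULL, num.trees.cate = NULL) {
  if (is.null(wy)) wy <- !ry  # assumed default: impute the missing cases
  # Fall back to num.trees when the type-specific values are not given
  if (is.null(num.trees.cont)) num.trees.cont <- num.trees
  if (is.null(num.trees.cate)) num.trees.cate <- num.trees
  if (is.numeric(y)) {
    impute_cont_stub(y, ry, x, wy, num.trees.cont)
  } else {
    impute_cate_stub(y, ry, x, wy, num.trees.cate)
  }
}

ry <- c(TRUE, FALSE, TRUE)
x <- matrix(1:3, ncol = 1)
dispatch_sketch(c(1.5, NA, 3.2), ry, x)             # numeric branch
dispatch_sketch(factor(c("a", NA, "b")), ry, x)     # categorical branch
dispatch_sketch(c(TRUE, NA, FALSE), ry, x)          # logical is not numeric
```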

Value

Vector with imputed data, same type as y, and of length sum(wy).

Author(s)

Shangzhi Hong

References

Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.

Zhang, Haozhe, et al. "Random Forest Prediction Intervals." The American Statistician (2019): 1-20.

Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American journal of epidemiology 179.6 (2014): 764-774.

Malley, James D., et al. "Probability machines." Methods of information in medicine 51.01 (2012): 74-81.

Examples

# Load the packages (mice provides nhanes; RfEmpImp provides conv.factor)
library(mice)
library(RfEmpImp)

# Prepare data: convert categorical variables to factors
nhanes.fix <- conv.factor(nhanes, c("age", "hyp"))

# This function is exported to be visible to the mice sampler functions, and
# users can set method = "rfemp" in the call to mice to use it.
# Users are recommended to use the imp.rfemp function instead of calling
# mice directly; the direct call is shown here for reference:
impObj <- mice(nhanes.fix, method = "rfemp", m = 5,
               maxit = 5, maxcor = 1.0, eps = 0,
               remove.collinear = FALSE, remove.constant = FALSE,
               printFlag = FALSE)
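As a possible follow-up, the imputed data sets can be analysed and pooled with the standard mice workflow. The sketch below assumes that the mice and RfEmpImp packages are installed; the regression model chl ~ bmi + age is chosen only for illustration.

```r
library(mice)
library(RfEmpImp)

set.seed(6)
nhanes.fix <- conv.factor(nhanes, c("age", "hyp"))
impObj <- mice(nhanes.fix, method = "rfemp", m = 5, maxit = 5,
               printFlag = FALSE)

# Fit the same model on each imputed data set, then pool the estimates
# across imputations using Rubin's rules
fit <- with(impObj, lm(chl ~ bmi + age))
pooled <- pool(fit)
summary(pooled)
```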


RfEmpImp documentation built on Oct. 20, 2022, 9:06 a.m.