mice.impute.rfpred.cate: Univariate sampler function for categorical variables for...


mice.impute.rfpred.cate    R Documentation

Univariate sampler function for categorical variables for prediction-based imputation, using predicted probabilities of random forest

Description

Please note that functions with names starting with "mice.impute" are exported to be visible for the mice sampler functions. Please do not call these functions directly unless you know exactly what you are doing.

For categorical variables only.

Part of project RfEmpImp, the function mice.impute.rfpred.cate is for categorical variables, performing imputation based on predicted probabilities for the categories.

Usage

mice.impute.rfpred.cate(
  y,
  ry,
  x,
  wy = NULL,
  num.trees.cate = 10,
  use.pred.prob.cate = TRUE,
  forest.vote.cate = FALSE,
  pre.boot = TRUE,
  num.threads = NULL,
  ...
)

Arguments

y

Vector to be imputed.

ry

Logical vector of length length(y) indicating the subset y[ry] of elements in y to which the imputation model is fitted. In general, ry distinguishes the observed (TRUE) from the missing (FALSE) values in y.

x

Numeric design matrix with length(y) rows containing the predictors for y. Matrix x must not contain missing values.

wy

Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.
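
The relationship between ry, wy and the length of the returned vector can be illustrated with a small base-R sketch. The toy data are invented, and the fallback wy <- !ry when wy is NULL is an assumption borrowed from the convention of the built-in mice samplers; this page does not state it.

```r
# Toy target with two missing entries; the values are invented.
y  <- factor(c("3", "4", NA, "5", NA, "4"))
ry <- !is.na(y)   # TRUE where y is observed
wy <- NULL        # the default shown in the Usage section

# Assumed convention (as in mice's built-in samplers):
# impute every entry that is not observed.
if (is.null(wy)) wy <- !ry

n.fit    <- sum(ry)   # entries available for fitting: y[ry]
n.impute <- sum(wy)   # length of the vector a sampler has to return
```

With these definitions, the imputations returned by a sampler would be written back with y[wy] <- imputed.values.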

num.trees.cate

Number of trees to build for categorical variables, defaults to 10.

use.pred.prob.cate

Logical, TRUE for assigning categories based on predicted probabilities, FALSE for imputation based on random draws from the predictions of classification trees, defaults to TRUE. Note that this option is ignored if forest.vote.cate = TRUE.

forest.vote.cate

Logical, TRUE for assigning categories based on the majority votes of random forests, FALSE for letting the option use.pred.prob.cate control the imputation, defaults to FALSE.
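
The three behaviours selected by use.pred.prob.cate and forest.vote.cate can be contrasted in a base-R sketch. This is only an illustration on invented numbers, not the package's implementation: the matrices probs and tree.pred are hypothetical stand-ins for what a fitted forest would return.

```r
set.seed(1)

# Hypothetical predicted class probabilities for three missing cells (rows)
# over the factor levels "3", "4", "5" (columns); the numbers are invented.
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.1, 0.8,
                  0.3, 0.4, 0.3),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("3", "4", "5")))
lvls <- colnames(probs)

# Hypothetical per-tree class predictions (rows = cells, columns = trees).
tree.pred <- matrix(c("3", "3", "4",
                      "5", "5", "5",
                      "4", "3", "4"),
                    nrow = 3, byrow = TRUE)

# Analogue of use.pred.prob.cate = TRUE:
# draw one level per cell, weighted by its predicted probabilities.
draw.prob <- apply(probs, 1, function(p) sample(lvls, size = 1, prob = p))

# Analogue of use.pred.prob.cate = FALSE:
# pick the prediction of one randomly chosen tree per cell.
tree.idx  <- sample(ncol(tree.pred), nrow(tree.pred), replace = TRUE)
draw.tree <- tree.pred[cbind(seq_len(nrow(tree.pred)), tree.idx)]

# Analogue of forest.vote.cate = TRUE:
# take the level predicted by most trees (deterministic).
vote <- apply(tree.pred, 1, function(v) names(which.max(table(v))))
```

The first two variants are random draws and so differ between imputed data sets, while the majority vote returns the same category every time for a given forest.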

pre.boot

Logical, TRUE for performing a bootstrap prior to imputation to get 'proper' multiple imputation, i.e. accommodating sampling variation in estimating population regression parameters (see Shah et al. 2014), defaults to TRUE. Note that if TRUE, this option is in effect even if the number of trees is set to one.
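
A minimal base-R sketch of what a bootstrap before model fitting could look like is given here. The toy data and the object names boot.idx, y.boot and x.boot are invented for illustration, and the package's actual implementation may differ.

```r
set.seed(42)

# Toy data: a factor with missing entries and a complete numeric design matrix.
y  <- factor(c("a", "b", NA, "a", "b", "a", NA, "b"))
ry <- !is.na(y)
x  <- matrix(rnorm(length(y) * 2), ncol = 2)

# Resample the observed rows with replacement; the imputation model would
# then be fitted on (x.boot, y.boot) instead of (x[ry, ], y[ry]).
obs.idx  <- which(ry)
boot.idx <- sample(obs.idx, size = length(obs.idx), replace = TRUE)
y.boot   <- y[boot.idx]
x.boot   <- x[boot.idx, , drop = FALSE]
```

Because each imputation is fitted on a different resample, the imputed values also reflect the uncertainty about the fitted model itself.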

num.threads

Number of threads for parallel computing. The default is num.threads = NULL, in which case all available processors can be used.

...

Other arguments to pass down.

Details

RfEmpImp imputation sampler for categorical variables, based on predicted probabilities.

Value

Vector with imputed data, same type as y, and of length sum(wy).

Author(s)

Shangzhi Hong

References

Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.

Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American journal of epidemiology 179.6 (2014): 764-774.

Malley, James D., et al. "Probability machines." Methods of information in medicine 51.01 (2012): 74-81.

Examples

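# Load packages: mice provides mice(), RfEmpImp provides the sampler and helpers
library(mice)
library(RfEmpImp)
# Prepare data: introduce missing values into "gear" and "carb"
mtcars.catmcar <- mtcars
mtcars.catmcar[, c("gear", "carb")] <-
    gen.mcar(mtcars.catmcar[, c("gear", "carb")], warn.empty.row = FALSE)
# Convert both variables to factors so that they are treated as categorical
mtcars.catmcar <- conv.factor(mtcars.catmcar, c("gear", "carb"))
# Perform imputation
impObj <- mice(mtcars.catmcar, method = "rfpred.cate", m = 5, maxit = 5,
               maxcor = 1.0, eps = 0,
               remove.collinear = FALSE, remove.constant = FALSE,
               printFlag = FALSE)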


RfEmpImp documentation built on Oct. 20, 2022, 9:06 a.m.