mice.impute.rfpred.cate: Univariate sampler function for categorical variables for...
In RfEmpImp: Multiple Imputation using Chained Random Forests

mice.impute.rfpred.cate

R Documentation

Univariate sampler function for categorical variables for prediction-based imputation, using predicted probabilities of random forest

Description

Functions whose names start with "mice.impute" are exported so that they are visible to the mice sampler. Do not call these functions directly unless you know exactly what you are doing.

For categorical variables only.

Part of the RfEmpImp project. The function mice.impute.rfpred.cate imputes categorical variables based on the predicted probabilities of the categories.

Usage

mice.impute.rfpred.cate(
  y,
  ry,
  x,
  wy = NULL,
  num.trees.cate = 10,
  use.pred.prob.cate = TRUE,
  forest.vote.cate = FALSE,
  pre.boot = TRUE,
  num.threads = NULL,
  ...
)

Arguments

`y`	Vector to be imputed.
`ry`	Logical vector of length `length(y)` indicating the subset `y[ry]` of elements in `y` to which the imputation model is fitted. `ry` generally distinguishes the observed (`TRUE`) and missing (`FALSE`) values in `y`.
`x`	Numeric design matrix with `length(y)` rows containing the predictors for `y`. Matrix `x` must not contain missing values.
`wy`	Logical vector of length `length(y)`. A `TRUE` value indicates locations in `y` for which imputations are created.
`num.trees.cate`	Number of trees to build for categorical variables, defaults to `10`.
`use.pred.prob.cate`	Logical, `TRUE` for assigning categories based on predicted probabilities, `FALSE` for imputation based on random draws from predictions of classification trees, defaults to `TRUE`. Note that this option is ignored if `forest.vote.cate = TRUE`.
`forest.vote.cate`	Logical, `TRUE` for assigning categories based on majority votes of random forests, `FALSE` to let the option `use.pred.prob.cate` determine how categories are assigned, defaults to `FALSE`.
`pre.boot`	Logical, `TRUE` to perform a bootstrap prior to imputation to get 'proper' multiple imputation, i.e. accommodating sampling variation in estimating population regression parameters (see Shah et al. 2014), defaults to `TRUE`. Note that if `TRUE`, this option is in effect even if the number of trees is set to one.
`num.threads`	Number of threads for parallel computing. With the default `num.threads = NULL`, all available processors can be used.
`...`	Other arguments to pass down.
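
The way `forest.vote.cate` and `use.pred.prob.cate` interact can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper `assignCategory` and its inputs `probs` (predicted class probabilities) and `treePreds` (per-tree predicted categories) are hypothetical stand-ins for the output of a random forest, and reading "random draws from predictions of classification trees" as picking one tree at random per observation is an assumption.

```r
# Hypothetical helper, not part of RfEmpImp: illustrates the three assignment modes.
# probs:     n x K numeric matrix of predicted class probabilities,
#            with column names giving the categories
# treePreds: n x T character matrix of per-tree predicted categories
assignCategory <- function(probs, treePreds,
                           use.pred.prob.cate = TRUE,
                           forest.vote.cate = FALSE) {
    n <- nrow(probs)
    if (forest.vote.cate) {
        # Majority vote across trees; use.pred.prob.cate is ignored in this mode
        return(apply(treePreds, 1, function(p) names(which.max(table(p)))))
    }
    if (use.pred.prob.cate) {
        # Draw one category per row, weighted by the predicted probabilities
        return(apply(probs, 1, function(p) sample(colnames(probs), 1, prob = p)))
    }
    # Assumed reading of the remaining mode: take the prediction of one
    # randomly chosen tree for each row
    treeIdx <- sample.int(ncol(treePreds), n, replace = TRUE)
    treePreds[cbind(seq_len(n), treeIdx)]
}

# Toy inputs for two observations, three categories and three trees
set.seed(1)
probs <- matrix(c(0.8, 0.2, 0.0,
                  0.1, 0.1, 0.8),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("a", "b", "c")))
treePreds <- matrix(c("a", "a", "b",
                      "c", "c", "a"),
                    nrow = 2, byrow = TRUE)

voteImp <- assignCategory(probs, treePreds, forest.vote.cate = TRUE)
probImp <- assignCategory(probs, treePreds)
treeImp <- assignCategory(probs, treePreds, use.pred.prob.cate = FALSE)
```

In the majority-vote mode the result is deterministic given the forest, whereas the other two modes add a random draw to each imputed value.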

Details

RfEmpImp imputation sampler for categorical variables, based on predicted probabilities.
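
The bootstrap step controlled by `pre.boot` can be pictured as resampling the observed cases with replacement before the forest is fitted. The following base R sketch shows only that resampling idea; the helper `bootObserved` is hypothetical and is not claimed to match the package's internal code.

```r
# Hypothetical helper, not part of RfEmpImp: bootstrap the observed cases
bootObserved <- function(y, ry, x) {
    obsIdx <- which(ry)
    # sample.int avoids the sample() pitfall when only one case is observed
    bootIdx <- obsIdx[sample.int(length(obsIdx), replace = TRUE)]
    list(y = y[bootIdx], x = x[bootIdx, , drop = FALSE])
}

# Toy data: a factor with two missing entries and two numeric predictors
set.seed(2020)
y <- factor(c("a", NA, "b", "a", NA, "c"))
ry <- !is.na(y)
x <- matrix(rnorm(12), nrow = 6)

boot <- bootObserved(y, ry, x)
# The bootstrap sample has as many cases as were observed
length(boot$y) == sum(ry)
```

An imputation model fitted to `boot$y` and `boot$x` instead of the original observed cases differs from one imputation to the next, which is how the sampling variation mentioned for `pre.boot` enters the imputations.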

Value

Vector with imputed data, same type as y, and of length sum(wy).
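
The return contract (same type as `y`, length `sum(wy)`) can be illustrated with a toy sampler that follows the same `y`, `ry`, `x`, `wy` signature. The function `mice.impute.toyfreq` below is hypothetical and far simpler than `mice.impute.rfpred.cate`: it ignores `x` and draws from the observed values. It assumes the usual mice convention that `wy` defaults to `!ry`.

```r
# Hypothetical toy sampler, not part of RfEmpImp or mice
mice.impute.toyfreq <- function(y, ry, x, wy = NULL, ...) {
    # By mice convention, impute the entries that were not used for fitting
    if (is.null(wy)) wy <- !ry
    yObs <- y[ry]
    # Draw from the observed values, so category frequencies act as weights
    yObs[sample.int(length(yObs), sum(wy), replace = TRUE)]
}

set.seed(1)
y <- factor(c("low", NA, "high", "low", NA))
ry <- !is.na(y)
x <- matrix(0, nrow = length(y), ncol = 1)  # unused by this toy sampler

imp <- mice.impute.toyfreq(y, ry, x)
# imp is a factor with the levels of y and one value per entry to impute
```

Because the result is subset from `y[ry]`, it keeps the factor levels of `y`, which is the "same type" requirement stated above.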

Author(s)

Shangzhi Hong

References

Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.

Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American Journal of Epidemiology 179.6 (2014): 764-774.

Malley, James D., et al. "Probability machines." Methods of Information in Medicine 51.01 (2012): 74-81.

Examples

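# Load packages: mice provides mice(), RfEmpImp provides the sampler and helpers
library(mice)
library(RfEmpImp)
# Prepare data: introduce missing values into two columns
mtcars.catmcar <- mtcars
mtcars.catmcar[, c("gear", "carb")] <-
    gen.mcar(mtcars.catmcar[, c("gear", "carb")], warn.empty.row = FALSE)
# Convert the two columns to factors so that they are treated as categorical
mtcars.catmcar <- conv.factor(mtcars.catmcar, c("gear", "carb"))
# Perform imputation
impObj <- mice(mtcars.catmcar, method = "rfpred.cate", m = 5, maxit = 5,
               maxcor = 1.0, eps = 0,
               remove.collinear = FALSE, remove.constant = FALSE,
               printFlag = FALSE)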

RfEmpImp documentation built on Oct. 20, 2022, 9:06 a.m.