imp.rfnode.prox    R Documentation

Perform multiple imputation based on the conditional distribution formed using node proximity

Description

The RfNodeProx multiple imputation method handles mixed types of variables. It uses conditional distributions formed by the proximity measures of random forests; both in-bag and out-of-bag observations are used for imputation.

Usage

imp.rfnode.prox(
  data,
  num.imp = 5,
  max.iter = 5,
  num.trees = 10,
  pre.boot = TRUE,
  print.flag = FALSE,
  ...
)

Arguments

data

A data frame or a matrix containing the incomplete data. Missing values should be coded as NA.

num.imp

Number of multiple imputations. The default is num.imp = 5.

max.iter

Number of iterations. The default is max.iter = 5.

num.trees

Number of trees to build. The default is num.trees = 10.

pre.boot

If TRUE, the data are bootstrapped prior to imputation to achieve 'proper' multiple imputation, which accommodates sampling variation in estimating population regression parameters (see Shah et al. 2014). Note that when TRUE, this option takes effect even if the number of trees is set to one. The default is pre.boot = TRUE.

print.flag

If TRUE, details will be sent to console. The default is print.flag = FALSE.

...

Other arguments to pass down.
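
The bootstrapping step described for pre.boot can be pictured with a small base R sketch. This is only an illustration under stated assumptions, not the package's internal code: the data frame obs and the names boot_idx, boot_obs and boot_slopes are hypothetical, and a linear model stands in for the random forest so that the effect on estimated regression parameters is easy to see.

```r
# Hypothetical toy data; not taken from the package.
set.seed(42)
obs <- data.frame(
  x = c(1.2, 0.7, 3.1, 2.4, 1.9, 2.8),
  y = c(5.0, 4.1, 9.3, 7.7, 6.2, 8.8)
)

# One bootstrap sample: rows are drawn with replacement, so some rows can
# appear several times and others not at all.
boot_idx <- sample.int(nrow(obs), replace = TRUE)
boot_obs <- obs[boot_idx, , drop = FALSE]

# Refitting on a fresh bootstrap sample for each imputation lets the
# estimated parameters vary between imputations (lm is used here only
# as a stand-in for the imputation model).
boot_slopes <- replicate(5, {
  idx <- sample.int(nrow(obs), replace = TRUE)
  coef(lm(y ~ x, data = obs[idx, , drop = FALSE]))[["x"]]
})
print(boot_slopes)
```

The spread of boot_slopes reflects the sampling variation that a single fit on the original rows would ignore.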

Details

During imputation with imp.rfnode.prox, the candidate non-missing observations for a missing observation are those that are retrieved from the same predicting node during prediction. The observations used for imputation are not necessarily contained in the terminal nodes of the random forest model.
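
The idea of drawing from observations that share a node can be sketched in base R. This is a toy illustration and not the implementation used by imp.rfnode.prox: the node membership matrix node_ids is assumed to be given rather than produced by a fitted forest, choosing a single tree at random is an assumption made for simplicity, and draw_from_node_donors is a hypothetical helper name.

```r
# Toy illustration only; all names here are hypothetical.
set.seed(1)

# Rows are observations, columns are trees. Each entry is the ID of the node
# that the observation is routed to in that tree (assumed given).
node_ids <- matrix(
  c(1, 1, 2, 2, 1,   # tree 1
    3, 4, 4, 3, 3),  # tree 2
  nrow = 5, ncol = 2
)

# Variable being imputed; observation 5 is missing.
y <- c(10, 12, 20, 22, NA)

draw_from_node_donors <- function(target, y, node_ids) {
  observed <- which(!is.na(y))
  # Pick one tree at random, then collect the observed cases that share
  # the target's node in that tree.
  tree <- sample.int(ncol(node_ids), 1)
  donors <- observed[node_ids[observed, tree] == node_ids[target, tree]]
  # Fall back to all observed cases if the node holds no donors.
  if (length(donors) == 0) donors <- observed
  # Sample an index so a single donor is handled correctly.
  y[donors[sample.int(length(donors), 1)]]
}

y_imp <- y
y_imp[5] <- draw_from_node_donors(5, y, node_ids)
print(y_imp)
```

In this toy setup observation 5 shares a node with observations 1 and 2 in the first tree and with observations 1 and 4 in the second, so the imputed value is always one of their observed values.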

Value

An object of S3 class mids.

Author(s)

Shangzhi Hong

References

Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.

Zhang, Haozhe, et al. "Random Forest Prediction Intervals." The American Statistician (2019): 1-20.

Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American Journal of Epidemiology 179.6 (2014): 764-774.

Malley, James D., et al. "Probability machines." Methods of Information in Medicine 51.01 (2012): 74-81.

Examples

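# Load packages: nhanes, pool() and the mids method for with() come from mice
library(mice)
library(RfEmpImp)
# Prepare data: convert categorical variables to factors
nhanes.fix <- nhanes
nhanes.fix[, c("age", "hyp")] <- lapply(nhanes[, c("age", "hyp")], as.factor)
# Perform imputation using imp.rfnode.prox
imp <- imp.rfnode.prox(nhanes.fix)
# Fit the analysis model to each imputed dataset
anl <- with(imp, lm(chl ~ bmi + hyp))
# Pool the results (stored under a name that does not mask pool())
pooled <- pool(anl)
# Get pooled estimates
reg.ests(pooled)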


RfEmpImp documentation built on Oct. 20, 2022, 9:06 a.m.