imp.rfemp | R Documentation |
The RfEmp multiple imputation method handles data with mixed variable types and calls the corresponding imputation function for each variable according to its type. Categorical variables should be of type factor, logical, etc. RfPred.Emp is used for continuous variables, and RfPred.Cate is used for categorical variables.
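The type-based dispatch described above can be sketched in base R. The helper choose_methods() is hypothetical and not part of the package. It labels each column with the mice method name that corresponds, by mice's mice.impute.<method> naming convention, to mice.impute.rfpred.emp or mice.impute.rfpred.cate, and gives complete columns the empty string, which is how mice marks variables that are not imputed.

```r
# Hypothetical helper (not part of RfEmpImp): pick a method label per column
# from its type, following the rule above.
choose_methods <- function(data) {
  data <- as.data.frame(data)
  vapply(data, function(x) {
    if (!anyNA(x)) return("")  # complete column: nothing to impute
    if (is.factor(x) || is.logical(x)) "rfpred.cate" else "rfpred.emp"
  }, character(1))
}

df <- data.frame(
  age = factor(c(1, 2, NA, 3)),     # categorical (factor) with a missing value
  bmi = c(22.5, NA, 30.1, 27.2),    # continuous with a missing value
  hyp = c(TRUE, FALSE, NA, TRUE),   # categorical (logical) with a missing value
  id  = 1:4                         # complete, so no method is assigned
)
m <- choose_methods(df)
print(m)
```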
```r
imp.rfemp(data, num.imp = 5, max.iter = 5, num.trees = 10,
          alpha.emp = 0, sym.dist = TRUE, pre.boot = TRUE,
          num.trees.cont = NULL, num.trees.cate = NULL,
          num.threads = NULL, print.flag = FALSE, ...)
```
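| Argument | Description |
|---|---|
| data | A data frame or a matrix containing the incomplete data. Missing values should be coded as NA. |
| num.imp | Number of multiple imputations. The default is num.imp = 5. |
| max.iter | Number of iterations. The default is max.iter = 5. |
| num.trees | Number of trees to build. The default is num.trees = 10. |
| alpha.emp | The "significance level" for the empirical distribution of out-of-bag prediction errors. It can be used to guard against outliers (helpful for highly skewed variables). For example, set alpha.emp = 0.05 to use a 95% confidence level. The default is alpha.emp = 0, which keeps the full empirical distribution. |
| sym.dist | If TRUE, the empirical distribution of out-of-bag prediction errors is assumed to be symmetric; if FALSE, it is kept unchanged. The default is sym.dist = TRUE. |
| pre.boot | If TRUE, bootstrapping is performed prior to imputation to obtain "proper" multiple imputation, accommodating sampling variation in estimating population regression parameters (see Shah et al. 2014). The default is pre.boot = TRUE. |
| num.trees.cont | Number of trees to build for continuous variables. The default is num.trees.cont = NULL, in which case the value of num.trees is used. |
| num.trees.cate | Number of trees to build for categorical variables. The default is num.trees.cate = NULL, in which case the value of num.trees is used. |
| num.threads | Number of threads for parallel computing. The default is num.threads = NULL, in which case all available processors can be used. |
| print.flag | If TRUE, details are printed to the console. The default is print.flag = FALSE. |
| ... | Other arguments to pass down. |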
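To illustrate how the alpha.emp and sym.dist arguments could shape the error distribution used for continuous variables, here is a minimal base R sketch. The function draw_emp_errors() is hypothetical, and two details are assumptions rather than documented package behaviour: symmetrisation pools the errors with their negatives, and trimming keeps the central (1 - alpha.emp) interval of the errors.

```r
# Hypothetical sketch (not RfEmpImp internals): draw n values from an
# empirical error distribution, optionally symmetrised and trimmed.
draw_emp_errors <- function(oob_err, n, alpha_emp = 0, sym_dist = TRUE) {
  if (sym_dist) {
    # Assumption: symmetrise by reflecting every error around zero
    oob_err <- c(oob_err, -oob_err)
  }
  if (alpha_emp > 0) {
    # Assumption: keep errors inside the central (1 - alpha_emp) interval
    lims <- quantile(oob_err, probs = c(alpha_emp / 2, 1 - alpha_emp / 2),
                     names = FALSE)
    oob_err <- oob_err[oob_err >= lims[1] & oob_err <= lims[2]]
  }
  # Index-based sampling avoids sample()'s surprise when only one value is left
  oob_err[sample.int(length(oob_err), n, replace = TRUE)]
}

set.seed(1)
oob_err <- c(-3, -1, 0, 1, 50)  # made-up OOB errors with one outlier
d_trim <- draw_emp_errors(oob_err, n = 1000, alpha_emp = 0.2)  # +/-50 dropped
d_raw  <- draw_emp_errors(oob_err, n = 1000, sym_dist = FALSE) # untouched pool
print(range(d_trim))
```

With alpha_emp = 0.2 the symmetrised pool has limits of about -7.7 and 7.7, so the reflected outliers at -50 and 50 never appear in d_trim.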
For continuous variables, mice.impute.rfpred.emp is called, performing imputation based on the empirical distribution of out-of-bag prediction errors of random forests. For categorical variables, mice.impute.rfpred.cate is called, performing imputation based on predicted probabilities.
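The two per-variable rules can be sketched in base R. The helpers impute_cont() and impute_cate() are hypothetical, and the forest output is replaced by made-up predictions, errors, and class probabilities; in the package these would come from fitted random forests. Continuous values are a prediction plus an error drawn from the out-of-bag error pool, and categorical values are sampled from each row's predicted class probabilities.

```r
# Hypothetical sketch: made-up numbers stand in for random forest output.

# Continuous: prediction plus an error drawn from the out-of-bag error pool.
impute_cont <- function(pred, oob_err) {
  pred + oob_err[sample.int(length(oob_err), length(pred), replace = TRUE)]
}

# Categorical: sample each class label from its row of predicted probabilities.
impute_cate <- function(prob) {
  idx <- apply(prob, 1, function(p) sample.int(length(p), 1, prob = p))
  factor(colnames(prob)[idx], levels = colnames(prob))
}

set.seed(2020)
oob_err  <- c(-2.1, -0.4, 0.3, 1.2, 0.9)  # made-up OOB prediction errors
cont_imp <- impute_cont(pred = c(180, 195), oob_err = oob_err)

prob <- rbind(c(0.8, 0.2), c(0, 1))       # made-up class probabilities
colnames(prob) <- c("1", "2")
cate_imp <- impute_cate(prob)             # second value is always "2"
print(cont_imp)
print(cate_imp)
```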
An object of S3 class mids (multiply imputed data set, as defined by the mice package).
Shangzhi Hong
Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.
Zhang, Haozhe, et al. "Random Forest Prediction Intervals." The American Statistician (2019): 1-20.
Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American Journal of Epidemiology 179.6 (2014): 764-774.
Malley, James D., et al. "Probability machines." Methods of Information in Medicine 51.01 (2012): 74-81.
```r
library(mice)      # nhanes data, with(), pool()
library(RfEmpImp)  # imp.rfemp(), reg.ests()
# Prepare data: convert categorical variables to factors
nhanes.fix <- nhanes
nhanes.fix[, c("age", "hyp")] <- lapply(nhanes[, c("age", "hyp")], as.factor)
# Perform imputation using imp.rfemp
imp <- imp.rfemp(nhanes.fix)
# Do repeated analyses
anl <- with(imp, lm(chl ~ bmi + hyp))
# Pool the results (named 'pooled' to avoid masking the pool() function)
pooled <- pool(anl)
# Get pooled estimates
reg.ests(pooled)
```
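The pooling step in the example relies on mice's pool(), which combines the per-imputation estimates with Rubin's rules. As a rough illustration only, the hypothetical helper pool_rubin() below applies those rules by hand to a single coefficient with made-up estimates and variances; it omits the degrees-of-freedom adjustment that pool() also reports.

```r
# Hypothetical helper: Rubin's rules for one coefficient across m imputations.
pool_rubin <- function(q, u) {
  m    <- length(q)               # number of imputations
  qbar <- mean(q)                 # pooled point estimate
  ubar <- mean(u)                 # average within-imputation variance
  b    <- var(q)                  # between-imputation variance
  t    <- ubar + (1 + 1 / m) * b  # total variance
  c(estimate = qbar, total.var = t, std.error = sqrt(t))
}

# Made-up estimates and squared standard errors from m = 3 imputations
res <- pool_rubin(q = c(1, 2, 3), u = c(0.5, 0.5, 0.5))
print(res)  # estimate 2, total.var 11/6 (about 1.833)
```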