
mice.impute.rfemp | R Documentation |

Please note that functions with names starting with "mice.impute" are exported to be visible for the mice sampler functions. Please do not call these functions directly unless you know exactly what you are doing.

`RfEmpImp` multiple imputation method, adapter for `mice` samplers. These functions can be called by the `mice` sampler function. In the `mice()` function, set `method = "rfemp"` to use the `RfEmp` method.

`mice.impute.rfemp` is for mixed types of variables: it calls the corresponding function according to each variable's type. Categorical variables should be of a type such as `factor` or `logical`.

For continuous variables, `mice.impute.rfpred.emp` is called, performing imputation based on the empirical distribution of out-of-bag prediction errors of random forests.
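As a rough illustration of the idea above, the base-R sketch below adds randomly drawn out-of-bag errors to point predictions. The helper `draw.emp.imputations` and the numbers standing in for random forest output (`oob.pred`, `mis.pred`) are hypothetical, and the quantile trimming is only one plausible reading of `alpha.emp`; it is not the package's actual implementation.

```r
# Hypothetical helper: impute by adding resampled OOB errors to predictions.
draw.emp.imputations <- function(y.obs, oob.pred, mis.pred, alpha.emp = 0) {
  # Empirical out-of-bag prediction errors for the observed cases
  oob.err <- y.obs - oob.pred
  if (alpha.emp > 0) {
    # Assumed trimming: keep errors inside the central (1 - alpha.emp) range
    bounds <- quantile(oob.err, probs = c(alpha.emp / 2, 1 - alpha.emp / 2))
    oob.err <- oob.err[oob.err >= bounds[1] & oob.err <= bounds[2]]
  }
  # One random error draw (with replacement) per missing case
  draw.idx <- sample.int(length(oob.err), size = length(mis.pred), replace = TRUE)
  mis.pred + oob.err[draw.idx]
}

set.seed(1)
y.obs    <- c(2.1, 3.4, 2.9, 4.0, 3.3)  # observed values (made up)
oob.pred <- c(2.4, 3.1, 3.0, 3.6, 3.5)  # stand-in for OOB predictions
mis.pred <- c(2.8, 3.7)                 # stand-in for predictions of missing cases
imputed  <- draw.emp.imputations(y.obs, oob.pred, mis.pred)
```

Each imputed value is a prediction shifted by one observed error, so repeated calls give different imputations for the same missing case.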

For categorical variables, `mice.impute.rfpred.cate` is called, performing imputation based on predicted probabilities.
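To make "imputation based on predicted probabilities" concrete, here is a minimal base-R sketch that draws one category per missing case from a matrix of class probabilities. The helper `draw.cate.imputations` and the probability matrix are made up for illustration; in the package the probabilities would come from a random forest, and the internals may differ.

```r
# Hypothetical helper: draw one category per row from predicted probabilities.
draw.cate.imputations <- function(prob.mat) {
  lvls <- colnames(prob.mat)
  # For each row, sample a single level with the row's probabilities as weights
  drawn <- apply(prob.mat, 1, function(p) sample(lvls, size = 1, prob = p))
  factor(drawn, levels = lvls)
}

set.seed(1)
# Made-up class probabilities for three missing cases (each row sums to 1)
prob.mat <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.1, 0.8,
    0.0, 1.0, 0.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("low", "mid", "high"))
)
imputed <- draw.cate.imputations(prob.mat)
```

Sampling from the probabilities, rather than always taking the most likely class, keeps the variability that multiple imputation relies on.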

mice.impute.rfemp(
  y, ry, x, wy = NULL,
  num.trees = 10, alpha.emp = 0, sym.dist = TRUE, pre.boot = TRUE,
  num.trees.cont = NULL, num.trees.cate = NULL,
  ...
)

`y` |
Vector to be imputed. |

`ry` |
Logical vector of length |

`x` |
Numeric design matrix with |

`wy` |
Logical vector of length |

`num.trees` |
Number of trees to build, default to |

`alpha.emp` |
The "significance level" for the empirical distribution of
prediction errors; it can be used to guard against outliers (useful for highly
skewed variables). For example, set `alpha.emp = 0.05` to use the 95% confidence
level for the empirical distribution of prediction errors.
Default is |

`sym.dist` |
If |

`pre.boot` |
Perform bootstrap prior to imputation to get 'proper'
multiple imputation, i.e. accommodating sampling variation in estimating
population regression parameters (see Shah et al. 2014).
It should be noted that if |

`num.trees.cont` |
Number of trees to build for continuous variables,
default to |

`num.trees.cate` |
Number of trees to build for categorical variables,
default to |

`...` |
Other arguments to pass down. |
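The bootstrap step described for `pre.boot` can be pictured with the following base-R sketch, which resamples the observed cases before fitting a model. The toy data are invented, and `lm()` is used purely as a stand-in for the random forest fit; the package's own resampling code may be organised differently.

```r
set.seed(1)
# Toy data: y has missing values, x is fully observed
x  <- c(1, 2, 3, 4, 5, 6, 7, 8)
y  <- c(1.2, 1.9, NA, 4.1, 5.2, NA, 6.8, 8.1)
ry <- !is.na(y)  # TRUE where y is observed

# Bootstrap the observed cases before model fitting
obs.idx  <- which(ry)
boot.idx <- obs.idx[sample.int(length(obs.idx), replace = TRUE)]

# lm() is only a stand-in here for the random forest fit
fit <- lm(y ~ x, data = data.frame(x = x[boot.idx], y = y[boot.idx]))

# Predictions for the cases where y is missing
mis.pred <- predict(fit, newdata = data.frame(x = x[!ry]))
```

Because each imputation is built on a different bootstrap sample, the fitted model varies between imputations, which is what carries the sampling variation into the imputed values.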

For the `RfEmpImp` imputation sampler, `mice.impute.rfemp` calls `mice.impute.rfpred.emp` if `is.numeric` is `TRUE` for the variable; otherwise it calls `mice.impute.rfpred.cate`.
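The dispatch rule above can be sketched in base R as follows. The two stub functions and `dispatch.by.type` are hypothetical names that only report which branch would be taken; they do not call the real samplers.

```r
# Hypothetical stand-ins for the two type-specific samplers
impute.cont.stub <- function(y) "continuous sampler"
impute.cate.stub <- function(y) "categorical sampler"

# Dispatch on the type of the target variable, as described above
dispatch.by.type <- function(y) {
  if (is.numeric(y)) impute.cont.stub(y) else impute.cate.stub(y)
}

res.num <- dispatch.by.type(c(1.5, NA, 3.2))         # numeric vector
res.fac <- dispatch.by.type(factor(c("a", NA, "b"))) # factor
res.lgl <- dispatch.by.type(c(TRUE, NA, FALSE))      # logical vector
```

Note that `is.numeric()` returns `FALSE` for factors and logical vectors, so both take the categorical branch, while a categorical variable stored as integer codes would be treated as continuous unless it is converted first.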

Vector with imputed data, same type as `y`, and of length `sum(wy)`.

Shangzhi Hong

Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.

Zhang, Haozhe, et al. "Random Forest Prediction Intervals." The American Statistician (2019): 1-20.

Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American journal of epidemiology 179.6 (2014): 764-774.

Malley, James D., et al. "Probability machines." Methods of information in medicine 51.01 (2012): 74-81.

# Load the packages: mice supplies nhanes and mice(), RfEmpImp supplies
# conv.factor() and the "rfemp" method
library(mice)
library(RfEmpImp)

# Prepare data: convert categorical variables to factors
nhanes.fix <- conv.factor(nhanes, c("age", "hyp"))

# This function is exported to be visible to the mice sampler functions, and
# users can set method = "rfemp" in the call to mice to use this function.
# Users are recommended to use the imp.rfemp function instead:
impObj <- mice(
  nhanes.fix,
  method = "rfemp", m = 5, maxit = 5, maxcor = 1.0, eps = 0,
  remove.collinear = FALSE, remove.constant = FALSE, printFlag = FALSE
)
