#' Perform multiple imputation based on the conditional distribution formed
#' using node proximity
#'
#' @description
#' The \code{RfNodeProx} multiple imputation method handles mixed types of
#' variables, using conditional distributions formed by the proximity measures
#' of random forests (both in-bag and out-of-bag observations are used for
#' imputation).
#'
#' @details
#' During imputation using \code{imp.rfnode.prox}, the candidate non-missing
#' observations for each missing observation are those that can be retrieved
#' from the same predicting node during prediction. The observations used for
#' imputation are not necessarily contained in the terminal node of the random
#' forest model.
#'
#' @param data A data frame or a matrix containing the incomplete data. Missing
#' values should be coded as \code{NA}s.
#'
#' @param num.imp Number of multiple imputations. The default is
#' \code{num.imp = 5}.
#'
#' @param max.iter Number of iterations. The default is \code{max.iter = 5}.
#'
#' @param num.trees Number of trees to build. The default is
#' \code{num.trees = 10}.
#'
#' @param pre.boot If \code{TRUE}, bootstrapping is performed prior to
#' imputation to achieve 'proper' multiple imputation, accommodating sampling
#' variation in estimating population regression parameters
#' (see Shah et al. 2014).
#' Note that if \code{TRUE}, this option is in effect even if the number of
#' trees is set to one. The default is \code{pre.boot = TRUE}.
#'
#' @param print.flag If \code{TRUE}, details will be sent to console. The
#' default is \code{print.flag = FALSE}.
#'
#' @param ... Other arguments passed down to \code{mice}.
#'
#' @return An object of S3 class \code{mids}.
#'
#' @name imp.rfnode.prox
#'
#' @author Shangzhi Hong
#'
#' @references
#' Hong, Shangzhi, et al. "Multiple imputation using chained random forests."
#' Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.
#'
#' Zhang, Haozhe, et al. "Random Forest Prediction Intervals."
#' The American Statistician (2019): 1-20.
#'
#' Shah, Anoop D., et al. "Comparison of random forest and parametric
#' imputation models for imputing missing data using MICE: a CALIBER study."
#' American journal of epidemiology 179.6 (2014): 764-774.
#'
#' Malley, James D., et al. "Probability machines." Methods of information
#' in medicine 51.01 (2012): 74-81.
#'
#' @examples
#' # Prepare data: convert categorical variables to factors
#' nhanes.fix <- nhanes
#' nhanes.fix[, c("age", "hyp")] <- lapply(nhanes[, c("age", "hyp")], as.factor)
#' # Perform imputation using imp.rfnode.prox
#' imp <- imp.rfnode.prox(nhanes.fix)
#' # Do repeated analyses
#' anl <- with(imp, lm(chl ~ bmi + hyp))
#' # Pool the results
#' pool <- pool(anl)
#' # Get pooled estimates
#' reg.ests(pool)
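#'
#' # A minimal sketch of non-default settings, reusing nhanes.fix from above;
#' # the argument values are illustrative assumptions, not recommendations
#' imp.custom <- imp.rfnode.prox(
#'     nhanes.fix,
#'     num.imp = 10,
#'     max.iter = 10,
#'     num.trees = 20,
#'     print.flag = TRUE)
#' # Inspect the first completed data set using complete() from mice
#' head(complete(imp.custom, 1))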
#'
#' @export
imp.rfnode.prox <- function(
    data,
    num.imp = 5,
    max.iter = 5,
    num.trees = 10,
    pre.boot = TRUE,
    print.flag = FALSE,
    ...) {
    return(mice(
        data = data,
        method = "rfnode",
        m = num.imp,
        maxit = max.iter,
        num.trees.node = num.trees,
        pre.boot = pre.boot,
        use.node.cond.dist = FALSE,
        obs.eq.prob = FALSE,
        do.sample = TRUE,
        printFlag = print.flag,
        # Bypass remove.lindep() in mice >= 3.9.0
        maxcor = 1.0,
        eps = 0,
        # Bypass collinearity and constant checks
        remove.collinear = FALSE,
        remove.constant = FALSE,
        ...))
}