#' Imputes a dataframe and returns imputation models to be used on new observations
#'
#' Imputes a dataframe and (if save_models = TRUE) returns imputation models to be used on new observations.
#'
#' An adaptation of the original missForest algorithm (Stekhoven et al. 2012) is used.
#' Variables are initialized with a mean/mode, median/mode or custom imputation.
#' Then, they are imputed iteratively "on the fly" for a maximum number of iterations or until the convergence criteria are met.
#' The imputation sequence is either increasing or decreasing.
#' At each iteration, a random forest model is built for each variable, using the observed (non-missing) values
#' of the variable as outcome and the values of the other variables as predictors: values from the previous iteration
#' for the first variable in the sequence and values from the current iteration for the subsequent variables in the
#' sequence (on-the-fly). The ranger package (Wright et al. 2017) is used for building the random forest models.
#'
#' The convergence criterion is based on the out-of-bag (OOB) error or the apparent error and uses the NMSE (normalized mean squared error)
#' for both continuous and categorical variables.
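#'
#' As an illustrative sketch only (the exact formula used is documented in the package vignette),
#' the normalized error for a continuous variable can be thought of as the MSE divided by the MSE
#' of predicting the mean of the observed values:
#' \preformatted{
#' # hypothetical helper for illustration; not the package's internal nmse()
#' nmse_sketch <- function(pred, obs) mean((obs - pred)^2) / mean((obs - mean(obs))^2)
#' }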
#'
#' Imputation models for all variables and all iterations are saved (if \code{save_models} is TRUE) and can be later
#' applied to new observations (see the Examples section).
#'
#' Both dataframe and tibble (tbl_df class) are supported as input. The imputed dataframe will be returned with the same class.
#' Numeric and integer columns are supported and treated internally as continuous variables.
#' Factor and character columns are supported and treated internally as categorical variables.
#' Other types (like logical or dates) are not supported.
#' NA values are considered missing values.
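#'
#' A minimal, hedged workaround (not part of the package API) is to convert unsupported column types
#' to supported ones before imputation, for example:
#' \preformatted{
#' # hypothetical dataframe df with a logical column "flag" and a Date column "visit_date"
#' df$flag <- as.factor(df$flag)
#' df$visit_date <- as.numeric(df$visit_date)
#' }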
#'
#' @param xmis dataframe containing missing values of class dataframe ("tibble" class tbl_df is also supported). Matrix format is not supported. See details for column format.
#' @param maxiter maximum number of iterations. By default the algorithm will stop when convergence is reached or
#' after running for maxiter iterations, whichever occurs first.
#' @param var_weights named vector of weights for each variable in the convergence criteria.
#' The names should correspond to variable names.
#' By default the weights are set to the proportion of missing values on each variable.
#' @param decreasing (boolean) if TRUE, the variables are imputed in order of decreasing amount of missing values
#' (the variable with the highest amount of missing values will be imputed first). If FALSE, the variable with the lowest amount of missing values will be imputed first.
#' @param fixed_maxiter if set to TRUE, the algorithm will run for the exact number of iterations specified in maxiter,
#' regardless of the convergence criteria. Default is FALSE.
#' @param initialization initialization method before running RF models; supported: mean/mode, median/mode and custom. Default is mean/mode.
#' @param x_init if \code{initialization = "custom"}: a complete dataframe to be used as initialization (see vignette for an example).
#' @param class.weights a named list containing the \code{class.weights} parameter to be passed to ranger for each categorical variable.
#' The names of the list need to match the names of the categorical variables in the dataframe.
#' (See the \code{ranger} function documentation in the \code{ranger} package for details.)
#' @param return_integer_as_integer Internally, integer columns are treated as double (double precision floating point numbers).
#' If TRUE, the imputations will be rounded to the closest integer and returned as integer (this might be desirable for count variables).
#' If FALSE, integer columns will be returned as double (this might be desirable, for example, for patient age imputation).
#' Default is FALSE. The same behaviour will be applied to new observations when using missForestPredict.
#' @param save_models if TRUE, imputation models are saved and a new observation (or a test set) can be imputed using the models learned;
#' saving models for a dataset with a large number of variables can occupy a considerable amount of RAM. Default is TRUE.
#' @param predictor_matrix predictor matrix indicating which variables to use in the imputation of each variable.
#' See documentation for function \code{create_predictor_matrix} for details on the matrix format.
#' @param proportion_usable_cases a vector with two components: the first one is a minimum threshold for \code{p_obs}
#' and the second one is a maximum threshold for \code{p_miss}. Variables for which \code{p_obs} is greater than or equal to the first threshold (1 by default)
#' will be filtered from the predictor matrix. Variables for which \code{p_miss} is lower than or equal to the second threshold (0 by default)
#' will be filtered from the predictor matrix. For more details on \code{p_obs} and \code{p_miss} see the documentation of
#' the \code{prop_usable_cases} function. If the \code{predictor_matrix} parameter is specified, \code{proportion_usable_cases}
#' will be applied to this provided matrix.
#' @param verbose (boolean) if TRUE, missForest prints OOB error estimates (MSE and NMSE) and runtime at each iteration.
#' @param convergence_error which error should be used for the convergence criterion. Supported values: OOB and apparent.
#' If a different value is provided, it defaults to OOB. See vignette for full details on convergence.
#' @param ... other arguments passed to the ranger function (some arguments that are specific to each variable type are not supported).
#' See the vignette for a \code{num.trees} example.
#'
#' @return Object of class \code{missForest} with elements
#' \item{\code{ximp}}{dataframe with imputed values}
#' \item{\code{init}}{x_init if custom initialization is used; otherwise a list of mean/mode or median/mode values for each variable}
#' \item{\code{initialization}}{value of initialization parameter}
#' \item{\code{impute_sequence}}{vector of variable names in the order in which imputation has been run}
#' \item{\code{maxiter}}{maxiter parameter as passed to the function}
#' \item{\code{models}}{list of random forest models for each iteration}
#' \item{\code{return_integer_as_integer}}{Parameter return_integer_as_integer as passed to the function}
#' \item{\code{integer_columns}}{list of columns of integer type in the data}
#' \item{\code{OOB_err}}{dataframe with out-of-bag errors for each iteration and each variable}
#' @references
#' \itemize{
#' \item Stekhoven, D. J., & Bühlmann, P. (2012). MissForest-non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. \doi{10.1093/bioinformatics/btr597}
#' \item Wright, M. N. & Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1-17. \doi{10.18637/jss.v077.i01}.
#' }
#' @examples
#' data(iris)
#' iris_mis <- produce_NA(iris, proportion = 0.1)
#' imputation_object <- missForest(iris_mis, num.threads = 2)
#' iris_imp <- imputation_object$ximp
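#'
#' # Hedged sketch: pass additional ranger arguments through ... (num.trees is a
#' # standard ranger argument) and apply the saved models to new incomplete data.
#' # missForestPredict() is assumed to be the prediction function of this package;
#' # the argument name newdata is an assumption.
#' imputation_object2 <- missForest(iris_mis, num.trees = 100, num.threads = 2)
#' iris_mis_new <- produce_NA(iris, proportion = 0.1)
#' iris_new_imp <- missForestPredict(imputation_object2, newdata = iris_mis_new)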
#'
#' @import ranger
#' @importFrom stats median
#' @importFrom stats predict
#' @importFrom stats var
#' @importFrom stats complete.cases
#' @importFrom stats weighted.mean
#' @export
missForest <- function(xmis,
maxiter = 10,
fixed_maxiter = FALSE,
var_weights = NULL,
decreasing = FALSE,
initialization = "mean/mode",
x_init = NULL,
class.weights = NULL,
return_integer_as_integer = FALSE,
save_models = TRUE,
predictor_matrix = NULL,
proportion_usable_cases = c(1,0),
verbose = TRUE,
convergence_error = "OOB",
...){
unsupported_args <- c("case.weights", "splitrule", "num.random.splits",
"alpha", "minprop", "split.select.weights", "always.split.variables",
"inbag", "holdout", "quantreg", "oob.error", "dependent.variable.name",
"status.variable.name", "classification")
unsupported_args <- unsupported_args[unsupported_args %in% names(list(...))]
if (length(unsupported_args) > 0){
stop(sprintf("The following argument(s) to ranger function are not supported: %s", paste(unsupported_args, collapse = ", ")))
}
if (!convergence_error %in% c("OOB", "apparent")){
message(sprintf("Supported values for convergence_error are: OOB and apparent. The value passed was: %s. Defaulting to OOB.",
convergence_error))
convergence_error <- "OOB"
}
if (!is.numeric(maxiter) || maxiter < 1) stop("maxiter needs to be a number greater than or equal to 1")
if (!initialization %in% c("mean/mode", "median/mode", "custom"))
stop("initialization has to be mean/mode, median/mode or custom")
if (initialization == "custom"){
if (is.null(x_init))
stop("An initialized dataframe has to be provided in x_init if the initialization mode is custom")
if (any(dim(xmis) != dim(x_init)))
stop(sprintf("x_init needs to have the same dimensions as xmis: %s", paste(dim(xmis), collapse = ", ")))
if (sum(!complete.cases(x_init)) > 0){
stop("x_init needs to be a complete dataframe with no missing values")
}
}
# calculate proportion of usable cases
prop_usable_cases <- prop_usable_cases(xmis)
p_obs <- proportion_usable_cases[[1]]
p_mis <- proportion_usable_cases[[2]]
prop_usable_cases_matrix_obs <- prop_usable_cases$p_obs < p_obs
prop_usable_cases_matrix_miss <- prop_usable_cases$p_miss > p_mis
# make diagonal always FALSE (do not include a var as predictor for itself)
diag(prop_usable_cases_matrix_miss) <- FALSE
# inclusion matrix
usable_cases_matrix <- prop_usable_cases_matrix_obs & prop_usable_cases_matrix_miss
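# entries can be NA for fully observed variables (the proportions are undefined); keep these as predictors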
usable_cases_matrix[is.na(usable_cases_matrix)] <- TRUE
usable_cases_matrix <- usable_cases_matrix + 0
# prepare predictor matrix
if (!is.null(predictor_matrix)){
check_predictor_matrix(predictor_matrix, xmis, verbose = FALSE)
} else {
predictor_matrix <- create_predictor_matrix(xmis)
}
predictor_matrix <- (predictor_matrix & usable_cases_matrix) + 0
# variables included / skipped from imputation
vars_skipped_to_impute <- rownames(predictor_matrix)[rowSums(predictor_matrix) == 0]
vars_included_to_impute <- rownames(predictor_matrix)[!rowSums(predictor_matrix) == 0]
# variables used / not used as predictors
vars_skipped_as_pred <- colnames(predictor_matrix)[colSums(predictor_matrix) == 0]
vars_included_as_pred <- colnames(predictor_matrix)[!colSums(predictor_matrix) == 0]
vars_used <- unique(c(vars_included_to_impute, vars_included_as_pred))
if (!is.null(var_weights)){
if (is.null(names(var_weights)))
stop("var_weights needs to be a named vector.")
is_in_var_weights <- vars_included_to_impute %in% names(var_weights)
if (!all(is_in_var_weights))
stop("Variables %s need to be imputed and are not specified in var_weights.",
vars_included_to_impute[!is_in_var_weights])
}
if (any(sapply(xmis[,vars_used], simplify = 'matrix', is.infinite)))
stop("The dataframe contains Inf values. Inf values are not supported.")
NA_only_cols <- colSums(is.na(xmis)) == nrow(xmis)
if (any(NA_only_cols))
stop(sprintf("There are columns that contain only missing values: %s",
paste(names(NA_only_cols)[NA_only_cols], collapse = ", ")))
# check variable types
column_class <- function(x) ifelse(is.numeric(x), "numeric",
ifelse(is.factor(x) | is.character(x), "factor", NA_character_))
var_type <- unlist(lapply(xmis[,vars_used], column_class))
if (any(is.na(var_type)))
stop(sprintf("Only numeric or factor columns are supported. Logical or other types are not supported. Unsupported variables: %s",
paste(names(var_type)[is.na(var_type)], collapse = ", ")))
# check class.weights for factor variables
if(!is.null(class.weights)){
if (!is.list(class.weights) | is.null(names(class.weights)) )
stop("class.weights needs to be a named list.")
factor_vars <- names(var_type[var_type == "factor"])
is_in_class_weights <- factor_vars %in% names(class.weights)
if(!all(is_in_class_weights))
stop("All categorical variables need to be in class.weights. Following variables not in class.weights.: %s",
factor_vars[!is_in_class_weights])
for (c in factor_vars) {
n_classes <- length(unique(xmis[!is.na(xmis[,c, drop = TRUE]), c, drop = TRUE]))
if (length(class.weights[[c]]) != n_classes)
stop(sprintf("class.weights for variable %s needs to be a vector of size %s (the number of classes in the variable), not %s",
c, n_classes, length(class.weights[[c]])))
}
}
# check factor / character with many levels
for (c in vars_included_to_impute) {
if (var_type[c] == "factor"){
unique_levels <- length(unique(xmis[, c, drop = TRUE]))
if (unique_levels > 53)
message(sprintf("Variable %s has %s levels. Imputation might be slow. Check that variables are not factor or character by mistake.",
c, unique_levels))
}
}
# perform initialization (mean/mode, median/mode or custom imputation)
if (initialization == "custom") {
ximp <- x_init
if (save_models) var_init <- x_init
# make all integer columns double (imputed values might not be integer)
integer_columns <- colnames(ximp)[unlist(lapply(ximp, is.integer))]
if (length(integer_columns) > 0) {
ximp[, integer_columns] <- lapply(ximp[, integer_columns, drop = FALSE], as.double)
}
} else {
ximp <- xmis
# make all integer columns double (imputed values might not be integer)
integer_columns <- colnames(ximp[,vars_used])[unlist(lapply(ximp[,vars_used], is.integer))]
if (length(integer_columns) > 0) {
ximp[, integer_columns] <- lapply(ximp[, integer_columns, drop = FALSE], as.double)
}
var_init <- vector("list", length(vars_used))
names(var_init) <- vars_used
for (col in vars_used) {
if (var_type[[col]] == "numeric") {
# keep mean or median
if (initialization == "mean/mode"){
mean_col <- mean(xmis[, col, drop = TRUE], na.rm = TRUE)
} else if (initialization == "median/mode"){
mean_col <- median(xmis[, col, drop = TRUE], na.rm = TRUE)
}
var_init[[col]] <- mean_col
# initialize ximp column
ximp[is.na(xmis[, col, drop = TRUE]), col] <- mean_col
} else { # factor
# take maximum number of samples in one class (ignore NA)
max_level <- max(table(ximp[, col, drop = TRUE], useNA = "no"))
summary_col <- summary(as.factor(ximp[!is.na(ximp[, col, drop = TRUE]), col, drop = TRUE]))
# if there are several classes with equal number of samples, sample one at random
mode_col <- sample(names(which(max_level == summary_col)), 1)
# keep mode
var_init[[col]] <- mode_col
# initialize ximp column
ximp[is.na(xmis[, col, drop = TRUE]),col] <- mode_col
}
}
}
# extract missingness pattern
NAloc <- is.na(xmis) # where are missings
noNAvar <- apply(NAloc, 2, sum) # how many are missing in the vars
miss_proportion <- noNAvar/nrow(xmis)
# imputation sequence, first the lowest missingness column if decreasing = FALSE
impute_sequence <- names(sort(noNAvar, decreasing = decreasing))
impute_sequence <- impute_sequence[impute_sequence %in% vars_included_to_impute]
# set weights
if(is.null(var_weights)) var_weights <- miss_proportion
if (sum(var_weights) == 0)
var_weights <- replace(var_weights, names(var_weights), 1)
var_weights <- var_weights[names(var_weights) %in% vars_included_to_impute]
# set class weights
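# default when class.weights is not provided: weights inversely proportional to the observed
# class frequencies, so rarer classes receive larger weights in the ranger models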
if (is.null(class.weights) & any(var_type == "factor")) {
factor_vars <- names(var_type[var_type == "factor"])
class.weights <- list()
for (var in factor_vars){
xmis[,var] <- as.factor(xmis[,var, drop = TRUE]) # in case it is character
class_props <- prop.table(table(xmis[,var, drop = TRUE], useNA = "no"))
class_props <- class_props[levels(xmis[,var, drop = TRUE])]
class.weights[[var]] <- nrow(xmis) / (length(class_props) * class_props)
}
}
# keep OOB error
OOB_err <- data.frame(iteration = sort(rep(0:maxiter, length(impute_sequence))),
variable = rep(impute_sequence, maxiter + 1),
MSE = NA_real_,
NMSE = NA_real_,
MER = NA_real_,
macro_F1 = NA_real_,
F1_score = NA_real_)
# keep apparent error
apparent_err <- data.frame(iteration = sort(rep(0:maxiter, length(impute_sequence))),
variable = rep(impute_sequence, maxiter + 1),
NMSE = NA_real_)
# calculate errors for initialization
# there is no real OOB error, so the apparent error is used
# only computed for numeric variables with mean or median initialization; otherwise set to 1
for (col in impute_sequence){
# not possible to calculate for custom as the observed values are not initialized
# not possible to calculate for mode imputation as it is not a probability
if (initialization == "custom" | var_type[col] != "numeric"){
OOB_err[OOB_err$iteration == 0 & OOB_err$variable == col, "NMSE"] <- 1
apparent_err[apparent_err$iteration == 0 & apparent_err$variable == col, "NMSE"] <- 1
} else { # numeric variable with mean/mode or median/mode initialization
obsi <- !NAloc[, col]
obsY <- ximp[obsi, col, drop = TRUE]
init_values <- rep(var_init[[col]], length(obsY))
init_NMSE <- ifelse(var(obsY) == 0, 1, # corner case scenario
nmse(init_values, obsY))
OOB_err[OOB_err$iteration == 0 & OOB_err$variable == col, "NMSE"] <- init_NMSE
apparent_err[apparent_err$iteration == 0 & apparent_err$variable == col, "NMSE"] <- init_NMSE
}
}
iter <- 1
converged <- FALSE
if (save_models) models <- list() else models <- NULL
if (verbose) {
if (length(vars_skipped_to_impute) > 0) {
vars_skipped_prop <- sapply(vars_skipped_to_impute, function(x) sprintf("%s (%s)", x, miss_proportion[[x]]))
cat("Variables skipped from imputation (missing proportion): ", vars_skipped_prop, "\n")
}
impute_sequence_prop <- sapply(impute_sequence, function(x) sprintf("%s (%s)", x, miss_proportion[[x]]))
cat("Imputation sequence (missing proportion): ", impute_sequence_prop, "\n")
}
# iterate RF models
while (!converged & iter <= maxiter){
if (save_models) models[[iter]] <- list()
if (verbose) cat(" missForest iteration", iter, "in progress...")
t_start <- proc.time()
# store the previous iteration's imputation (returned when the algorithm stops due to convergence)
ximp_old <- ximp
for (col in impute_sequence){
obsi <- !NAloc[, col]
misi <- NAloc[, col]
obsY <- ximp[obsi, col, drop = TRUE]
predictor_cols <- colnames(predictor_matrix)[predictor_matrix[col,] == 1]
obsX <- ximp[obsi, predictor_cols, drop = FALSE]
misX <- ximp[misi, predictor_cols, drop = FALSE]
if (var_type[col] == "numeric") {
RF <- ranger(x = obsX, y = obsY,
...)
# save model
if (save_models) models[[iter]][[col]] <- RF
# if the column is not complete, impute the missing values
if (nrow(misX) > 0) {
misY <- predict(RF, misX)$predictions
ximp[misi, col] <- misY
}
} else { # categorical
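# categorical variables are modelled as probability forests (probability = TRUE);
# the predicted class probabilities are used both for imputation (class with the highest
# probability) and for the Brier-score-based errors computed below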
obsY <- factor(obsY, levels = unique(ximp[, col, drop = TRUE]))
RF <- ranger(x = obsX, y = obsY,
probability = TRUE,
class.weights = class.weights[[col]][levels(obsY)],
...)
# save model
if (save_models) models[[iter]][[col]] <- RF
# if the column is not complete, impute the missing values
if (nrow(misX) > 0) {
# if factor, return factor
preds <- predict(RF, misX)$predictions
levels <- colnames(preds)
# impute
ximp[misi, col] <- apply(preds, 1, function(x) levels[which.max(x)])
}
}
# check if all observations in data are out-of-bag at least once
if (is.matrix(RF$predictions)){ # class probabilities are returned as matrix
oob_i <- !apply(is.na(RF$predictions), 1, any)
} else {
oob_i <- !is.na(RF$predictions)
}
if (sum(!oob_i) > 0){
warning(sprintf("For variable %s there are %s observations out of %s for which the OOB error cannot be evaluated. These observations are excluded. Consider increasing the number of trees (num.trees) to ensure reliable results",
col, sum(!oob_i), length(obsY)))
}
# evaluate OOB error
if (var_type[col] == "numeric") {
# OOB error
OOB_MSE <- mse(RF$predictions[oob_i], obsY[oob_i])
OOB_NMSE <- ifelse(var(obsY[oob_i]) == 0, 1, # corner case scenario
nmse(RF$predictions[oob_i], obsY[oob_i]))
OOB_err[OOB_err$iteration == iter & OOB_err$variable == col, "MSE"] <- OOB_MSE
OOB_err[OOB_err$iteration == iter & OOB_err$variable == col, "NMSE"] <- OOB_NMSE
# apparent error
apparent_preds <- predict(RF, obsX)$predictions
apparent_NMSE <- ifelse(var(obsY) == 0, 1, # corner case scenario
nmse(apparent_preds, obsY))
apparent_err[apparent_err$iteration == iter &
apparent_err$variable == col, "NMSE"] <- apparent_NMSE
} else { # categorical
# drop levels if they don't exist
ximp_binary <- make_binary(factor(ximp[, col, drop = TRUE], levels = unique(ximp[, col, drop = TRUE])))
col_order <- colnames(ximp_binary)
obsY_binary <- ximp_binary[obsi, , drop = FALSE]
OOB_MSE <- BS(RF$predictions[oob_i,col_order], obsY_binary[oob_i, , drop = FALSE]) /
ncol(obsY_binary[oob_i, , drop = FALSE])
OOB_NMSE <- BSnorm(RF$predictions[oob_i,col_order], obsY_binary[oob_i, , drop = FALSE])
# save OOB error for class predictions
OOB_preds <- RF$predictions[oob_i, col_order, drop = FALSE]
OOB_levels <- colnames(OOB_preds)
OOB_class_preds <- apply(OOB_preds, 1, function(x) OOB_levels[which.max(x)])
OOB_MER <- mer(OOB_class_preds, obsY[oob_i])
OOB_macro_F1 <- macro_F1(OOB_class_preds, obsY[oob_i]) # calculated only for multi-class
OOB_F1 <- F1_score(OOB_class_preds, obsY[oob_i]) # calculated only for two-class
OOB_err[OOB_err$iteration == iter & OOB_err$variable == col, "MSE"] <- OOB_MSE
OOB_err[OOB_err$iteration == iter & OOB_err$variable == col, "NMSE"] <- OOB_NMSE
OOB_err[OOB_err$iteration == iter & OOB_err$variable == col, "MER"] <- OOB_MER
OOB_err[OOB_err$iteration == iter & OOB_err$variable == col, "macro_F1"] <- OOB_macro_F1
OOB_err[OOB_err$iteration == iter & OOB_err$variable == col, "F1_score"] <- OOB_F1
# apparent error
apparent_preds <- predict(RF, obsX)$predictions
apparent_NMSE <- BSnorm(apparent_preds[,col_order], obsY_binary[ , , drop = FALSE])
apparent_err[apparent_err$iteration == iter &
apparent_err$variable == col, "NMSE"] <- apparent_NMSE
}
}
if (verbose) cat("done!\n")
# calculate convergence
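# convergence is assessed on the variable-weighted error, compared between successive
# iterations (see calculate_convergence)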
if (convergence_error == "OOB"){
convergence <- calculate_convergence(err = OOB_err,
weights = var_weights[impute_sequence])
} else {
convergence <- calculate_convergence(err = apparent_err,
weights = var_weights[impute_sequence])
}
converged <- convergence$converged
# if the maxiter is fixed, ignore the convergence criteria
converged <- converged & !fixed_maxiter
# return error monitoring
if (verbose){
delta_start <- proc.time() - t_start
cat(sprintf(" OOB errors MSE: %s\n",
paste(round(OOB_err[OOB_err$iteration == iter,"MSE"], 10), collapse = ", ")))
cat(sprintf(" OOB errors NMSE: %s\n",
paste(round(OOB_err[OOB_err$iteration == iter,"NMSE"], 10), collapse = ", ")))
cat(sprintf(" diff. convergence measure: %s\n",
paste(round(convergence$measure_old - convergence$measure_new, 10), collapse = ", ")))
cat(sprintf(" time: %s seconds\n\n",
round(delta_start[3], 10)))
}
iter <- iter + 1
} # end while
# return integer columns as integer
if (return_integer_as_integer & length(integer_columns) > 0) {
ximp[, integer_columns] <- lapply(ximp[, integer_columns, drop = FALSE],
function(x) as.integer(round(x)))
ximp_old[, integer_columns] <- lapply(ximp_old[, integer_columns, drop = FALSE],
function(x) as.integer(round(x)))
}
# output
if (iter != maxiter & converged) ximp <- ximp_old
out <- list(ximp = ximp,
init = var_init,
initialization = initialization,
impute_sequence = impute_sequence,
maxiter = maxiter,
save_models = save_models,
models = models,
return_integer_as_integer = return_integer_as_integer,
integer_columns = integer_columns,
predictor_matrix = predictor_matrix,
OOB_err = OOB_err)
class(out) <- 'missForest'
return(out)
}