R/prop_usable_cases.R
In missForestPredict: Missing Value Imputation using Random Forest for Prediction Settings

Documented in prop_usable_cases

#' Calculates variable-wise proportion of usable cases (missing and observed)
#'
#' Calculates variable-wise proportion of usable cases (missing and observed) as in Molenberghs et al. (2014).
#'
#' missForest builds models for each variable using the observed values of that variable as outcome
#' of a random forest model. It then imputes the missing part of the variable using the learned models.
#'
#' If all values of a predictor are missing among the observed value of the outcome,
#' the value of \code{p_obs} will be 1 and the model built will rely heavily on the initialized values.
#' If all values of a predictor are observed among the observed values of the outcome, \code{p_obs} will be 0
#' and the model will rely on observed values. Low values of \code{p_obs} are preferred.
#'
#' Similarly, if all values of a predictor are missing among the missing values of the outcome,
#' \code{p_miss} will have a value of 0 and the imputations (predictions) will heavily rely on the initialized values.
#' If all values of a predictor are observed among the missing value of the outcome, \code{p_miss} will have a value of 1
#' and the imputations (predictions) will rely on real values. High values of \code{p_miss} are preferred.
#'
#' Each row represents a variable to be imputed and each column a predictor.
#'
#' @param data dataframe to be imputed
#'
#' @return a list with two elements: \code{p_obs} and \code{p_miss}
#'     \item{\code{p_obs}}{the proportion of missing \eqn{Y_k} among observed \eqn{Y_j}; j in rows, k in columns}
#'     \item{\code{p_miss}}{the proportion of observed \eqn{Y_k} among missing \eqn{Y_j}; j in rows, k in columns}
#' @references
#' \itemize{
#'     \item Molenberghs, G., Fitzmaurice, G., Kenward, M. G., Tsiatis, A., & Verbeke, G. (Eds.). (2014). Handbook of missing data methodology. CRC Press. Chapter "Multiple Imputation"
#'   }
#' @export

prop_usable_cases <- function(data){
  NA_loc <- !is.na(data) # observed indicator

  rr <- t(NA_loc) %*% NA_loc
  mm <- t(!NA_loc) %*% !NA_loc
  mr <- t(NA_loc) %*% !NA_loc
  rm <- t(!NA_loc) %*% NA_loc

  p_miss <- t(mr / (mr + mm)) # transpose to keep in rows the var to be imputed
  p_obs <- t(rm / (rm + rr))


  list(p_obs = p_obs, p_miss = p_miss)
}

Any scripts or data that you put into this service are public.

missForestPredict documentation built on June 8, 2025, 10:20 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

missForestPredict
Missing Value Imputation using Random Forest for Prediction Settings

R/prop_usable_cases.R
In missForestPredict: Missing Value Imputation using Random Forest for Prediction Settings

Defines functions prop_usable_cases

Documented in prop_usable_cases

Try the missForestPredict package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

missForestPredict Missing Value Imputation using Random Forest for Prediction Settings

R/prop_usable_cases.R In missForestPredict: Missing Value Imputation using Random Forest for Prediction Settings

Defines functions prop_usable_cases

Documented in prop_usable_cases

Try the missForestPredict package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

missForestPredict
Missing Value Imputation using Random Forest for Prediction Settings

R/prop_usable_cases.R
In missForestPredict: Missing Value Imputation using Random Forest for Prediction Settings