In jesse-smith/futuremice: A Future for MICE

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

futuremice

{futuremice} parallelizes the main functionality of the {mice} package using {future} and {furrr}. This enables the use of a progress bar for updates, as well as an early stopping method to save time spent on unneeded iteration or manual convergence checks (not quality checks - you still have to assess the results yourself).

Installation

You can install the development version of futuremice like so:

# You will need Rtools to install packages from Github on Windows
# `devtools` with throw an informative error if Rtools is not found
if (!"devtools" %in% installed.packages()) install.packages("devtools")
devtools::install_github("jesse-smith/futuremice")

Minimal Example

Let's run the example from the {mice} package documentation, but in parallel.

# Load {futuremice}
library(futuremice)

# Use a local seed
withr::local_seed(1L)

# Evaluate futures in parallel - max of two workers to avoid hogging resources
future::plan("multisession", workers = pmin(2L, future::availableCores()))

# Use {progress} package for progress bar - shows diagnostics in real time
progressr::handlers("progress")

{futuremice} uses the {future} package to run imputations in parallel. By default, {future} will run a "sequential" plan, which is no different (and a little less efficient) than calling mice::mice(). To take advantage of multiple CPUs, we can use a "multisession" plan (see the vignette from the {future} package for details on different plans). future_mice() also provides a progress bar and real-time convergence diagnostics using {progressr}; however, the default progress bar does not show messages, so we'll use the progress handler to see our diagnostics.

Now, let's impute our missing data:

# Impute the missing values using defaults
# Use `progressr::with_progress()` to show the progress bar
mids <- progressr::with_progress(future_mice(mice::nhanes))

# Or start with `mice::mice()` and finish with `future_mids()`
mids2 <- mice::mice(mice::nhanes, maxit = 1L, printFlag = FALSE)
mids2 <- progressr::with_progress(future_mids(mids2, maxit = 100L))

# View the resulting `mids` (*m*ultiply *i*mputed *d*ata *s*et) object
mids

# List the actual imputations for BMI
mids$imp$bmi

Note that future_mice() will often run longer than mice::mice()'s default of 5 imputations before convergence is confidently achieved. Also note that we will only get a progress bar if we wrap the call in with_progress(); this is a feature of the {progressr} package.

We can use the resulting mids object just like the result of a call to mice::mice(). Let's inspect the quality of the imputations:

# Inspect quality of imputations
mice::stripplot(mids, chl, pch = 19, xlab = "Imputation number")

In general, we would like the imputations to be plausible, i.e., values that could have been observed if they had not been missing. Now let's fit a model to the imputed data set and pool the results:

# Fit complete-data model
fit <- with(mids, lm(chl ~ age + bmi))

# Pool and summarize the results
summary(mice::pool(fit))

The complete-data model is fit to each imputed data set, and the results are combined to arrive at estimates that properly account for the missing data.

We can also compare two mids objects using compare_mids():

compare_mids(mids, mids2, ignore_rng = TRUE)

This will show us where differences occur between the two objects (if there are any). We'll ignore attributes that depend on the RNG state because evaluating imputations in parallel requires a different kind of random number generation than evaluating sequentially, as we did in the first iteration of mice::mice().

future::plan("sequential")

Code of Conduct

Please note that the futuremice project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

jesse-smith/futuremice documentation built on Nov. 24, 2023, 7:19 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com