{futuremice} parallelizes the main functionality of the {mice} package using {future} and {furrr}. This enables a progress bar for updates, as well as an early stopping method that saves time otherwise spent on unneeded iterations or manual convergence checks (not quality checks; you still have to assess the results yourself).
You can install the development version of futuremice like so:
```r
# You will need Rtools to install packages from GitHub on Windows
# `devtools` will throw an informative error if Rtools is not found
if (!"devtools" %in% rownames(installed.packages())) install.packages("devtools")
devtools::install_github("jesse-smith/futuremice")
```
Let's run the example from the {mice} package documentation, but in parallel.
```r
# Load {futuremice}
library(futuremice)

# Use a local seed
withr::local_seed(1L)

# Evaluate futures in parallel - max of two workers to avoid hogging resources
future::plan("multisession", workers = pmin(2L, future::availableCores()))

# Use {progress} package for progress bar - shows diagnostics in real time
progressr::handlers("progress")
```
{futuremice} uses the {future} package to run imputations in parallel. By default, {future} will run a "sequential" plan, which is no different from (and a little less efficient than) calling mice::mice(). To take advantage of multiple CPUs, we can use a "multisession" plan (see the vignette from the {future} package for details on different plans). future_mice() also provides a progress bar and real-time convergence diagnostics using {progressr}; however, the default progress bar does not show messages, so we'll use the progress handler to see our diagnostics.
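To make the difference between plans concrete, here is a minimal sketch that switches between a "sequential" and a "multisession" plan and checks how many workers each provides. It uses only {future} itself; the variable names are illustrative and not part of {futuremice}.

```r
library(future)

# Under the default "sequential" plan, everything runs in the current R session
plan("sequential")
seq_workers <- nbrOfWorkers()

# Cap the number of background sessions at two, as in the setup code above
n_workers <- min(2L, availableCores())
plan("multisession", workers = n_workers)
par_workers <- nbrOfWorkers()

# A future is evaluated according to the active plan; value() collects the result
f <- future(sum(1:10))
total <- value(f)

# Return to sequential evaluation, which also shuts down the background sessions
plan("sequential")
```

Here `seq_workers` is 1 and `par_workers` equals `n_workers`; the same `future()` call works unchanged under either plan, which is why switching plans is all it takes to parallelize.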
Now, let's impute our missing data:
```r
# Impute the missing values using defaults
# Use `progressr::with_progress()` to show the progress bar
mids <- progressr::with_progress(future_mice(mice::nhanes))

# Or start with `mice::mice()` and finish with `future_mids()`
mids2 <- mice::mice(mice::nhanes, maxit = 1L, printFlag = FALSE)
mids2 <- progressr::with_progress(future_mids(mids2, maxit = 100L))

# View the resulting `mids` (*m*ultiply *i*mputed *d*ata *s*et) object
mids

# List the actual imputations for BMI
mids$imp$bmi
```
Note that future_mice() will often run longer than mice::mice()'s default of 5 iterations before convergence is confidently achieved. Also note that we will only get a progress bar if we wrap the call in with_progress(); this is a feature of the {progressr} package.
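The with_progress() requirement can be seen in a small stand-alone sketch. The function `slow_sum()` below is hypothetical and exists only for illustration: it signals progress through a {progressr} progressor, but those updates are only displayed when the call is wrapped in with_progress().

```r
library(progressr)

# Hypothetical helper: sums a vector while signalling one progress step per element
slow_sum <- function(x) {
  p <- progressor(along = x)
  total <- 0
  for (xi in x) {
    total <- total + xi
    # Messages like this are what a handler such as "progress" can display
    p(message = sprintf("added %g", xi))
  }
  total
}

# Without with_progress(), the progress updates are signalled but not shown
quiet_result <- slow_sum(1:5)

# With with_progress(), the active handler reports each step
shown_result <- with_progress(slow_sum(1:5))
```

Both calls return the same value; the wrapper only controls whether progress is reported, which keeps the reporting decision with the user rather than with the package.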
We can use the resulting mids object just like the result of a call to mice::mice(). Let's inspect the quality of the imputations:
```r
# Inspect quality of imputations
mice::stripplot(mids, chl, pch = 19, xlab = "Imputation number")
```
In general, we would like the imputations to be plausible, i.e., values that could have been observed if they had not been missing. Now let's fit a model to the imputed data set and pool the results:
```r
# Fit complete-data model
fit <- with(mids, lm(chl ~ age + bmi))

# Pool and summarize the results
summary(mice::pool(fit))
```
The complete-data model is fit to each imputed data set, and the results are combined to arrive at estimates that properly account for the missing data.
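As a rough illustration of what combining the results involves, the sketch below applies Rubin's rules by hand to one coefficient. The estimates and variances are made-up numbers, not output from the nhanes example, and `pool_rubin()` is a hypothetical helper; mice::pool() does this (plus degrees-of-freedom calculations) for every term in the model.

```r
# Hypothetical helper: pool one coefficient across m imputed data sets
# est: per-imputation point estimates; se2: per-imputation squared standard errors
pool_rubin <- function(est, se2) {
  m <- length(est)
  q_bar <- mean(est)            # pooled point estimate
  w_bar <- mean(se2)            # within-imputation variance
  b <- var(est)                 # between-imputation variance
  t_var <- w_bar + (1 + 1 / m) * b  # total variance
  list(estimate = q_bar, within = w_bar, between = b, se = sqrt(t_var))
}

# Made-up results from m = 3 imputed data sets
pooled <- pool_rubin(est = c(1.0, 1.2, 1.4), se2 = c(0.04, 0.05, 0.06))
pooled$estimate
pooled$se
```

The between-imputation term is what accounts for the missing data: the more the estimates disagree across imputations, the larger the pooled standard error becomes relative to any single complete-data fit.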
We can also compare two mids objects using compare_mids():

```r
compare_mids(mids, mids2, ignore_rng = TRUE)
```
This will show us where differences occur between the two objects (if there are any). We'll ignore attributes that depend on the RNG state because evaluating imputations in parallel requires a different kind of random number generation than evaluating sequentially, as we did in the first iteration of mice::mice().
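To show why the RNG state differs, the sketch below contrasts draws from R's default Mersenne-Twister generator with draws from independent L'Ecuyer-CMRG streams, the kind of generator commonly used for parallel work in R. This is a general base-R illustration and an assumption about the mechanism; it does not show how {futuremice} seeds its workers, and `draw_from()` is a hypothetical helper.

```r
# Remember the current RNG settings so they can be restored afterwards
old_kind <- RNGkind()

# Sequential style: one generator, one seed
RNGkind("Mersenne-Twister")
set.seed(1L)
seq_draws <- runif(3)

# Parallel style: one independent L'Ecuyer-CMRG stream per worker
RNGkind("L'Ecuyer-CMRG")
set.seed(1L)
stream1 <- .Random.seed
stream2 <- parallel::nextRNGStream(stream1)

# Hypothetical helper: draw n uniforms starting from a given stream state
draw_from <- function(seed, n) {
  assign(".Random.seed", seed, envir = globalenv())
  runif(n)
}
worker1_draws <- draw_from(stream1, 3)
worker2_draws <- draw_from(stream2, 3)

# Restore the original RNG settings
do.call(RNGkind, as.list(old_kind))
```

Even with the same seed value, the sequential draws and the per-worker draws differ, so any attribute that records RNG state will differ between a sequentially and a parallel-evaluated object.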
```r
future::plan("sequential")
```
Please note that the futuremice project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.