future_mice | R Documentation |
'mice::mice()' Using {future}
future_mice() parallelizes chains in Multivariate Imputation by Chained
Equations (MICE), using the {furrr} package to create futures for chains.
Chains are also assessed for convergence using the R-hat (potential scale
reduction factor) statistic; if the largest R-hat is less than rhat_max for
minit iterations, the function returns early (without completing maxit
iterations). This can save a significant amount of computation and manual
convergence checking, and it often works well in practice. However, a "good"
R-hat is neither a necessary nor sufficient condition for MCMC convergence,
nor is it a substitute for checking imputation quality once convergence is
achieved.
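To make the R-hat check concrete, here is a minimal base-R sketch of the classic Gelman-Rubin potential scale reduction factor for a single scalar summary tracked across chains. The function name `rhat()` and the simulated inputs are illustrative assumptions; the exact R-hat variant computed by future_mice() is not stated here and may differ.

```r
# Classic Gelman-Rubin R-hat for one scalar summary tracked across chains.
# `draws` is an iterations x chains numeric matrix (hypothetical input;
# this is not the package's internal implementation).
rhat <- function(draws) {
  n <- nrow(draws)                      # iterations per chain
  chain_means <- colMeans(draws)
  chain_vars <- apply(draws, 2, var)
  W <- mean(chain_vars)                 # within-chain variance
  B <- n * var(chain_means)             # between-chain variance
  var_hat <- (n - 1) / n * W + B / n    # pooled variance estimate
  sqrt(var_hat / W)
}

set.seed(1)
# Four chains drawn from the same distribution: they agree with each other
mixed <- matrix(rnorm(400), nrow = 100, ncol = 4)
# The same chains shifted to different centres: they disagree
stuck <- sweep(mixed, 2, c(0, 2, 4, 6), "+")

rhat(mixed)
rhat(stuck)
rhat(mixed) < 1.05   # compare against the default `rhat_max`
```

Values close to 1 indicate that the chains are hard to tell apart; the shifted chains give a much larger value because the between-chain variance dominates.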
future_mice(
data,
m = 5L,
method = NULL,
predictorMatrix = NULL,
ignore = NULL,
where = NULL,
blocks = NULL,
visitSequence = NULL,
formulas = NULL,
blots = NULL,
post = NULL,
defaultMethod = c("pmm", "logreg", "polyreg", "polr"),
maxit = 100L,
minit = min(5L, maxit),
quiet = FALSE,
seed = NA,
data.init = NULL,
chunk_size = 1L,
rhat_max = 1.05,
progressor = NULL,
...
)
data |
A data frame or a matrix containing the incomplete data. Missing
values are coded as |
m |
Number of multiple imputations. The default is |
method |
Can be either a single string, or a vector of strings with
length |
predictorMatrix |
A numeric matrix of |
ignore |
A logical vector of |
where |
A data frame or matrix with logicals of the same dimensions
as |
blocks |
List of vectors with variable names per block. List elements
may be named to identify blocks. Variables within a block are
imputed by a multivariate imputation method
(see |
visitSequence |
A vector of block names of arbitrary length, specifying the
sequence of blocks that are imputed during one iteration of the Gibbs
sampler. A block is a collection of variables. All variables that are
members of the same block are imputed
when the block is visited. A variable that is a member of multiple blocks
is re-imputed within the same iteration.
The default |
formulas |
A named list of formulas, or expressions that
can be converted into formulas by |
blots |
A named |
post |
A vector of strings with length |
defaultMethod |
A vector of length 4 containing the default
imputation methods for 1) numeric data, 2) factor data with 2 levels, 3)
factor data with > 2 unordered levels, and 4) factor data with > 2
ordered levels. By default, the method uses
|
maxit |
A scalar giving the maximum number of iterations.
|
minit |
The minimum number of iterations to run. This is also the number
of iterations used to assess convergence. Convergence is defined as
|
quiet |
Should convergence messages and warnings be suppressed? |
seed |
Seed for random number generation; either a scalar |
data.init |
A data frame of the same size and type as |
chunk_size |
The average number of chains per future. Differs from the
usual |
rhat_max |
The R-hat threshold used to assess convergence.
Convergence is defined as |
progressor |
An optional |
... |
Arguments passed on to
|
MICE is a method for creating multiple imputations (replacement values) for
multivariate missing data. The method is based on Fully Conditional
Specification (FCS), where each incomplete variable is imputed by a separate
model. The MICE algorithm can impute mixes of continuous, binary, unordered
categorical and ordered categorical data. In addition, MICE can impute
continuous two-level data and maintain consistency between imputations by
means of passive imputation and post-processing. Many diagnostic plots are
implemented to inspect the quality of the imputations. See the mice::mice()
function and the vignettes on the {mice} package website for details.
future_mice() mimics the mice::mice() interface as closely as possible;
however, some shared parameters have different defaults than their {mice}
equivalents. Notably, the default maxit is much larger than in {mice}; this
is because maxit is an upper bound in future_mice(), rather than an exact
number of iterations, as in mice(). The default of 100 should be more than
enough iterations for most problems; if you need more than 100 iterations
for convergence, you may want to check your imputation model for circularity
or other stability issues.
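The role of maxit as an upper bound can be sketched in a few lines of base R. This is one possible reading of the stopping rule (stop once the largest R-hat has stayed below rhat_max for minit consecutive iterations), not the package's actual implementation; the helper `iterations_run()` and the R-hat trace are made up for illustration.

```r
# Hypothetical helper, not part of the package: return the iteration at which
# a run would stop, given the largest R-hat observed at each iteration.
iterations_run <- function(rhat_by_iter, maxit = 100L, minit = 5L,
                           rhat_max = 1.05) {
  below <- 0L                                  # consecutive iterations below threshold
  n <- min(maxit, length(rhat_by_iter))
  for (i in seq_len(n)) {
    below <- if (rhat_by_iter[i] < rhat_max) below + 1L else 0L
    if (below >= minit) return(i)              # early return: converged
  }
  n                                            # never converged: stop at the bound
}

# Made-up trace: the largest R-hat decays towards 1 as the chains mix
rhat_trace <- 1 + 0.5 * 0.8^(seq_len(100) - 1)
iterations_run(rhat_trace)

# A trace that never drops below the threshold runs to `maxit`
iterations_run(rep(1.2, 200))
```

Under this reading, a well-behaved model stops long before maxit, while a run that reaches maxit is a signal to inspect the imputation model.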
Additionally, future_mice() provides NULL defaults for all unset arguments;
this is a best practice in R. Because of this, passing NULL to any argument
without an explicit default is the same as not passing that argument, which
differs from the behavior of mice() in some instances.
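The difference between a NULL default and a check for a missing argument can be illustrated with two small functions. Both are hypothetical and written only for this sketch; they say nothing about how mice() or future_mice() are implemented internally.

```r
# Hypothetical function using a NULL default: NULL means "not supplied"
null_default_sketch <- function(data, method = NULL) {
  if (is.null(method)) method <- rep("pmm", ncol(data))
  method
}

# Hypothetical function using missing(): an explicit NULL is passed through
missing_check_sketch <- function(data, method) {
  if (missing(method)) method <- rep("pmm", ncol(data))
  method
}

df <- data.frame(a = c(1, NA), b = c(NA, 2))

# With a NULL default, omitting the argument and passing NULL are equivalent
identical(null_default_sketch(df), null_default_sketch(df, method = NULL))

# With missing(), an explicit NULL is not replaced by the default
is.null(missing_check_sketch(df, method = NULL))
```

The NULL-default style makes it easy to forward optional arguments programmatically, since a caller can pass NULL instead of building the call conditionally.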
Finally, some output attributes are not identical to their equivalents in
mice(). In particular, the call attribute contains the call to future_mice(),
rather than a call to mice(). The lastSeedValue should be equivalent, but
does not function identically in subsequent calls to mice.mids() and
future_mids().
Returns an S3 object of class mids (multiply imputed data set).
# Run imputations in parallel (just two to avoid hogging resources)
# Picking a number of workers that divides `m` evenly can help performance
future::plan("multisession", workers = pmin(2L, future::availableCores()))
# Use just like `mice::mice()` - examples from {mice} documentation
mids <- future_mice(mice::nhanes, m = 2L, maxit = 1L)
## Not run:
# Run until convergence (`maxit = 100L` by default)
mids <- future_mice(mice::nhanes, m = 2L)
## End(Not run)
mids
# List the actual imputations for BMI
mids$imp$bmi
# First completed data matrix
mice::complete(mids)
# Reset future plan
future::plan("sequential")