missRanger | R Documentation |
Uses the "ranger" package (Wright & Ziegler) to do fast missing value imputation by
chained random forests, see Stekhoven & Buehlmann and Van Buuren & Groothuis-Oudshoorn.
Between the iterative model fitting, it offers the option of predictive mean matching.
First, this avoids imputation with values not present in the original data
(like a value 0.3334 in a 0-1 coded variable).
Second, predictive mean matching tries to raise the variance in the resulting
conditional distributions to a realistic level. This makes it possible to do
multiple imputation by repeating the call to missRanger().
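As a minimal sketch of the multiple-imputation idea, assuming the missRanger package is installed: the call is repeated with different seeds and pmm.k > 0, and the same analysis model is fitted to each completed data set. The number of imputations, the value of pmm.k and the linear model are illustrative choices, not recommendations from this page.

```r
library(missRanger)

# Example data with roughly 10% missing values per column
iris_na <- generateNA(iris, p = 0.1, seed = 1)

# Repeat the imputation with different seeds; pmm.k > 0 switches on
# predictive mean matching between the model fits
n_imp <- 5  # illustrative choice
completed <- lapply(
  seq_len(n_imp),
  function(s) {
    missRanger(iris_na, pmm.k = 3, num.trees = 50, seed = s, verbose = 0)
  }
)

# Fit the same analysis model on each completed data set
fits <- lapply(completed, function(d) lm(Sepal.Length ~ Petal.Length, data = d))

# One column of coefficients per imputation; the spread across columns
# reflects the uncertainty introduced by the imputation
coefs <- sapply(fits, coef)
print(coefs)
```

The per-imputation results could then be combined, for example with Rubin's rules as implemented in the mice package cited below.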
missRanger(
data,
formula = . ~ .,
pmm.k = 0L,
num.trees = 500,
mtry = NULL,
min.node.size = NULL,
min.bucket = NULL,
max.depth = NULL,
replace = TRUE,
sample.fraction = if (replace) 1 else 0.632,
case.weights = NULL,
num.threads = NULL,
save.memory = FALSE,
maxiter = 10L,
seed = NULL,
verbose = 1,
returnOOB = FALSE,
data_only = !keep_forests,
keep_forests = FALSE,
...
)
data |
A data.frame with missing values to impute. |
formula |
A two-sided formula specifying variables to be imputed
(left hand side) and variables used to impute (right hand side).
Defaults to . ~ ., i.e., all variables are used to impute all variables. |
pmm.k |
Number of candidate non-missing values to sample from in the predictive mean matching steps. 0 to avoid this step. |
num.trees |
Number of trees passed to ranger::ranger(). |
mtry |
Number of covariates considered per split. The default NULL equals the rounded-down square root of the number of features. |
min.node.size |
Minimal node size passed to ranger::ranger(). |
min.bucket |
Minimal terminal node size passed to ranger::ranger(). |
max.depth |
Maximal tree depth passed to ranger::ranger(). |
replace |
Sample with replacement passed to ranger::ranger(). |
sample.fraction |
Fraction of rows per tree passed to ranger::ranger(). |
case.weights |
Optional case weights passed to ranger::ranger(). |
num.threads |
Number of threads passed to ranger::ranger(). |
save.memory |
Slow but memory saving mode of ranger::ranger(). |
maxiter |
Maximum number of iterations. |
seed |
Integer seed. |
verbose |
A value in 0, 1, 2 controlling the verbosity. |
returnOOB |
Should the final average OOB prediction errors be added
as data attribute "oob"? Only relevant when data_only = TRUE. |
data_only |
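If TRUE (the default), only the imputed data is returned. Otherwise, a "missRanger" object with additional information is returned. |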
keep_forests |
Should the random forests of the last relevant iteration
be returned? The default is FALSE. |
... |
Additional arguments passed to ranger::ranger(). |
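The formula argument described above can be sketched as follows, assuming the missRanger package is installed. The data set, the choice of columns and the small num.trees value are only for illustration.

```r
library(missRanger)

# Put missing values into two columns only, using base R
iris_na <- iris
set.seed(1)
iris_na$Sepal.Width[sample(nrow(iris_na), 15)] <- NA
iris_na$Species[sample(nrow(iris_na), 15)] <- NA

# Left-hand side ".": impute every variable.
# Right-hand side ". - Species": use all variables except Species as predictors.
imp_a <- missRanger(
  iris_na, . ~ . - Species, num.trees = 50, seed = 1, verbose = 0
)

# Explicit variable lists on both sides: impute Sepal.Width and Species,
# using only the two petal measurements as predictors
imp_b <- missRanger(
  iris_na, Sepal.Width + Species ~ Petal.Length + Petal.Width,
  num.trees = 50, seed = 1, verbose = 0
)

# Both results should be free of missing values
colSums(is.na(imp_a))
colSums(is.na(imp_b))
```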
The iterative chaining stops as soon as maxiter is reached or the average
out-of-bag (OOB) prediction error stops decreasing.
In the latter case, except for the first iteration, the imputed data of the
second-to-last (= best) iteration is returned.
OOB prediction errors are quantified as 1 - R^2 for numeric variables, and as classification error otherwise. If a variable has been imputed only univariately, the value is 1.
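The error measures and the stopping rule described above can be sketched in base R. This is a simplified illustration, not the package's internal code: the helper names are hypothetical, and the exact comparison used by missRanger (for instance strict versus non-strict inequality) is an assumption.

```r
# Hypothetical helper: 1 - R^2 for a numeric variable
error_numeric <- function(observed, predicted) {
  mean((observed - predicted)^2) / mean((observed - mean(observed))^2)
}

# Hypothetical helper: classification error for a non-numeric variable
error_class <- function(observed, predicted) {
  mean(observed != predicted)
}

# Hypothetical helper: given the average OOB error of each iteration,
# return the iteration whose imputed data would be kept
best_iteration <- function(mean_errors) {
  for (j in seq_along(mean_errors)[-1]) {
    # Error did not decrease: keep the previous (= best) iteration
    if (mean_errors[j] >= mean_errors[j - 1]) {
      return(j - 1)
    }
  }
  # No deterioration seen: the last iteration (e.g. maxiter) is kept
  length(mean_errors)
}

# Iteration 4 is worse than iteration 3, so iteration 3 is kept
best_iteration(c(0.50, 0.30, 0.25, 0.26))
```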
If data_only = TRUE, an imputed data.frame. Otherwise, a "missRanger" object
with the following elements:
data: The imputed data.
data_raw: The original data provided.
forests: When keep_forests = TRUE, a list of "ranger" models used to generate the imputed data. NULL otherwise.
to_impute: Variables to be imputed (in this order).
impute_by: Variables used for imputation.
best_iter: Best iteration.
pred_errors: Per-iteration OOB prediction errors (1 - R^2 for regression, classification error otherwise).
mean_pred_errors: Per-iteration averages of OOB prediction errors.
pmm.k: Same as input pmm.k.
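A short sketch of how the elements listed above might be inspected, assuming the missRanger package is installed. The parameter values are illustrative only.

```r
library(missRanger)

iris_na <- generateNA(iris, seed = 1)

# Request the full "missRanger" object and keep the fitted forests
imp <- missRanger(
  iris_na, pmm.k = 5, num.trees = 50,
  data_only = FALSE, keep_forests = TRUE, seed = 1, verbose = 0
)

imp$best_iter          # iteration whose imputed data is returned
imp$mean_pred_errors   # average OOB error per iteration
imp$pred_errors        # OOB error per iteration and variable
imp$to_impute          # variables imputed, in order
head(imp$data)         # the imputed data itself
length(imp$forests)    # number of stored "ranger" models
```

Keeping the forests can use a lot of memory on larger data, so keep_forests = TRUE is best reserved for cases where the models are actually needed.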
Wright, M. N. & Ziegler, A. (2016). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, in press. <arxiv.org/abs/1508.04409>.
Stekhoven, D. J. & Buehlmann, P. (2012). MissForest - nonparametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. https://doi.org/10.1093/bioinformatics/btr597.
Van Buuren, S. & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. http://www.jstatsoft.org/v45/i03/
library(missRanger)

# Add missing values to the iris data
iris2 <- generateNA(iris, seed = 1)
imp1 <- missRanger(iris2, pmm.k = 5, num.trees = 50, seed = 1)
head(imp1)
# Extended output
imp2 <- missRanger(iris2, pmm.k = 5, num.trees = 50, data_only = FALSE, seed = 1)
summary(imp2)
all.equal(imp1, imp2$data)
# Formula interface: Univariate imputation of Species and Sepal.Width
imp3 <- missRanger(iris2, Species + Sepal.Width ~ 1)