perform_missforest: Perform missForest iteration
In stephematician/miForang: Single or multiple imputation of missing data using random forests

Description

Perform the missForest (Stekhoven and Buehlmann, 2012) iterative procedure to impute missing data using random forests. Training uses ranger (Wright and Ziegler, 2017), a fast implementation of the random forest algorithm. Several key alterations to the missForest algorithm may be specified by the user.

Usage

perform_missforest(X_init, model, indicator, ranger_call, gibbs = F,
  tree.imp = F, boot.train = F, obs.only = T,
  stop.measure = measure_correlation, loop.limit = 10L,
  overrides = list(), clean.step = list())
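Before calling perform_missforest, the `model` and `indicator` arguments have to be built from the data. The base-R sketch below shows one way to do this; `make_indicator` and `default_model` are hypothetical helpers, not miForang functions, and `default_model` simply follows the documented default for `model` (it makes no attempt to drop a variable's own column).

```r
# Hypothetical helper: for each column, a logical vector that is TRUE where
# the entry is missing.
make_indicator <- function(df) lapply(df, is.na)

# Hypothetical helper following the documented default for `model`:
# rows for variables that are partially (not completely) missing, least
# missing first; columns for every variable with at least one observed
# value; every entry TRUE (the logical "ones").
default_model <- function(df) {
  n_miss <- vapply(df, function(x) sum(is.na(x)), integer(1))
  targets <- names(sort(n_miss[n_miss > 0 & n_miss < nrow(df)]))
  predictors <- names(n_miss)[n_miss < nrow(df)]
  matrix(TRUE, nrow = length(targets), ncol = length(predictors),
         dimnames = list(targets, predictors))
}

df <- data.frame(a = c(1, NA, 3, 4), b = factor(c("x", NA, NA, "y")), c = 1:4)
ind <- make_indicator(df)
m <- default_model(df)  # rows "a", "b"; columns "a", "b", "c"
```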

Arguments

`X_init`	data.frame; a data set including any of numeric, logical, integer, factor and ordered data types, to be used as the initial state of the missForest procedure.
`model`	matrix; logical matrix indicating whether a predictor (named column) is included in the model for an imputed variable (named row); variables are imputed in row order. The default is a matrix of ones with a row for each variable that is partially, but not completely, missing (ordered from least to most missing) and a column for every variable that is at least partially observed.
`indicator`	named list; the missing (`=T`) or not-missing (`=F`) status of the entries in each column of `X_init`.
`ranger_call`	call; skeleton call to `ranger` for fitting random forests during the missForest iterative procedure; its arguments can be overridden on a per-variable basis via `overrides`.
`gibbs`	logical; use Gibbs sampling in training steps (`T`) rather than the predictions from the previous iteration (default).
`tree.imp`	logical; when training, use the prediction of missing data from a single tree in the forest (`T`) rather than the bagged predicted value (default).
`boot.train`	logical; train each forest on a bootstrap sample of the observed data when `T`, rather than the observed data (default).
`obs.only`	logical; train only on observed outcomes (`T`, default) or on all data, including predicted/sampled values of missing outcomes (`F`).
`stop.measure`	function; evaluates the difference or relationship between the two most recently completed data sets during iteration. It must accept the following arguments: `X` and `Y`, named lists with the imputed values (in order of appearance by row) for each column of the data set, one for each of the two data sets being compared; `X_init`, the original (mixed-type) data set with missing values replaced as at the starting point of missForest; and `indicator`, a list with the missing (`=T`) or not-missing (`=F`) status of the original data set. It should return a numeric (vector). The default, `measure_correlation`, serves as an example; see also the original measure proposed by Stekhoven and Buehlmann (2012) in `measure_stekhoven_2012`.
`loop.limit`	numeric; maximum number of iterations within the missForest procedure.
`overrides`	named list; per-variable overrides of the arguments passed to `ranger`, where each item applies when training on the response variable given by its name.
`clean.step`	named list; each item is a function that cleans or post-processes the imputed values of the named variable immediately after they are imputed. Each function takes two arguments: the subset of the data used in the current training step for the rows in which the named variable was missing, and the most recently imputed values of that variable. It should return (post-processed) data of the same length and type as its second argument.
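The `ranger_call`, `overrides` and `clean.step` arguments all take R objects that are easy to build in base R. The sketch below is illustrative only: the column name `income`, the helper `apply_overrides` and the list `clean_steps` are hypothetical, and miForang's own way of merging overrides into the call may differ. The call is manipulated as a language object and never evaluated, so ranger need not be installed.

```r
# A skeleton ranger call, stored unevaluated.
ranger_call <- quote(ranger::ranger(num.trees = 100, min.node.size = 1))

# Hypothetical per-variable overrides: more trees and larger leaves when the
# response is `income`.
overrides <- list(income = list(num.trees = 500, min.node.size = 5))

# Hypothetical helper: copy each override into the call by argument name.
# Variables without an entry in `overrides` get the call unchanged.
apply_overrides <- function(cl, overrides, variable) {
  for (arg in names(overrides[[variable]])) {
    cl[[arg]] <- overrides[[variable]][[arg]]
  }
  cl
}

income_call <- apply_overrides(ranger_call, overrides, "income")
other_call <- apply_overrides(ranger_call, overrides, "age")

# Hypothetical clean.step: keep imputed income non-negative. The first
# argument (the training subset) is ignored; the result has the same length
# and type as `imputed`, assuming income is stored as a double.
clean_steps <- list(
  income = function(train_subset, imputed) pmax(imputed, 0)
)
```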

Details

For a full description of the missForest algorithm, see Stekhoven and Buehlmann (2012). In brief, at each iteration the missing values of each variable (in the order of rownames(model)) are imputed by the predictions of a random forest trained on the observed cases of that variable, using the completed data set from the previous iteration as the predictor values. This is repeated until some measure of the relationship between successive iterations indicates convergence, usually when the measure decreases relative to its value at the previous iteration.
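The loop described above can be sketched in base R for an all-numeric data frame. To stay self-contained, this sketch swaps the random forest for a linear model (`lm`), so it illustrates only the control flow and is not miForang's implementation; `sketch_missforest` is a hypothetical name. It stops with the continuous-variable difference of Stekhoven and Buehlmann (2012), returning the previous imputation at the first increase, rather than the correlation-based default of this package.

```r
# Sketch of the missForest loop with lm() standing in for a random forest.
sketch_missforest <- function(df, loop.limit = 10L) {
  indicator <- lapply(df, is.na)
  n_miss <- vapply(indicator, sum, integer(1))
  # Impute partially (not completely) missing variables, least missing first.
  targets <- names(sort(n_miss[n_miss > 0 & n_miss < nrow(df)]))
  if (length(targets) == 0) return(list(X = df, iterations = 0L, converged = TRUE))

  # Initial state: replace missing values by the column mean.
  X <- df
  for (v in targets) X[[v]][indicator[[v]]] <- mean(df[[v]], na.rm = TRUE)

  prev_delta <- Inf
  for (it in seq_len(loop.limit)) {
    X_old <- X
    for (v in targets) {
      miss <- indicator[[v]]
      f <- reformulate(setdiff(names(X), v), response = v)
      # Predictor values come from the previous iteration's completed data
      # (X_old); using the partially updated X instead would be closer to
      # the `gibbs` option.
      fit <- lm(f, data = X_old[!miss, , drop = FALSE])
      X[[v]][miss] <- predict(fit, newdata = X_old[miss, , drop = FALSE])
    }
    # Relative squared change between successive completed data sets.
    delta <- sum((as.matrix(X) - as.matrix(X_old))^2) / sum(as.matrix(X)^2)
    # Stop at the first increase and return the previous imputation.
    if (delta > prev_delta) {
      return(list(X = X_old, iterations = it, converged = TRUE))
    }
    prev_delta <- delta
  }
  list(X = X, iterations = loop.limit, converged = FALSE)
}

set.seed(1)
x1 <- rnorm(50)
df <- data.frame(x1 = x1, x2 = x1 + rnorm(50, sd = 0.1), x3 = rnorm(50))
df$x1[c(2, 5, 9)] <- NA
df$x2[c(3, 7)] <- NA
res <- sketch_missforest(df)
```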

Numeric data are treated as continuous and predicted by regression forests, while factors are predicted by classification forests. When called from smirf, only numeric (non-integer), factor and ordered data are present, since integer and logical columns have already been converted to factors.
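The type handling described above can be illustrated with a small base-R sketch. `prepare_types` and `forest_type` are hypothetical helpers: the first converts integer and logical columns to factors, as the text says happens before smirf calls this function, and the second reports which kind of forest the stated rule would use for each column.

```r
# Hypothetical helper: convert integer and logical columns to factors,
# leaving doubles and existing factors (including ordered) untouched.
prepare_types <- function(df) {
  df[] <- lapply(df, function(x) {
    if (is.integer(x) || is.logical(x)) factor(x) else x
  })
  df
}

# Hypothetical helper: the kind of forest implied by a column's type.
forest_type <- function(x) {
  if (is.factor(x)) "classification"
  else if (is.double(x)) "regression"
  else NA_character_
}

df <- data.frame(age = c(31L, 45L, NA), smoker = c(TRUE, NA, FALSE),
                 bmi = c(22.1, NA, 27.4))
df <- prepare_types(df)
types <- vapply(df, forest_type, character(1))
```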

The key modifications to the procedure are governed by the following arguments:

gibbs: use the most recent predictions for each variable in training and prediction as they become available, like a Gibbs sampler, by setting this to T (default is F);
obs.only: train on all rows in the data set instead of the observed rows only by setting this to F (default is T); and
tree.imp: predict using a randomly selected tree for each missing value rather than the whole-of-forest aggregated prediction by setting this to T (default is F).
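The difference that `tree.imp` makes can be sketched with a matrix of per-tree regression predictions (one row per missing value, one column per tree); ranger's `predict` method can return such a matrix when called with `predict.all = TRUE`. Here the matrix is simulated rather than produced by a forest, and `aggregate_prediction`, `single_tree_prediction` and `bootstrap_rows` (the last illustrating `boot.train`) are hypothetical helpers, not miForang functions. For a classification forest the aggregate would be a majority vote rather than a mean.

```r
set.seed(42)
# Simulated stand-in for per-tree regression predictions:
# 4 missing values (rows) x 100 trees (columns).
per_tree <- matrix(rnorm(400, mean = rep(c(1, 5, 10, 20), times = 100)),
                   nrow = 4)

# tree.imp = F (default): bagged prediction, the mean over all trees.
aggregate_prediction <- function(per_tree) rowMeans(per_tree)

# tree.imp = T: each missing value takes the prediction of one randomly
# selected tree.
single_tree_prediction <- function(per_tree) {
  chosen <- sample.int(ncol(per_tree), nrow(per_tree), replace = TRUE)
  per_tree[cbind(seq_len(nrow(per_tree)), chosen)]
}

# boot.train = T: row indices of a bootstrap sample of the observed cases.
bootstrap_rows <- function(observed) {
  idx <- which(observed)
  idx[sample.int(length(idx), length(idx), replace = TRUE)]
}

agg <- aggregate_prediction(per_tree)      # close to 1, 5, 10, 20
one <- single_tree_prediction(per_tree)    # noisier, one tree per value
boot <- bootstrap_rows(c(TRUE, FALSE, TRUE, TRUE))
```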

Collectively, these three changes make the procedure similar to the Multivariate Imputation by Chained Equations (mice) of van Buuren and Groothuis-Oudshoorn (2011).

The convergence criterion can be changed via the stop.measure argument. By default, it is the mean rank correlation between successive iterations for the ordered data together with the stationary rate of the categorical data (see measure_correlation). The procedure halts when both of these values are less than or equal to their values at the previous iteration (see stop_condition). The original Stekhoven and Buehlmann (2012) measure is provided by the measure_stekhoven_2012 function.
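A custom stop.measure only has to follow the signature given under Arguments. The sketch below is a hypothetical measure in the spirit of the default, not the package's measure_correlation: it averages Spearman correlations over numeric and ordered variables and, as an assumption about the term "stationary rate", takes the proportion of unchanged imputed values over unordered factors. `should_stop` is a hypothetical stand-in for the halting rule described above, not stop_condition itself.

```r
# Hypothetical stop.measure with the documented signature. X and Y are named
# lists of imputed values; X_init and indicator are accepted but unused here.
my_stop_measure <- function(X, Y, X_init, indicator) {
  vars <- intersect(names(X), names(Y))
  is_ord <- vapply(vars, function(v) is.numeric(X[[v]]) || is.ordered(X[[v]]),
                   logical(1))
  # Mean Spearman (rank) correlation over numeric and ordered variables.
  cors <- vapply(vars[is_ord], function(v) {
    suppressWarnings(cor(as.numeric(X[[v]]), as.numeric(Y[[v]]),
                         method = "spearman"))
  }, numeric(1))
  # Proportion of unchanged imputed values over unordered factors.
  stat <- vapply(vars[!is_ord], function(v) {
    mean(as.character(X[[v]]) == as.character(Y[[v]]))
  }, numeric(1))
  c(correlation = mean(cors, na.rm = TRUE), stationary = mean(stat))
}

# Halt once every available component is no larger than at the previous
# iteration; NA/NaN components (e.g. no categorical variables) are ignored.
should_stop <- function(current, previous) all(current <= previous, na.rm = TRUE)

X <- list(a = c(1.2, 3.4, 5.1), g = factor(c("u", "v", "u")))
Y <- list(a = c(1.0, 3.9, 5.0), g = factor(c("u", "u", "u"), levels = c("u", "v")))
m <- my_stop_measure(X, Y, X_init = NULL, indicator = NULL)
```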

Value

named list; results of the iterative procedure, with the following items:

`converged`	logical; indicator of convergence.

`oob_error`	data.frame; variable-wise out-of-bag error at each iteration, with columns:

iteration: numeric; iteration number.
variable: factor; name of the column in the data set.
measure: factor; either mse (mean squared error) for non-integer numeric data or pfc (proportion falsely classified) for factors.
value: numeric; out-of-bag error.

`stop_measures`	list; the value returned by stop.measure at each iteration.

`imputed`	list; one item per iteration, each a named list of the imputed values in order of appearance in X_init.
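Two common follow-up steps are writing the last iteration's imputed values back into the data and reading off the final out-of-bag errors. The helpers below are hypothetical sketches written against the structure documented above; they assume that each element of `imputed` fills the positions where `indicator` is `T`, in row order. With `res` being the result of perform_missforest, `final_imputation(res, X_init, indicator)` would return a completed data frame.

```r
# Hypothetical helper: fill X_init with the most recent imputed values,
# assuming they follow the order of the missing entries by row.
final_imputation <- function(res, X_init, indicator) {
  last <- res$imputed[[length(res$imputed)]]
  for (v in names(last)) {
    X_init[[v]][indicator[[v]]] <- last[[v]]
  }
  X_init
}

# Hypothetical helper: out-of-bag errors from the final recorded iteration.
last_oob_error <- function(res) {
  oob <- res$oob_error
  oob[oob$iteration == max(oob$iteration), c("variable", "measure", "value")]
}
```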

References

Stekhoven, D.J. and Buehlmann, P., 2012. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), pp. 112-118. doi:10.1093/bioinformatics/btr597

Van Buuren, S. and Groothuis-Oudshoorn, K., 2011. mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), pp. 1-67. doi:10.18637/jss.v045.i03

Wright, M.N. and Ziegler, A., 2017. ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1), pp. 1-17. doi:10.18637/jss.v077.i01

See Also

measure_correlation, measure_stekhoven_2012, missForest, ranger, stop_condition

stephematician/miForang documentation built on July 23, 2019, 5:11 p.m.