Models that simultaneously optimize imputation of multiple variables. Methods include imputation based on EM-estimation of multivariate normal parameters, imputation based on iterative Random Forest estimates, and stochastic imputation based on bootstrapped EM-estimation of multivariate normal parameters.
impute_em(dat, formula, verbose = 0, ...)
impute_mf(dat, formula, ...)
Options passed to
Formulas are of the form
[IMPUTED_VARIABLES] ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
If IMPUTED_VARIABLES is empty, every variable in MODEL_SPECIFICATION will be imputed. When IMPUTED_VARIABLES is specified, all variables in MODEL_SPECIFICATION are part of the model, but only the IMPUTED_VARIABLES are imputed in the output.
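To make the three parts of such a formula concrete, the following base R sketch pulls them apart. The helper split_formula is hypothetical and is not how the package itself parses formulas; it also does not expand the `.` shorthand. It relies only on the fact that `|` binds more tightly than `~` in R, so the grouping part ends up on the right-hand side of the formula.

```r
# Hypothetical helper (not part of the package): decompose a formula of the form
#   [IMPUTED_VARIABLES] ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
split_formula <- function(f) {
  stopifnot(inherits(f, "formula"))
  # A one-sided formula has length 2, i.e. no left-hand side.
  lhs <- if (length(f) == 3) all.vars(f[[2]]) else character(0)
  rhs <- f[[length(f)]]
  # The right-hand side is a call to `|` when grouping variables are given.
  has_groups <- is.call(rhs) && identical(rhs[[1]], as.name("|"))
  model  <- if (has_groups) all.vars(rhs[[2]]) else all.vars(rhs)
  groups <- if (has_groups) all.vars(rhs[[3]]) else character(0)
  list(
    # An empty left-hand side means: impute every model variable.
    imputed = if (length(lhs) == 0) model else lhs,
    model   = model,
    groups  = groups
  )
}

# Impute y1 and y2, model on x1 and x2, split the data by g.
parts <- split_formula(y1 + y2 ~ x1 + x2 | g)

# Empty left-hand side: all model variables are imputed, no grouping.
parts_all <- split_formula(~ x1 + x2 + x3)
```

Here parts$imputed holds the variables to impute, parts$model the model variables, and parts$groups the grouping variables.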
GROUPING_VARIABLES specify which categorical variables are used to split-impute-combine the data. Grouping using dplyr::group_by is also supported. If groups are defined in both the formula and using dplyr::group_by, the data is grouped by the union of grouping variables. Any missing value in one of the grouping variables results in an error.
EM-based imputation with impute_em only works for numerical variables. These variables are assumed to follow a multivariate normal distribution for which the means and covariance matrix are estimated using the EM algorithm of Dempster, Laird and Rubin (1977). The imputations are the expected values of the missing values, conditional on the observed values and the estimated parameters.
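The conditional expectation step can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes the mean vector mu and covariance matrix Sigma are already given (in practice they would come from the EM estimation), and the function name impute_conditional_mean is made up for this example. For a row with observed part o and missing part m, the imputed value is mu[m] + Sigma[m, o] %*% solve(Sigma[o, o]) %*% (x[o] - mu[o]).

```r
# Hypothetical sketch: impute missing entries by their conditional expectation
# under a multivariate normal with GIVEN parameters mu and Sigma.
impute_conditional_mean <- function(x, mu, Sigma) {
  x <- as.matrix(x)
  for (i in seq_len(nrow(x))) {
    m <- is.na(x[i, ])            # missing entries in this row
    if (!any(m)) next             # nothing to impute
    if (all(m)) {                 # nothing observed: fall back to the mean
      x[i, ] <- mu
      next
    }
    o <- !m                       # observed entries in this row
    # Regression coefficients of the missing on the observed variables.
    B <- Sigma[m, o, drop = FALSE] %*% solve(Sigma[o, o, drop = FALSE])
    x[i, m] <- mu[m] + B %*% (x[i, o] - mu[o])
  }
  x
}

# Assumed parameters, chosen only for illustration.
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)

x <- rbind(c(2, NA), c(NA, -1), c(1, 1))
x_imputed <- impute_conditional_mean(x, mu, Sigma)
```

With a correlation of 0.5 and unit variances, each missing value is imputed as half of the observed value in the same row; complete rows are left untouched.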
Multivariate Random Forest imputation with impute_mf works for numerical, categorical or mixed data types. It is based on the algorithm of Stekhoven and Buehlmann (2012). Missing values are first imputed using a rough guess, after which a predictive random forest is trained and used to re-impute the missing values. This is iterated until convergence.
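The guess, train, re-impute loop can be sketched as follows. To keep the example runnable in base R, a linear model (lm) stands in for the random forest, so this is not the algorithm of Stekhoven and Buehlmann itself, only the iteration scheme around it. The function name iterative_impute, the mean-based initial guess, and the stopping rule on squared change are assumptions made for this illustration, and the sketch handles numeric columns only.

```r
# Hypothetical sketch of iterative imputation for numeric data.
# lm() is a stand-in for the random forest used by the actual method.
iterative_impute <- function(dat, max_iter = 10, tol = 1e-6) {
  miss <- is.na(dat)  # logical matrix marking the originally missing cells

  # Rough initial guess: the mean of the observed values in each column.
  for (v in names(dat)) {
    dat[[v]][miss[, v]] <- mean(dat[[v]], na.rm = TRUE)
  }

  for (iter in seq_len(max_iter)) {
    previous <- dat
    for (v in names(dat)) {
      if (!any(miss[, v])) next
      # Train on rows where v was observed, using all other columns.
      fit <- lm(reformulate(setdiff(names(dat), v), response = v),
                data = dat[!miss[, v], , drop = FALSE])
      # Re-impute the originally missing cells of v.
      dat[[v]][miss[, v]] <- predict(fit, newdata = dat[miss[, v], , drop = FALSE])
    }
    # Stop once the imputed values no longer change noticeably.
    change <- sum((as.matrix(dat) - as.matrix(previous))^2)
    if (change < tol) break
  }
  dat
}

dat <- data.frame(x1 = as.numeric(1:10), x2 = 2 * (1:10) + 1)
dat$x1[7] <- NA
dat$x2[3] <- NA
completed <- iterative_impute(dat)
```

Only cells that were missing in the input are ever overwritten; observed values pass through unchanged.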
Dempster, A.P., Laird, N.M. and Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), pp.1-38.
Stekhoven, D.J. and Buehlmann, P., 2012. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), pp.112-118.