knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of imputeGeneric is to ease the implementation of imputation functions.
You can install the development version of imputeGeneric from GitHub with:
# install.packages("devtools") devtools::install_github("torockel/imputeGeneric")
The aim of imputeGeneric is to make the implementation and usage of imputation
methods easier. The main function of the package is impute_iterative()
. This
function can turn any parsnip
model into an imputation method. Furthermore, other customized approaches can be
used in a general imputation framework. For more information, see the
documentations of impute_iterative()
, impute_supervised()
,
impute_unsupervised()
and the following examples.
The use of a parsnip model for imputation is demonstrated using regression trees
from the rpart package via parsnip (decision_tree("regression")
). First, a
data set with missing values is created. Then, this data set is imputed once
with regression trees using only completely observed rows and columns for the
model building.
``` {r example1} library(imputeGeneric) library(parsnip)
set.seed(123) ds_mis <- data.frame(X = rnorm(100), Y = rnorm(100)) ds_mis$Z <- 5 + 2* ds_mis$X + ds_mis$Y + rnorm(100) ds_mis$Z[sample.int(100, 30)] <- NA ds_mis$Y[sample.int(100, 20)] <- NA
ds_imp <- impute_iterative(ds_mis, decision_tree("regression"), max_iter = 1) anyNA(ds_imp)
To use other parsnip models instead of regression trees, only the `model_spec_parsnip` argument must be altered. E.g. for linear regression instead of regression trees use `linear_reg()`. ``` {r example1_lm} ds_imp_lm <- impute_iterative(ds_mis, linear_reg(), max_iter = 1) anyNA(ds_imp_lm)
Many aspects of the imputation can be specified and customized. The missing
values can be initially imputed e.g. with per column mean values
(initial_imputation_fun = missMethods::impute_mean
). In addition, all objects
and columns can be used for the imputation models (rows_used_for_imputation =
"all"
and cols_used_for_imputation = "all"
). Furthermore, the imputation can
be iterative. The iteration will be stopped, if either the difference between
two imputed data sets falls below a threshold (stop_fun = stop_ds_difference,
stop_fun_args = list(eps = 0.1)
) or the maximum number of iterations
(max_iter = 5
) is reached.
{r example2}
ds_imp2 <- impute_iterative(
ds_mis, decision_tree("regression"),
initial_imputation_fun = missMethods::impute_mean,
cols_used_for_imputation = "all",
rows_used_for_imputation = "all",
stop_fun = stop_ds_difference,
stop_fun_args = list(eps = 0.1),
max_iter = 5)
anyNA(ds_imp2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.