View source: R/impute_unsupervised.R
impute_unsupervised | R Documentation |
Impute a data set with an unsupervised inner method. This function is one of the main functions that can be used inside impute_iterative(). If you need pre-imputation or iterations, use impute_iterative() directly.
impute_unsupervised(
  ds,
  model_fun,
  predict_fun,
  rows_used_for_imputation = "only_complete",
  rows_order = seq_len(nrow(ds)),
  update_model = "every_iteration",
  update_ds_model = "every_iteration",
  model_arg = NULL,
  M = is.na(ds),
  ...
)
ds |
The data set to be imputed. Must be a data frame with column names. |
model_fun |
An unsupervised model function which takes as arguments a data set, the missing data indicator matrix M, the row index i and model_arg, in this order. |
predict_fun |
A predict function which uses the model generated via model_fun to produce imputation values for the missing values of a row. |
rows_used_for_imputation |
Which rows should be used to impute other rows? Possible choices: "only_complete", "already_imputed", "all_except_i", "all" |
rows_order |
Ordering of the rows for imputation. This can be a vector
with indices or an |
update_model |
How often should the model for imputation be updated? Possible choices are: "everytime" (after every imputed value) and "every_iteration" (only one model is created and used for all missing values). |
update_ds_model |
How often should the data set for the inner model be updated? Possible choices are: "everytime" (after every imputed value), and "every_iteration". |
model_arg |
Further arguments for model_fun. |
M |
Missing data indicator matrix |
... |
Further arguments given to |
This function imputes the rows of the data set ds row by row. The imputation order of the rows can be specified by rows_order. Furthermore, rows_used_for_imputation controls which rows are used for the imputation. If ds is pre-imputed, the missing data indicator matrix can be supplied via M.
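The role of M for a pre-imputed data set can be sketched in base R. The toy values and the column-mean pre-imputation below are assumptions chosen for illustration only; they are not prescribed by this function.

```r
# Toy data with missing values (values chosen for illustration)
ds <- data.frame(X = c(1, NA, 3, 4), Y = c(NA, 2, 3, 5))

# Record the missingness pattern BEFORE any pre-imputation
M <- is.na(ds)

# Simple column-mean pre-imputation (an assumption for this sketch)
ds_pre <- ds
for (j in seq_along(ds_pre)) {
  ds_pre[M[, j], j] <- mean(ds[[j]], na.rm = TRUE)
}

# After pre-imputation, is.na(ds_pre) no longer marks the originally
# missing cells, so the default M = is.na(ds) would find nothing to impute
n_missing_after <- sum(is.na(ds_pre))
n_missing_recorded <- sum(M)
```

In this situation ds_pre would be passed as ds and the stored matrix as M = M, so that the originally missing cells are still the ones that get imputed.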
The inner method used to impute the data set can be defined with model_fun. This model_fun must take, in this order, a data set, the missing data indicator matrix M, the index i of the row currently being imputed (which is NULL if the model is updated only once per iteration or only uses complete rows) and model_arg. It must return a model model_imp, which is passed to predict_fun to generate imputation values for the missing values in row i. Both model_fun and predict_fun can be self-written, or a predefined pair (see below) can be used.
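A self-written pair might look roughly like the following base R sketch. The names model_col_means, predict_col_means and toy_impute are hypothetical. The argument list of the predict function and the driver loop are assumptions made so that the sketch runs on its own; they are not the package's actual internals.

```r
# Hypothetical model function following the documented argument order:
# data set, missing data indicator matrix M, row index i, model_arg
model_col_means <- function(ds, M, i = NULL, model_arg = NULL) {
  ds_obs <- ds
  ds_obs[M] <- NA                 # ignore cells that were originally missing
  colMeans(ds_obs, na.rm = TRUE)  # the "model" is a vector of column means
}

# Hypothetical predict function; its argument list is an assumption
predict_col_means <- function(model, ds, M, i, ...) {
  model[M[i, ]]                   # one value per missing cell in row i
}

# Toy driver mimicking a single model fit (so i is NULL for model_fun)
toy_impute <- function(ds, model_fun, predict_fun, M = is.na(ds),
                       model_arg = NULL) {
  model_imp <- model_fun(ds, M, NULL, model_arg)
  for (i in which(rowSums(M) > 0)) {
    ds[i, M[i, ]] <- predict_fun(model_imp, ds, M, i)
  }
  ds
}

ds <- data.frame(X = c(1, NA, 3), Y = c(2, 4, NA))
imp <- toy_impute(ds, model_col_means, predict_col_means)
```

Because the model here is fitted once and does not depend on i, it corresponds to the case in which model_fun receives i = NULL.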
If update_model = "every_iteration", only one model is fitted and the argument update_ds_model is ignored. This option can be considerably faster than update_model = "everytime", especially for data sets with many rows with missing values. However, some methods (like nearest neighbors) need update_model = "everytime".
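One plausible reason why a nearest-neighbor method cannot share a single model is that its "model" depends on the row being imputed. The base R sketch below illustrates this with a hypothetical model_nn function; it is not the implementation of model_donor(), and the Euclidean distance on the observed columns is an assumption.

```r
# Hypothetical nearest-neighbor model function: the "model" is the set of
# the k complete rows closest to row i, so it cannot be built without i
model_nn <- function(ds, M, i = NULL, model_arg = NULL) {
  if (is.null(i)) stop("a nearest-neighbor model needs the row index i")
  k <- if (is.null(model_arg$k)) 1 else model_arg$k
  complete <- which(rowSums(M) == 0)   # candidate donor rows
  obs <- !M[i, ]                       # columns observed in row i
  # Differences between each donor and row i on the observed columns
  diffs <- sweep(as.matrix(ds[complete, obs, drop = FALSE]), 2,
                 unlist(ds[i, obs, drop = FALSE]))
  d <- sqrt(rowSums(diffs^2))          # Euclidean distances
  unname(complete[order(d)][seq_len(min(k, length(complete)))])
}

ds <- data.frame(X = c(1, 2, 10, 2.1), Y = c(1, 2, 10, NA))
M <- is.na(ds)
donors <- model_nn(ds, M, i = 4, model_arg = list(k = 1))
```

With i = NULL, as happens when the model is fitted only once per iteration, such a function has no row to measure distances from, which is consistent with the requirement stated above.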
The imputed data set.
model_donor() and predict_donor() for a pair of predefined functions for model_fun and predict_fun.
ds_mis <- missMethods::delete_MCAR(
  data.frame(X = rnorm(20), Y = rnorm(20)), 0.2, 1
)
impute_unsupervised(ds_mis, model_donor, predict_donor)

# knn imputation with k = 2
impute_unsupervised(ds_mis, model_donor, predict_donor,
  update_model = "everytime", model_arg = list(k = 2)
)