View source: R/impute_unsupervised.R
| impute_unsupervised | R Documentation |
Impute a data set with an unsupervised inner method. This is one of the
main functions that can be used inside impute_iterative(). If you need
pre-imputation or iterations, use impute_iterative() directly.
impute_unsupervised(
  ds,
  model_fun,
  predict_fun,
  rows_used_for_imputation = "only_complete",
  rows_order = seq_len(nrow(ds)),
  update_model = "every_iteration",
  update_ds_model = "every_iteration",
  model_arg = NULL,
  M = is.na(ds),
  ...
)
ds |
The data set to be imputed. Must be a data frame with column names. |
model_fun |
An unsupervised model function which takes as arguments
|
predict_fun |
A predict function which uses the via |
rows_used_for_imputation |
Which rows should be used to impute other rows? Possible choices: "only_complete", "already_imputed", "all_except_i", "all" |
rows_order |
Ordering of the rows for imputation. This can be a vector
with indices or an |
update_model |
How often should the model for imputation be updated? Possible choices are: "everytime" (after every imputed value) and "every_iteration" (only one model is created and used for all missing values). |
update_ds_model |
How often should the data set for the inner model be updated? Possible choices are: "everytime" (after every imputed value), and "every_iteration". |
model_arg |
Further arguments for |
M |
Missing data indicator matrix |
... |
Further arguments given to |
This function imputes the rows of the data set ds row by
row. The imputation order of the rows can be specified by rows_order.
Furthermore, rows_used_for_imputation controls which rows are used for
the imputation. If ds is pre-imputed, the missing data indicator matrix
can be supplied via M.
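As a rough illustration of rows_order and M, the following base R sketch records the indicator matrix before a pre-imputation and derives a row order from it. It is not part of the package: the toy data, the pre-imputation by column means and the ordering by number of missing values are assumptions chosen for illustration, and the final call is shown only as an untested comment.

```r
# Toy data set with some missing values (values chosen for illustration)
ds <- data.frame(
  X = c(1, NA, 3, 4, NA),
  Y = c(NA, 2, 3, 4, NA)
)

# Record the missing data indicator matrix BEFORE any pre-imputation;
# afterwards is.na(ds) would no longer show where values were missing
M <- is.na(ds)

# Hypothetical pre-imputation: replace missing values by column means
ds_pre <- ds
for (j in seq_along(ds_pre)) {
  ds_pre[M[, j], j] <- mean(ds[[j]], na.rm = TRUE)
}

# One possible row order: rows with fewer missing values come first
rows_order <- order(rowSums(M))

# These objects could then be passed on, for example (untested sketch):
# impute_unsupervised(ds_pre, model_donor, predict_donor,
#                     rows_order = rows_order, M = M)
```

Computing M up front matters because is.na(ds_pre) is FALSE everywhere once the data set has been pre-imputed.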
The inner method used to impute the data set is defined by model_fun.
This model_fun must take, in this order, a data set, the missing data
indicator matrix M, the index i of the row which is currently being imputed
(i is NULL if the model is updated only once per iteration or only uses
complete rows) and model_arg. It must return a model model_imp, which is
passed to predict_fun to generate imputation values for the missing values
in row i. Both model_fun and predict_fun can be self-written, or a
predefined pair can be used (see below).
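The argument order described above can be sketched with a self-written model function. The name model_col_means, the use of column means as the "model" and the choice to leave out row i are hypothetical and serve only as an illustration. A matching predict_fun is omitted because its required signature is not spelled out here.

```r
# Hypothetical model function with the argument order described above:
# data set, missing data indicator matrix M, row index i, model_arg
model_col_means <- function(ds, M, i = NULL, model_arg = NULL) {
  # Rows without any missing value according to M
  complete <- rowSums(M) == 0
  # i is NULL when one model serves all rows; otherwise leave out row i
  if (!is.null(i)) {
    complete[i] <- FALSE
  }
  # The "model" is simply the vector of column means of the rows used
  # (model_arg is accepted but not used in this sketch)
  colMeans(ds[complete, , drop = FALSE])
}

# Toy data: rows 1 and 4 are complete
ds <- data.frame(X = c(1, NA, 3, 5), Y = c(2, 4, NA, 6))
M <- is.na(ds)
model_imp <- model_col_means(ds, M)
```

The returned object can be anything the corresponding predict function knows how to use; here it is a named numeric vector with one mean per column.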
If update_model = "every_iteration", only one model is fitted and the
argument update_ds_model is ignored. This option can be considerably
faster than update_model = "everytime", especially for data sets in which
many rows have missing values. However, some methods (like nearest
neighbors) need update_model = "everytime".
The imputed data set.
model_donor() and predict_donor() for a pair of predefined
functions for model_fun and predict_fun.
ds_mis <- missMethods::delete_MCAR(
  data.frame(X = rnorm(20), Y = rnorm(20)), 0.2, 1
)
impute_unsupervised(ds_mis, model_donor, predict_donor)
# knn imputation with k = 2
impute_unsupervised(ds_mis, model_donor, predict_donor,
  update_model = "everytime", model_arg = list(k = 2)
)