impute_unsupervised: Unsupervised imputation


impute_unsupervised    R Documentation

Unsupervised imputation

Description

Impute a data set with an unsupervised inner method. This is one of the main functions that can be used inside impute_iterative(). If you need pre-imputation or iterations, use impute_iterative() directly.

Usage

impute_unsupervised(
  ds,
  model_fun,
  predict_fun,
  rows_used_for_imputation = "only_complete",
  rows_order = seq_len(nrow(ds)),
  update_model = "every_iteration",
  update_ds_model = "every_iteration",
  model_arg = NULL,
  M = is.na(ds),
  ...
)

Arguments

ds

The data set to be imputed. Must be a data frame with column names.

model_fun

An unsupervised model function that takes as arguments ds_used (the data set used to build the model, specified via rows_used_for_imputation), M and i (the index of the row currently under imputation).

predict_fun

A predict function that uses the model generated by model_fun (model_imp) to predict the missing values of a row. It should take the arguments model_imp, ds_used, M and i.

rows_used_for_imputation

Which rows should be used to impute other rows? Possible choices: "only_complete", "already_imputed", "all_except_i", "all"

rows_order

Ordering of the rows for imputation. This can be a vector with indices or an order_option from order_rows().

update_model

How often should the model for imputation be updated? Possible choices are: "everytime" (after every imputed value) and "every_iteration" (only one model is created and used for all missing values).

update_ds_model

How often should the data set for the inner model be updated? Possible choices are: "everytime" (after every imputed value) and "every_iteration".

model_arg

Further arguments for model_fun. Use a list to pass more than one argument.

M

Missing data indicator matrix. The default is is.na(ds).

...

Further arguments passed on to predict_fun.

Details

This function imputes the data set ds row by row. The order in which the rows are imputed can be specified by rows_order, and rows_used_for_imputation controls which rows are used for the imputation. If ds is pre-imputed, the missing data indicator matrix can be supplied via M.
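The following base R sketch illustrates why M matters for a pre-imputed data set and how an explicit rows_order vector could be built. The toy data, the column-mean pre-imputation and the chosen ordering are assumptions made for illustration only; the final call is shown as a comment and is not run here.

```r
# Toy data with missing values (base R only).
ds <- data.frame(
  X = c(1.0, NA, 3.0, 4.0, NA),
  Y = c(NA, 2.0, 6.0, 8.0, NA)
)

# Record the missingness pattern BEFORE any pre-imputation.
M <- is.na(ds)

# Hypothetical pre-imputation: fill every NA with its column mean.
ds_pre <- ds
for (col in names(ds_pre)) {
  ds_pre[[col]][M[, col]] <- mean(ds[[col]], na.rm = TRUE)
}

# After pre-imputation nothing is NA any more, so the default
# M = is.na(ds_pre) would mark no cell for imputation.
anyNA(ds_pre)

# An explicit row order: rows with fewer missing values come first.
rows_order <- order(rowSums(M))

# Assumed call, using only the documented arguments (not run here):
# impute_unsupervised(ds_pre, model_donor, predict_donor,
#   M = M, rows_order = rows_order
# )
```

Saving M before the pre-imputation step keeps the information about which cells were originally missing.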

The inner method used to impute the data set is defined by model_fun. This function must take, in this order, a data set, the missing data indicator matrix M, the index i of the row currently being imputed, and model_arg. The index i is NULL if the model is updated only once per iteration or only complete rows are used. model_fun must return a model model_imp, which is passed to predict_fun to generate imputation values for the missing values in row i. Both model_fun and predict_fun can be self-written, or a predefined pair (see below) can be used.
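As a rough illustration of the contract described above, the sketch below defines a hypothetical self-written pair, model_col_means() and predict_col_means(), that imputes with column means. The function names, the assumed return format of the predict function (one value per missing cell of row i, in column order) and the small driver loop are assumptions; the loop is only a base R stand-in for the row-by-row procedure and is not the package's implementation.

```r
# Hypothetical model_fun: the "model" is just the column means of ds_used.
# Argument order follows the description: data set, M, i, model_arg.
model_col_means <- function(ds_used, M, i = NULL, model_arg = NULL) {
  colMeans(ds_used, na.rm = TRUE)
}

# Hypothetical predict_fun: returns one value per missing cell of row i,
# in column order (an assumed return format).
predict_col_means <- function(model_imp, ds_used, M, i, ...) {
  unname(model_imp[M[i, ]])
}

# Toy stand-in for the row-by-row loop (NOT the package's code):
# fit one model on the complete rows, then fill each incomplete row.
ds <- data.frame(X = c(1, NA, 3, 5), Y = c(2, 4, NA, 6))
M <- is.na(ds)
ds_used <- ds[!apply(M, 1, any), , drop = FALSE]
model_imp <- model_col_means(ds_used, M)

ds_imp <- ds
for (i in seq_len(nrow(ds))) {
  if (any(M[i, ])) {
    ds_imp[i, M[i, ]] <- predict_col_means(model_imp, ds_used, M, i)
  }
}
ds_imp
```

Under these assumptions, such a pair would presumably be passed as impute_unsupervised(ds, model_col_means, predict_col_means); whether the package expects exactly this return format should be checked against model_donor() and predict_donor().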

If update_model = "every_iteration", only one model is fitted and the argument update_ds_model is ignored. This option can be considerably faster than update_model = "everytime", especially for data sets in which many rows have missing values. However, some methods (such as nearest neighbors) need update_model = "everytime".
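To see why a nearest-neighbor method cannot share a single model across rows, consider the following base R sketch. The function model_nearest() is hypothetical and is not how model_donor() is implemented; it assumes that ds_used contains all rows and simply picks the complete row closest to row i on the columns observed in row i.

```r
# Hypothetical row-specific model_fun: the model is the complete row that
# is closest to row i, measured only on the columns observed in row i.
model_nearest <- function(ds_used, M, i, model_arg = NULL) {
  obs <- !M[i, ]                          # columns observed in row i
  complete <- which(!apply(M, 1, any))    # candidate donor rows
  target <- unlist(ds_used[i, obs, drop = FALSE])
  dists <- vapply(complete, function(j) {
    sum((unlist(ds_used[j, obs, drop = FALSE]) - target)^2)
  }, numeric(1))
  ds_used[complete[which.min(dists)], , drop = FALSE]
}

# Rows 3 and 4 both miss Y but lie in different regions of X.
ds <- data.frame(X = c(1, 10, 1.5, 9), Y = c(2, 20, NA, NA))
M <- is.na(ds)

donor_3 <- model_nearest(ds, M, 3)  # closest complete row to row 3
donor_4 <- model_nearest(ds, M, 4)  # closest complete row to row 4
donor_3$Y
donor_4$Y
```

The two rows end up with different donors, so the model depends on i and has to be rebuilt for each row, which is the situation that update_model = "everytime" covers.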

Value

The imputed data set.

See Also

model_donor() and predict_donor() for a pair of predefined functions for model_fun and predict_fun.

Examples

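library(imputeGeneric)

# Delete 20 % of the values in column 1 (X) completely at random
set.seed(123) # for reproducibility
ds_mis <- missMethods::delete_MCAR(
  data.frame(X = rnorm(20), Y = rnorm(20)), 0.2, 1
)

# Donor-based imputation with the default settings
impute_unsupervised(ds_mis, model_donor, predict_donor)

# knn imputation with k = 2
impute_unsupervised(ds_mis, model_donor, predict_donor,
  update_model = "everytime", model_arg = list(k = 2)
)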

imputeGeneric documentation built on March 18, 2022, 6:35 p.m.