impute_iterative: Iterative imputation


impute_iterative    R Documentation

Iterative imputation

Description

Iterative imputation of a data set

Usage

impute_iterative(
  ds,
  model_spec_parsnip = linear_reg(),
  model_fun_unsupervised = NULL,
  predict_fun_unsupervised = NULL,
  max_iter = 10,
  stop_fun = NULL,
  initial_imputation_fun = NULL,
  cols_used_for_imputation = "only_complete",
  cols_order = seq_len(ncol(ds)),
  rows_used_for_imputation = "only_complete",
  rows_order = seq_len(nrow(ds)),
  update_model = "every_iteration",
  update_ds_model = "every_iteration",
  stop_fun_args = NULL,
  M = is.na(ds),
  model_arg = NULL,
  warn_incomplete_imputation = TRUE,
  ...
)

Arguments

ds

The data set to be imputed. Must be a data frame with column names.

model_spec_parsnip

The model type used for supervised imputation (see impute_supervised() for details).

model_fun_unsupervised

An unsupervised model function (see impute_unsupervised() for details).

predict_fun_unsupervised

A predict function for unsupervised imputation (see impute_unsupervised() for details).

max_iter

Maximum number of iterations

stop_fun

A stopping function (see details below) or NULL. If NULL, iterations are only stopped after max_iter is reached.

initial_imputation_fun

A function that performs the initial imputation of the missing values. If NULL, no initial imputation is done. Some common choices, such as mean imputation, are implemented in the package missMethods.

cols_used_for_imputation

Which columns should be used to impute other columns? Possible choices: "only_complete", "already_imputed", "all"

cols_order

Ordering of the columns for imputation. This can be a vector with indices or an order_option from order_cols().

rows_used_for_imputation

Which rows should be used to impute other rows? Possible choices: "only_complete", "partly_complete", "complete_in_k", "already_imputed", "all_except_i", "all"

rows_order

Ordering of the rows for imputation. This can be a vector with indices or an order_option from order_rows().

update_model

How often should the model for imputation be updated?

update_ds_model

How often should the data set for the inner model be updated?

stop_fun_args

Further arguments passed on to stop_fun.

M

Missing data indicator matrix

model_arg

Further arguments for model_fun_unsupervised (see impute_unsupervised() for details).

warn_incomplete_imputation

Should a warning be given if the returned data set still contains NA values?

...

Further arguments passed on to stats::predict() or predict_fun_unsupervised.
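The defaults for M, cols_order and rows_order in the usage above are plain base R expressions evaluated on ds. The following sketch shows what they produce for a small made-up data frame; the data frame itself is an assumption for illustration and is not taken from the package.

```r
# Made-up data frame with two missing values (illustration only)
ds <- data.frame(X = c(1, NA, 3), Y = c(NA, 5, 6))

# Default for M: logical matrix, TRUE where a value is missing
M <- is.na(ds)

# Defaults for cols_order and rows_order: columns and rows in their
# original order
cols_order <- seq_len(ncol(ds))
rows_order <- seq_len(nrow(ds))

print(M)
print(cols_order)
print(rows_order)
```

Passing a different M or a different ordering vector replaces these defaults; how the orderings affect the imputation is described in impute_supervised() and impute_unsupervised().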

Details

This function imputes a data set in an iterative way. Internally, either impute_supervised() or impute_unsupervised() is used, depending on the values of model_spec_parsnip, model_fun_unsupervised and predict_fun_unsupervised. If you want to use a supervised inner method, model_spec_parsnip must be specified and model_fun_unsupervised and predict_fun_unsupervised must both be NULL. For an unsupervised inner method, model_fun_unsupervised and predict_fun_unsupervised must be specified and model_spec_parsnip must be NULL. Some arguments of this function are only meaningful for impute_supervised() or impute_unsupervised().
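The selection rule described above can be written down as a small base R sketch. The helper which_inner_method() is hypothetical and is not part of imputeGeneric; it only mirrors the NULL/non-NULL combinations named in the text, and the values passed to it are placeholders rather than real model objects.

```r
# Hypothetical helper (not part of imputeGeneric): reports which inner
# method the documented rule would select for a given argument combination.
which_inner_method <- function(model_spec_parsnip = NULL,
                               model_fun_unsupervised = NULL,
                               predict_fun_unsupervised = NULL) {
  has_spec <- !is.null(model_spec_parsnip)
  has_model_fun <- !is.null(model_fun_unsupervised)
  has_predict_fun <- !is.null(predict_fun_unsupervised)

  if (has_spec && !has_model_fun && !has_predict_fun) {
    "supervised"
  } else if (!has_spec && has_model_fun && has_predict_fun) {
    "unsupervised"
  } else {
    stop("Specify either model_spec_parsnip alone or both unsupervised functions.")
  }
}

# Placeholder values stand in for a parsnip model specification and for
# user-supplied model/predict functions.
res_sup <- which_inner_method(model_spec_parsnip = "placeholder spec")
res_unsup <- which_inner_method(
  model_fun_unsupervised = function(...) NULL,
  predict_fun_unsupervised = function(...) NULL
)
print(res_sup)
print(res_unsup)
```

Note that the default model_spec_parsnip = linear_reg() is non-NULL, so an unsupervised call has to set model_spec_parsnip = NULL explicitly.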

Value

An imputed data set (or the return value of stop_fun).

stop_fun

The stop_fun should take the following arguments, in this order:

  • ds (the data set imputed in the current iteration)

  • ds_old (the data set imputed in the last iteration)

  • a list (with named elements M, nr_iterations, max_iter)

  • stop_fun_args

  • res_stop_fun (the return value of stop_fun from the last iteration; the initial value for the first iteration is list(stop_iter = FALSE))

To allow for another iteration, the stop_fun must return a list which contains the named element stop_iter = FALSE. The simple return value list(stop_iter = FALSE) will allow the iteration to continue. However, the list can include more information, which is handed over to stop_fun in the next iteration. For example, the return value list(stop_iter = FALSE, last_eps = 0.3) would also lead to another iteration. If stop_fun does not return a list, or the list does not contain stop_iter = FALSE, the iteration is stopped and the return value of stop_fun is returned as the result of impute_iterative(). Therefore, this return value should normally include the imputed data set ds or ds_old.

An example of a stop_fun is stop_ds_difference().
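As a further illustration of this contract, here is a minimal sketch of a custom stop function in base R. The function stop_max_abs_change(), the parameter name info_list, the element eps in stop_fun_args and the returned element last_max_change are all assumptions made for this example; the sketch also assumes that every column of ds is numeric. It is not how stop_ds_difference() is implemented.

```r
# Hypothetical stop function following the documented argument order:
# ds, ds_old, a list (M, nr_iterations, max_iter), stop_fun_args, res_stop_fun
stop_max_abs_change <- function(ds, ds_old, info_list, stop_fun_args,
                                res_stop_fun) {
  M <- info_list$M
  # Compare only the originally missing cells (assumes numeric columns)
  change <- abs(as.matrix(ds)[M] - as.matrix(ds_old)[M])
  # If ds_old still contains NA in these cells, treat the change as infinite
  max_change <- if (anyNA(change)) Inf else max(change, 0)

  if (max_change < stop_fun_args$eps) {
    # No list with stop_iter = FALSE is returned: the iteration stops and
    # the current data set becomes the result
    return(ds)
  }
  # Continue and hand the observed change over to the next call
  list(stop_iter = FALSE, last_max_change = max_change)
}

# Calling the function by hand with made-up data sets
ds_old <- data.frame(X = c(1, 2, 3), Y = c(4, 5, 6))
ds_new <- data.frame(X = c(1, 2.4, 3), Y = c(4, 5, 6.05))
M <- matrix(c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE), ncol = 2)
info <- list(M = M, nr_iterations = 1, max_iter = 10)

res_continue <- stop_max_abs_change(
  ds_new, ds_old, info, list(eps = 0.1), list(stop_iter = FALSE)
)
res_stop <- stop_max_abs_change(
  ds_new, ds_old, info, list(eps = 0.5), list(stop_iter = FALSE)
)
```

With eps = 0.1 the largest change (0.4) is too big, so a list with stop_iter = FALSE is returned; with eps = 0.5 the data set itself is returned and the iteration would stop. Such a function would be supplied via stop_fun, with its settings in stop_fun_args, in the same way stop_ds_difference is used in the examples.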

See Also

  • impute_supervised() and impute_unsupervised() as the workhorses for the imputation.

  • stop_ds_difference() as an example of a stop function.

Examples

library(imputeGeneric)
set.seed(123)
# simple example: missing values are created in the first column only
ds_mis <- missMethods::delete_MCAR(
  data.frame(X = rnorm(20), Y = rnorm(20)), 0.2, 1
)
impute_iterative(ds_mis, max_iter = 2)
# using pre-imputation
ds_mis <- missMethods::delete_MCAR(
  data.frame(X = rnorm(20), Y = rnorm(20)), 0.2
)
impute_iterative(
  ds_mis,
  max_iter = 2, initial_imputation_fun = missMethods::impute_mean
)
# example using stop_ds_difference() as stop_fun
ds_mis <- missMethods::delete_MCAR(
  data.frame(X = rnorm(20), Y = rnorm(20)), 0.2
)
ds_imp <- impute_iterative(
  ds_mis,
  initial_imputation_fun = missMethods::impute_mean,
  stop_fun = stop_ds_difference, stop_fun_args = list(eps = 0.5)
)
attr(ds_imp, "nr_iterations")

imputeGeneric documentation built on March 18, 2022, 6:35 p.m.