complete: Impute missing values using imputation model
In rMIDAS: Multiple Imputation with Denoising Autoencoders

complete

R Documentation

Impute missing values using imputation model

Description

Having trained an imputation model, complete() produces m completed datasets, returned as a list.

Usage

complete(
  mid_obj,
  m = 10L,
  unscale = TRUE,
  bin_label = TRUE,
  cat_coalesce = TRUE,
  fast = FALSE,
  file = NULL,
  file_root = NULL
)

Arguments

`mid_obj`	Object of class `midas`, the result of running `rMIDAS::train()`
`m`	An integer, the number of completed datasets required
`unscale`	Boolean, indicating whether to unscale any columns that were previously minmax scaled between 0 and 1
`bin_label`	Boolean, indicating whether to add back labels for binary columns
`cat_coalesce`	Boolean, indicating whether to decode the one-hot encoded categorical variables
`fast`	Boolean, indicating whether to impute category with highest predicted probability (TRUE), or to use predicted probabilities to make weighted sample of category levels (FALSE)
`file`	Path to save completed datasets. If `NULL`, completed datasets are only loaded into memory.
`file_root`	A character string, used as the root for all filenames when saving completed datasets if a `file` path is supplied. If no `file_root` is provided, completed datasets will be saved as "file/midas_impute_yymmdd_hhmmss_m.csv"
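
The difference between the two settings of `fast` can be sketched in base R. This is only an illustration of the two strategies described above, not the package's internal code: the probabilities and the helper names `impute_fast` and `impute_sampled` are invented for the example.

```r
# Hypothetical predicted probabilities for one missing cell of a
# categorical column with three levels (values invented for illustration).
probs <- c(red = 0.6, yellow = 0.3, blue = 0.1)

# Analogue of fast = TRUE: always take the most probable level.
impute_fast <- function(p) names(p)[which.max(p)]

# Analogue of fast = FALSE: draw one level, weighted by its probability.
impute_sampled <- function(p) sample(names(p), size = 1, prob = p)

set.seed(89)
fast_draws <- replicate(1000, impute_fast(probs))
sampled_draws <- replicate(1000, impute_sampled(probs))

# The first strategy only ever returns one level; the second spreads
# draws across levels in rough proportion to the probabilities.
table(fast_draws)
prop.table(table(sampled_draws))
```

Taking the most probable level is deterministic, so it adds no variation between the m datasets for that cell, whereas weighted sampling lets the uncertainty in the predicted probabilities show up across imputations.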

Details

For more information, see Lall and Robinson (2023): doi:10.18637/jss.v107.i09.

Value

List of length m, each element of which is a completed data.frame (i.e. no missing values)
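
A minimal sketch of how the returned list might be checked and stacked, assuming only the structure described above. The `completed` object here is a hand-built stand-in with invented values, since a real call to complete() needs a trained `midas` object; the `.imp` column name is likewise an assumption made for the example.

```r
# Stand-in for the output of complete(): a list of m data.frames.
m <- 3
completed <- lapply(seq_len(m), function(i) {
  data.frame(a = c("red", "blue"), d = c(1.5, 2.5) + i)
})

# Confirm the list has length m and that no dataset contains missing values.
is_complete <- vapply(completed, function(df) !anyNA(df), logical(1))
stopifnot(length(completed) == m, all(is_complete))

# Stack the datasets, tagging each row with the imputation it came from.
stacked <- do.call(
  rbind,
  Map(function(df, i) cbind(.imp = i, df), completed, seq_along(completed))
)
```

Keeping an imputation identifier makes it straightforward to split the stacked data again before fitting one model per completed dataset.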

References

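Lall R, Robinson T (2023). "Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS." Journal of Statistical Software, 107(9). doi:10.18637/jss.v107.i09.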

Examples

## Not run: 
library(rMIDAS)
library(data.table)

# Run only where Python is available and configured correctly
if (python_configured()) {
# Generate raw data with numeric, binary, and categorical variables
set.seed(89)
n_obs <- 10000
raw_data <- data.table(a = sample(c("red","yellow","blue",NA),n_obs, replace = TRUE),
                       b = 1:n_obs,
                       c = sample(c("YES","NO",NA),n_obs,replace=TRUE),
                       d = runif(n_obs,1,10),
                       e = sample(c("YES","NO"), n_obs, replace = TRUE),
                       f = sample(c("male","female","trans","other",NA), n_obs, replace = TRUE))

# Names of bin./cat. variables
test_bin <- c("c","e")
test_cat <- c("a","f")

# Pre-process data
test_data <- convert(raw_data,
                     bin_cols = test_bin,
                     cat_cols = test_cat,
                     minmax_scale = TRUE)

# Run imputations
test_imp <- train(test_data)

# Generate datasets
complete_datasets <- complete(test_imp, m = 5, fast = FALSE)

# Use Rubin's rules to combine m regression models
midas_pool <- combine(formula = d ~ a + c + e + f,
                      complete_datasets)
}

## End(Not run)

rMIDAS documentation built on Oct. 11, 2023, 5:14 p.m.