convert: Pre-process data for Midas imputation

View source: R/pre_processing.R

convert    R Documentation

Pre-process data for Midas imputation

Description

convert() pre-processes a dataset so that it can be passed directly to the main train() function.

Usage

convert(data, bin_cols = NULL, cat_cols = NULL, minmax_scale = FALSE)

Arguments

data

An object of class data.frame or data.table, or a path to a regular, delimited file

bin_cols, cat_cols

Vectors of column names identifying the binary and categorical variables, respectively

minmax_scale

Boolean, indicating whether to scale all numeric columns to between 0 and 1, which can improve model convergence
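As a rough illustration of what scaling a numeric column to between 0 and 1 involves, and why the minimum and maximum are worth keeping, here is a minimal base R sketch. The helper names minmax_scale_col() and minmax_unscale_col() are hypothetical and are not part of rMIDAS; the package's internal implementation may differ.

```r
# Hypothetical helpers for illustration only; not rMIDAS functions.

# Scale a numeric vector to [0, 1], ignoring NAs when finding the range.
minmax_scale_col <- function(x) {
  rng <- range(x, na.rm = TRUE)
  list(scaled = (x - rng[1]) / (rng[2] - rng[1]),
       min = rng[1],
       max = rng[2])
}

# Reverse the scaling using the stored minimum and maximum.
minmax_unscale_col <- function(scaled, min, max) {
  scaled * (max - min) + min
}

x <- c(10, 20, NA, 40)
res <- minmax_scale_col(x)

# Missing values stay missing; observed values now lie in [0, 1].
res$scaled

# Stored parameters allow values to be mapped back to the original scale.
restored <- minmax_unscale_col(res$scaled, res$min, res$max)
```

Storing the minimum and maximum per column is what makes it possible to return imputed values to their original units after the model has been run.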

Details

The function has two advantages over manual pre-processing:

  1. Utilises data.table for fast read-in and processing of large datasets

  2. Outputs an object that can be passed directly to train() without re-specifying column names etc.

For more information, see Lall and Robinson (2023): doi:10.18637/jss.v107.i09.

Value

Returns a custom S3 object of class ‘midas_preproc’, a list containing:

  • data – processed version of the input data

  • bin_list – vector of binary variable names

  • cat_lists – nested list of one-hot encoded categorical variable names

  • minmax_params – list of minimum and maximum values for each numeric column scaled

The binary and categorical labels are passed on to the imputation model, and the scaling parameters are used for post-imputation transformations.
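To make the idea of one-hot encoded categorical variable names concrete, the following base R sketch encodes a single categorical column by hand. It only illustrates the concept: the "a_" column-name prefix and the variable names are assumptions made for this example, and convert() may name and structure its output differently.

```r
# Illustration only; convert() may name and arrange columns differently.
colours <- c("red", "yellow", NA, "blue", "red")

# Distinct observed categories, excluding missing values.
levels_seen <- sort(unique(colours[!is.na(colours)]))

# One indicator column per category; rows with a missing value stay NA
# in every indicator column, so they can still be imputed later.
one_hot <- sapply(levels_seen, function(lv) as.integer(colours == lv))

# Assumed naming scheme: original column name, underscore, category.
colnames(one_hot) <- paste("a", levels_seen, sep = "_")

# A nested list mapping each categorical variable to its indicator columns.
cat_lists_sketch <- list(a = colnames(one_hot))

one_hot
```

Grouping the indicator names by source variable lets a model treat each group as one categorical outcome rather than as unrelated binary columns.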

References

Lall and Robinson (2023). doi:10.18637/jss.v107.i09.

Examples

library(rMIDAS)

# Toy dataset with missing values in some categorical and binary columns
raw_data <- data.frame(a = sample(c("red", "yellow", "blue", NA), 100, replace = TRUE),
                       b = 1:100,
                       c = sample(c("YES", "NO", NA), 100, replace = TRUE),
                       d = runif(100),
                       e = sample(c("YES", "NO"), 100, replace = TRUE),
                       f = sample(c("male", "female", "trans", "other", NA), 100, replace = TRUE),
                       stringsAsFactors = FALSE)

# Names of the binary and categorical columns
bin_vars <- c("c", "e")
cat_vars <- c("a", "f")

convert(raw_data, bin_cols = bin_vars, cat_cols = cat_vars)

rMIDAS documentation built on Oct. 11, 2023, 5:14 p.m.