convert: Pre-process data for Midas imputation


View source: R/pre_processing.R

Description

convert() pre-processes a dataset so that it can be passed directly to the main train() function.

Usage

convert(data, bin_cols = NULL, cat_cols = NULL, minmax_scale = FALSE)

Arguments

data

A data.frame, a data.table, or a path to a regular, delimited file

bin_cols, cat_cols

Vectors of column names identifying the binary and categorical variables, respectively

minmax_scale

Boolean indicating whether to scale all numeric columns to between 0 and 1, which can improve model convergence
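
When a dataset has many columns, it can help to draft bin_cols and cat_cols programmatically before calling convert(). The sketch below is one possible approach in base R: it counts the distinct non-missing values in each character or factor column and treats two-valued columns as binary. The helper suggest_cols() is hypothetical and not part of rMIDAS, and the two-level rule is an assumption that should be checked against your own data.

```r
# Hypothetical helper (not part of rMIDAS): suggest bin_cols / cat_cols
# by counting distinct non-missing values in character or factor columns.
suggest_cols <- function(df) {
  # Only text-like columns are candidates; numeric columns are ignored
  is_text <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  # Number of distinct observed (non-NA) values per candidate column
  n_levels <- vapply(df[is_text],
                     function(x) length(unique(x[!is.na(x)])),
                     integer(1))
  list(
    bin_cols = names(n_levels)[n_levels == 2],  # exactly two observed values
    cat_cols = names(n_levels)[n_levels > 2]    # more than two observed values
  )
}

# Small illustrative dataset
df <- data.frame(
  colour = c("red", "blue", NA, "yellow"),
  answer = c("YES", "NO", NA, "YES"),
  score  = c(0.1, 0.5, 0.9, 0.3),
  stringsAsFactors = FALSE
)

cols <- suggest_cols(df)
print(cols)
```

The resulting cols$bin_cols and cols$cat_cols could then be supplied as the bin_cols and cat_cols arguments, after a manual review of any columns where a level happens to be unobserved.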

Details

The function has two advantages over manual pre-processing:

  1. Utilises data.table for fast read-in and processing of large datasets

  2. Outputs an object that can be passed directly to train() without re-specifying column names etc.

Value

Returns a custom S3 object of class ‘midas_preproc’: a list containing the converted data, the categorical and binary labels to be imported into the imputation model, and the scaling parameters for post-imputation transformations.
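
To illustrate why scaling parameters need to be kept for post-imputation transformations, the following base R sketch shows generic min-max scaling and its inverse. The functions scale_minmax() and unscale_minmax() are hypothetical names used only for illustration; this is not the package's internal implementation, and the way ‘midas_preproc’ stores its parameters may differ.

```r
# Hypothetical helpers (not part of rMIDAS) showing generic min-max scaling.
# Assumes the column has at least two distinct non-missing values;
# a constant column would give a zero range and a division by zero.
scale_minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)           # c(min, max), ignoring NAs
  list(
    scaled = (x - rng[1]) / (rng[2] - rng[1]),
    min    = rng[1],
    max    = rng[2]
  )
}

# Inverse transformation: maps values in [0, 1] back to the original scale
unscale_minmax <- function(scaled, min, max) {
  scaled * (max - min) + min
}

x <- c(10, 20, NA, 40)
s <- scale_minmax(x)

# Missing values stay missing; observed values now lie in [0, 1]
print(s$scaled)

# With the stored min and max, the original scale can be recovered
restored <- unscale_minmax(s$scaled, s$min, s$max)
print(restored)
```

Without the stored minimum and maximum, imputed values produced on the 0 to 1 scale could not be mapped back to the original units.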

Examples

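library(rMIDAS)

# Toy dataset with categorical (a, f), binary (c, e) and numeric (b, d) columns;
# some columns contain missing values
data <- data.frame(a = sample(c("red","yellow","blue",NA), 100, replace = TRUE),
                   b = 1:100,
                   c = sample(c("YES","NO",NA), 100, replace = TRUE),
                   d = runif(100),
                   e = sample(c("YES","NO"), 100, replace = TRUE),
                   f = sample(c("male","female","trans","other",NA), 100, replace = TRUE),
                   stringsAsFactors = FALSE)

# Column names of the binary and categorical variables
# (named bin_vars/cat_vars to avoid masking base::cat)
bin_vars <- c("c","e")
cat_vars <- c("a","f")

convert(data, bin_cols = bin_vars, cat_cols = cat_vars)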

rMIDAS documentation built on Jan. 30, 2021, 9:05 a.m.