preprocess_data: preprocess data

View source: R/preprocess_data.R

preprocess_data    R Documentation

preprocess data

Description

Preprocess a data.frame according to function name / variable name pairs: each named function is applied to the variables listed under it.
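
The pairing of function names with variable names can be pictured with a minimal sketch. This is not the package's implementation: apply_preprocessors is a hypothetical helper written only for illustration, and it ignores multiple-response handling and the reporting done through row_id and x_name.

```r
# Hypothetical helper (not part of lbmisc): apply each named function
# to the columns listed under its name.
apply_preprocessors <- function(x, preprocessors) {
  for (fun_name in names(preprocessors)) {
    fun <- match.fun(fun_name)            # look the function up by its name
    for (v in preprocessors[[fun_name]]) {
      x[[v]] <- fun(x[[v]])               # replace the column with the processed one
    }
  }
  x
}

df <- data.frame(d = c("2017-01-01", "2015-01-01"),
                 g = c("m", "f"),
                 stringsAsFactors = FALSE)
out <- apply_preprocessors(df, list("as.Date" = "d", "factor" = "g"))
str(out)
```

Looking functions up by name keeps the directives a plain named list, which is easy to build or store alongside the data.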

Usage

preprocess_data(
  x,
  preprocessors,
  mr_name = "mr",
  mr_fun = mr_splitter,
  row_id = NULL,
  x_name = NULL,
  ...
)

Arguments

x

the data.frame

preprocessors

a named list: each name is the name of a function to be applied, and each element is a character vector naming the variables to be processed with that function

mr_name

name of the preprocessors element that identifies multiple-response variables

mr_fun

function used for multiple-response variables: given a single vector of multiple responses (e.g. separated by "|"), it returns a data.frame of dummy variables

row_id

vector or data.frame used to identify the rows (used by verbose_coerce for reporting purposes in case of issues in data preprocessing)

x_name

name of the data.frame processed (for reporting purposes)

...

further arguments passed to mr_fun
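
To illustrate the kind of function mr_fun is expected to be, here is a minimal sketch of a multiple-response splitter. It is not mr_splitter itself: split_mr and its sep argument are hypothetical names, and the real function may differ in column naming, output types and handling of missing values.

```r
# Hypothetical stand-in for a multiple-response function (not lbmisc::mr_splitter):
# turn a vector of "|"-separated answers into a data.frame of 0/1 dummies.
split_mr <- function(v, sep = "|") {
  items <- strsplit(v, sep, fixed = TRUE)   # one character vector per row
  lev <- sort(unique(unlist(items)))        # every distinct answer
  dummies <- lapply(lev, function(l) {
    vapply(items, function(it) as.integer(l %in% it), integer(1))
  })
  names(dummies) <- lev
  as.data.frame(dummies, stringsAsFactors = FALSE)
}

res <- split_mr(c("reading|travel|science", "reading|science|cinema|tv"))
res
```

The result has one row per input element and one column per distinct answer (here cinema, reading, science, travel and tv).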

Examples

# Custom preprocessors: parse day/month/year strings and recode NO/SI answers
data_char <- function(d) as.Date(d, format = '%d/%m/%Y')
noyes <- function(x) factor(x, levels = c('NO', 'SI'),
                            labels = c('No', 'Yes'))
test_df <- data.frame('a_date'    = c('2017-01-01', '2015-01-01'),
                      'gender'    = c('m', 'f'),
                      'female'    = c('NO', 'SI'),
                      'educ_lev'  = c('diploma', 'degree'),
                      'birth'     = c("11/08/1927", "24/05/1935"),
                      'interests' = c('reading|travel|science',
                                      'reading|science|cinema|tv'),
                      stringsAsFactors = FALSE)
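# Names are function names; values are the variables each function processes.
# "mr" matches the default mr_name, so 'interests' is handled by mr_fun.
preprocessors <- list("as.Date"   = 'a_date',
                      "factor"    = c("gender", "educ_lev"),
                      "data_char" = 'birth',
                      "noyes"     = 'female',
                      "mr"        = "interests")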
str(test_df)
preproc <- preprocess_data(test_df, preprocessors)
str(preproc)

lbraglia/lbmisc documentation built on April 29, 2024, 11:27 a.m.