autoDataprep: Automatic data preparation for ML algorithms


View source: R/autoDataPrep.R

Description

Performs the final data preparation before ML algorithms are applied. The function returns the final data set along with highlights of the data preparation.

Usage

autoDataprep(
  data,
  target = NULL,
  missimpute = "default",
  auto_mar = FALSE,
  mar_object = NULL,
  dummyvar = TRUE,
  char_var_limit = 12,
  aucv = 0.02,
  corr = 0.99,
  outlier_flag = FALSE,
  interaction_var = FALSE,
  frequent_var = FALSE,
  uid = NULL,
  onlykeep = NULL,
  drop = NULL,
  verbose = FALSE
)

Arguments

data

[data.frame | Required] dataframe or data.table

target

[character | Required] name of the dependent variable (binary or multiclass)

missimpute

[text | Optional] missing value imputation using the mlr impute function (default is "default"). See the "Details" section for more information

auto_mar

[logical | Optional] identify whether variables with missing values are missing completely at random or not (default is FALSE). If TRUE, this calls autoMAR()

mar_object

[character | Optional] object created by the autoMAR function (default is NULL)

dummyvar

[logical | Optional] categorical feature engineering i.e. one hot encoding (default is TRUE)

char_var_limit

[integer | Optional] maximum number of levels a categorical variable may have to be prepared as dummy variables (default is 12). For example, a gender variable with the two values "M" and "F" has 2 levels

aucv

[numeric | Optional] cut-off value for AUC-based variable selection (default is 0.02)

corr

[numeric | Optional] cut-off value for correlation-based variable selection (default is 0.99)

outlier_flag

[logical | Optional] to add outlier features (default is FALSE)

interaction_var

[logical | Optional] bulk interactions transformer for numerical features (default is FALSE)

frequent_var

[logical | Optional] frequent transformer for categorical features (default is FALSE)

uid

[character | Optional] unique identifier column if any to keep in the final data set

onlykeep

[character | Optional] only consider selected variables for data preparation

drop

[character | Optional] exclude variables from the dataset

verbose

[logical | Optional] display execution steps on the console (default is FALSE)

Details

Missing value imputation using the impute function from mlr

The mlr package provides several methods for imputing missing values.

Optional: you may prefer to impute missing values with an ML method. The learners in the mlr package that can handle missing values can be listed with listLearners("classif", check.packages = TRUE, properties = "missings")[c("class", "package")]
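As a minimal sketch of what the mlr impute function does on its own, the snippet below imputes a small made-up data frame by column class. The data frame `toy` and the choice of median and mode imputation are assumptions for illustration only; this page does not show how autoDataprep maps the missimpute argument onto impute.

```r
library(mlr)

# Hypothetical toy data with one missing value per column
toy <- data.frame(
  age = c(25, NA, 40, 61),
  sex = factor(c("M", NA, "F", "M"))
)

# Impute by column class: median for numeric columns, mode for factors
imp <- impute(
  toy,
  classes = list(numeric = imputeMedian(), factor = imputeMode())
)

imp$data   # imputed data frame
imp$desc   # description object that can be reused on new data with reimpute()
```

The returned `desc` object stores the fitted imputation values, so the same replacement values can be applied to a test set later.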

Feature engineering

Feature reduction
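
The idea behind a correlation cut-off such as corr can be sketched in base R as follows. This is an illustration of the general technique under stated assumptions, not the code autoDataprep runs: the toy data, the variable names, and the rule of flagging the later column of each highly correlated pair are all hypothetical.

```r
# Toy data: x2 is almost a copy of x1, x3 is independent noise
set.seed(1)
x1 <- rnorm(100)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(100, sd = 0.01),
  x3 = rnorm(100)
)

corr_cutoff <- 0.99                    # same value as the corr default

cm <- abs(cor(toy))                    # absolute pairwise correlations
cm[upper.tri(cm, diag = TRUE)] <- 0    # keep each pair once, ignore the diagonal
high_pairs <- which(cm > corr_cutoff, arr.ind = TRUE)

# Flag the later column of each highly correlated pair as a drop candidate
drop_candidates <- unique(rownames(cm)[high_pairs[, "row"]])
drop_candidates
```

A higher cut-off keeps more variables; the Examples section uses corr = 0.98, which is slightly stricter than the default.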

Value

A list containing the following objects:

complete_data

complete dataset including new derived features based on the functional understanding of the dataset

master_data

filtered dataset based on the input parameters

final_var_list

list of master variables

auc_var

list of auc variables

cor_var

list of correlation variables

overall_var

all variables in the dataset

zerovariance

variables with zero variance in the dataset
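
The components listed above can be inspected directly on the returned list. The sketch below assumes the DriveML package is installed and uses its heart data set with the target column "target_var", as in the Examples section; the object name `prep` is arbitrary.

```r
library(DriveML)

# Run the preparation with mostly default settings
prep <- autoDataprep(heart, target = "target_var", missimpute = "default")

names(prep)                # components of the returned list
dim(prep$complete_data)    # complete data set including derived features
dim(prep$master_data)      # data set filtered by the input parameters
prep$final_var_list        # master variables
prep$zerovariance          # variables with zero variance
```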

See Also

impute

Examples

# Auto data prep on the heart data set
library(DriveML)
traindata <- autoDataprep(heart, target = "target_var", missimpute = "default",
                          dummyvar = TRUE, aucv = 0.02, corr = 0.98, outlier_flag = TRUE,
                          interaction_var = TRUE, frequent_var = TRUE)
# Filtered data set from the returned list
train <- traindata$master_data

daya6489/DriveML documentation built on July 22, 2021, 4:21 a.m.