generateFactorEncoding: Convert Factor features to Numeric features


View source: R/featureEngineering.R

Description

Converts the levels of factor features to numeric values. The conversion respects the ordering of the factor levels, so for an ordinal factor the resulting numeric values preserve that order. By default, each numeric feature is written to a new column named after the original feature with _enc appended. If suffix is set to NULL or an empty string, the conversion happens in place and the original feature is overwritten.
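The behaviour described above can be approximated in base R. The following is only a minimal sketch, not the package source: encode_factors is a hypothetical helper, and it assumes the encoding is the integer level code returned by as.integer(), which follows the order of levels().

```r
# Minimal base-R sketch (NOT the package implementation).
# Assumption: the numeric value is the integer position of each level.
df <- data.frame(
  size = factor(c("low", "high", "mid", "low"),
                levels = c("low", "mid", "high"), ordered = TRUE),
  amount = c(10, 20, 30, 40)
)

# Hypothetical helper that mimics the documented suffix / in-place behaviour.
encode_factors <- function(d, suffix = "_enc") {
  in_place <- is.null(suffix) || identical(suffix, "")
  factor_cols <- names(d)[vapply(d, is.factor, logical(1))]
  for (f in factor_cols) {
    target <- if (in_place) f else paste0(f, suffix)
    d[[target]] <- as.integer(d[[f]])  # integer code = position in levels()
  }
  d
}

enc <- encode_factors(df)                     # adds size_enc, keeps size
inplace <- encode_factors(df, suffix = NULL)  # overwrites size
```

In this sketch, "low", "mid" and "high" map to 1, 2 and 3 because that is the order in which the levels were declared, not their alphabetical order.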

Usage

generateFactorEncoding(d, features = NULL, exclude = NULL,
  suffix = "_enc", verbose = FALSE)

Arguments

d

A data frame or data table containing the data set.

features

A character vector naming the features to process. If left NULL, all factor fields in the data set are selected. The list of features can instead be derived by regular expression matching by prepending the pattern with a ~ (refer to Examples). Default: NULL.

suffix

A character string to append to the name of each converted feature. If set to NULL or an empty string, the conversion happens in place and the original feature is overwritten. Default: "_enc".
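How the features argument might be resolved into column names can be sketched as follows. This is an illustration only: resolve_features is a hypothetical helper, not a function of the package, and it assumes that each entry starting with ~ is treated as a regular expression matched against the column names, while other entries are taken literally.

```r
# Hypothetical helper (NOT part of the package) illustrating the documented
# rules: NULL selects all factor columns, a leading "~" marks a pattern.
resolve_features <- function(d, features = NULL) {
  if (is.null(features)) {
    return(names(d)[vapply(d, is.factor, logical(1))])
  }
  is_pattern <- startsWith(features, "~")
  literal <- features[!is_pattern]
  patterns <- sub("^~", "", features[is_pattern])  # drop the "~" marker
  matched <- unlist(lapply(patterns,
                           function(p) grep(p, names(d), value = TRUE)))
  unique(c(literal, matched))
}

df <- data.frame(SEG_A = factor(c("x", "y")),
                 SEG_B = factor(c("p", "q")),
                 REVENUE = c(1, 2))

all_factors <- resolve_features(df)                   # every factor column
by_prefix <- resolve_features(df, "~^SEG_")           # columns starting SEG_
mixed <- resolve_features(df, c("REVENUE", "~_B$"))   # literal name + pattern
```

Note that a pattern can select non-factor columns such as REVENUE, which is why a catch-all pattern may not produce a meaningful encoding.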

Value

A data frame or data table containing the transformed data set with the factor features converted to numeric features.

Examples

sample.df <- data.frame(ID = floor(runif(100, 0, 10000)),
                        EFF_DATE = Sys.time() + runif(100, 0, 24*60*60*100),
                        EFF_TO = Sys.time() + runif(100, 24*60*60*100+1, 24*60*60*1000),
                        CUST_SEGMENT_CHR = as.character(floor(runif(100, 0, 10))),
                        STATE_NAME = ifelse(runif(100, 0, 1) < 0.56, 'VIC',
                                            ifelse(runif(100, 0, 1) < 0.44, 'NSW', 'QLD')),
                        REVENUE = floor(rnorm(100, 500, 200)),
                        NUM_FEAT_1 = rnorm(100, 1000, 250),
                        NUM_FEAT_2 = rnorm(100, 20, 2),
                        NUM_FEAT_3 = floor(rnorm(100, 3, 0.5)),
                        NUM_FEAT_4 = floor(rnorm(100, 100, 10)),
                        RFM_SEGMENT = factor(x = letters[floor(runif(100, 1, 6))],
                                             levels = c("a", "b", "c", "d", "e")))

generateFactorEncoding(sample.df) # write each converted factor to a new column suffixed with _enc
generateFactorEncoding(sample.df, suffix = NULL) # in-place conversion, original factors are overwritten
generateFactorEncoding(sample.df, features = "~*") # process all features regardless of type (may not make sense!)

ivanliu1989/RQuant documentation built on Sept. 13, 2019, 11:53 a.m.