cat_new_class: Clubbing class of categorical variables with low population...

View source: R/functions.R

Description

The function merges ("clubs") each class of a categorical variable whose population percentage is below a threshold into another class with a similar event rate. If no class has exactly the same event rate, the low-population class is clubbed with the class whose event rate is higher than its own and closest to it.
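
The rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's source code: the helper name `club_low_pop_sketch` is hypothetical, merge candidates are restricted to classes that meet the threshold, and a class with no equal or higher event-rate partner is left unchanged (the documentation does not say what happens in that case).

```r
# Hypothetical helper illustrating the clubbing rule; not part of scorecardModelUtils.
club_low_pop_sketch <- function(x, y, threshold, event = 1) {
  x <- as.character(x)
  classes <- sort(unique(x))
  # Population share and event rate of each class (named by class)
  share <- sapply(classes, function(cl) mean(x == cl))
  rate  <- sapply(classes, function(cl) mean(y[x == cl] == event))
  mapping <- setNames(classes, classes)
  # Assumption: only classes at or above the threshold are merge targets
  targets <- classes[share >= threshold]
  for (cl in classes[share < threshold]) {
    same   <- targets[rate[targets] == rate[cl]]
    higher <- targets[rate[targets] > rate[cl]]
    if (length(same) > 0) {
      mapping[cl] <- same[1]                          # identical event rate
    } else if (length(higher) > 0) {
      mapping[cl] <- higher[which.min(rate[higher])]  # closest higher event rate
    }
    # Assumption: otherwise the class keeps its original label
  }
  list(x_new = unname(mapping[x]), mapping = mapping)
}

# Toy data: "c" (4% share) matches the event rate of "a";
# "d" (8% share) has no exact match, so it joins the closest higher rate, "b".
x <- c(rep("a", 40), rep("b", 48), rep("c", 4), rep("d", 8))
y <- c(rep(1, 10), rep(0, 30),   # a: event rate 0.25
       rep(1, 24), rep(0, 24),   # b: event rate 0.50
       rep(1, 1),  rep(0, 3),    # c: event rate 0.25
       rep(1, 3),  rep(0, 5))    # d: event rate 0.375
res <- club_low_pop_sketch(x, y, threshold = 0.1)
res$mapping
```

In this toy data the mapping sends "c" to "a" and "d" to "b", while "a" and "b" map to themselves.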

Usage

cat_new_class(base, target, cat_var_name, threshold, event = 1)

Arguments

base

input dataframe

target

column / field name of the target variable, to be passed as a string (the variable must be of 0/1 type)

cat_var_name

column name, or array of column names, of the categorical variable(s) on which the operation is to be done, to be passed as string(s)

threshold

threshold population percentage below which a class will be considered for clubbing with another class, to be provided as a decimal/fraction (for example, 0.1)

event

(optional) the event class, to be passed as 0 or 1 (default is 1)

Value

The function returns an object of class "cat_new_class" which is a list containing the following components:

base_new

a dataframe after clubbing low percentage classes with another class of similar or closest but higher event rate

cat_class_new

a dataframe with mapping between original classes and new clubbed classes (if any)

Author(s)

Arya Poddar <aryapoddar290990@gmail.com>

Kanishk Dogar <Kanishkd4@gmail.com>

Examples

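library(scorecardModelUtils)
# First 110 rows of iris: virginica has only 10 rows (about 9% of the data)
data <- iris[1:110, ]
data$Species <- as.character(data$Species)
# Random 0/1 target, for illustration only
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)
data_newclass <- cat_new_class(base = data, target = "Y", cat_var_name = "Species", threshold = 0.1)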

scorecardModelUtils documentation built on May 2, 2019, 9:59 a.m.