binning_by: Optimal Binning for Scoring Modeling

binning_by    R Documentation

Optimal Binning for Scoring Modeling

Description

binning_by() finds intervals for a numerical variable using optimal binning. Optimal binning categorizes a numeric characteristic into bins for later use in scoring modeling.

Usage

binning_by(.data, y, x, p = 0.05, ordered = TRUE, labels = NULL)

Arguments

.data

a data frame.

y

character. Name of the binary response variable (0, 1). The variable must contain only the integers 0 and 1. A factor with two levels is also accepted; it is converted during the calculation.

x

character. Name of the continuous characteristic variable. It must have at least 5 distinct values, and Inf is not allowed.

p

numeric. Percentage of records per bin. The default is 5% (0.05). This parameter only accepts values greater than 0.00 (0%) and lower than 0.50 (50%).

ordered

logical. whether to build an ordered factor or not.

labels

character. the label names to use for each of the bins.
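
The constraints on y, x and p described above can be checked before calling binning_by(). The following is a minimal base R sketch, not part of dlookr: the helper name check_binning_inputs() and the toy data are hypothetical and only mirror the documented rules.

```r
# Hypothetical helper (not part of dlookr): checks the documented constraints
# on y, x and p before calling binning_by().
check_binning_inputs <- function(data, y, x, p = 0.05) {
  stopifnot(is.data.frame(data), y %in% names(data), x %in% names(data))

  yv <- data[[y]]
  xv <- data[[x]]

  # y: only 0 and 1, or a factor with exactly two levels
  y_ok <- if (is.factor(yv)) {
    nlevels(yv) == 2
  } else {
    all(yv[!is.na(yv)] %in% c(0, 1))
  }

  # x: numeric, at least 5 distinct non-missing values, no Inf / -Inf
  x_ok <- is.numeric(xv) &&
    length(unique(xv[!is.na(xv)])) >= 5 &&
    !any(is.infinite(xv))

  # p: a single number strictly between 0 and 0.5
  p_ok <- is.numeric(p) && length(p) == 1 && p > 0 && p < 0.5

  c(y = y_ok, x = x_ok, p = p_ok)
}

# Synthetic data for illustration only
set.seed(1)
toy <- data.frame(
  target = rbinom(100, 1, 0.3),
  score  = rnorm(100)
)

check_binning_inputs(toy, "target", "score", p = 0.05)
check_binning_inputs(toy, "target", "score", p = 0.5)
```

The helper returns a named logical vector, so a failing argument can be identified by name before the binning call is made.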

Details

This function is useful when used with the mutate() or transmute() functions of the dplyr package. It is implemented using the smbinning() function of the smbinning package.

Value

An object of class "optimal_bins". The attributes of the "optimal_bins" class are as follows.

  • class : "optimal_bins".

  • type : binning type, "optimal".

  • breaks : numeric. the number of intervals into which x is to be cut.

  • levels : character. levels of binned value.

  • raw : numeric. raw data, x argument value.

  • ivtable : data.frame. information value table.

  • iv : numeric. information value.

  • target : integer. binary response variable.
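
To make the ivtable and iv attributes more concrete, the sketch below computes weight of evidence (WoE) and information value (IV) by hand for a small, already-binned variable. It uses base R only; the data are synthetic, and the sign convention for WoE (log of the share of 1s over the share of 0s) is an assumption that may differ from the table binning_by() actually returns.

```r
# Synthetic, already-binned data for illustration only
bins <- factor(
  c("low", "low", "low", "low", "mid", "mid", "mid", "high", "high", "high"),
  levels = c("low", "mid", "high")
)
target <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 0)

# Count 0s and 1s per bin
tab <- table(bins, target)
n0 <- tab[, "0"]
n1 <- tab[, "1"]

# Share of all 1s (and of all 0s) that fall into each bin
pct1 <- n1 / sum(n1)
pct0 <- n0 / sum(n0)

# Assumed convention: WoE = ln(pct1 / pct0); some texts use the inverse ratio
woe <- log(pct1 / pct0)

# IV sums the WoE-weighted differences over all bins
iv <- sum((pct1 - pct0) * woe)

iv_table <- data.frame(
  bin = levels(bins),
  n0  = as.vector(n0),
  n1  = as.vector(n1),
  woe = round(as.vector(woe), 4)
)
iv_table
iv
```

IV does not depend on which class is placed in the numerator of the WoE ratio, since swapping the classes flips the sign of both factors in each term.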

attributes of "optimal_bins" class

The attributes of the "optimal_bins" class are as follows.

  • class : "optimal_bins".

  • levels : character. factor or ordered factor levels

  • type : character. binning method

  • breaks : numeric. breaks for binning

  • raw : numeric. the raw data before binning

  • ivtable : data.frame. information value table

  • iv : numeric. information value

  • target : integer. binary response variable

See vignette("transformation") for an introduction to these concepts.
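
Cut points like those described for the breaks attribute can be reused to assign new observations to existing bins. The sketch below uses base R cut() with a hypothetical breaks vector; the values are invented, and whether dlookr closes intervals on the right and includes the lowest value is an assumption here.

```r
# Hypothetical cut points, standing in for a vector such as attr(bin, "breaks")
breaks <- c(0.5, 0.9, 1.3, 9.4)

# New observations to assign to the existing bins (NA stays NA)
new_x <- c(0.7, 1.0, 2.5, NA)

# cut() maps each value to its interval; include.lowest keeps the minimum
# in the first bin, ordered_result returns an ordered factor
new_bins <- cut(new_x, breaks = breaks, include.lowest = TRUE,
                ordered_result = TRUE)
new_bins

# Frequency of each bin, with missing values counted separately
table(new_bins, useNA = "ifany")
```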

See Also

binning, plot.optimal_bins.

Examples


library(dplyr)

# Generate data for the example
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "creatinine"] <- NA

# optimal binning using quoted (character) variable names
bin <- binning_by(heartfailure2, "death_event", "creatinine")

# optimal binning using unquoted variable names
bin <- binning_by(heartfailure2, death_event, creatinine)
bin

# performance table
attr(bin, "performance")

# summarize the optimal_bins object
summary(bin)

# visualize all information for optimal_bins class
plot(bin)

# visualize WoE information for optimal_bins class
plot(bin, type = "WoE")

# visualize all information without typographic elements
plot(bin, typographic = FALSE)

# extract binned results
extract(bin) %>% 
  head(20)



dlookr documentation built on July 9, 2023, 6:31 p.m.