smbinning: Optimal Binning for Scoring Modeling

View source: R/smbinning.R

Description

Optimal Binning categorizes a numeric characteristic into bins for later use in scoring modeling. This process, also known as supervised discretization, uses Recursive Partitioning to categorize the numeric characteristic.
The specific algorithm is Conditional Inference Trees. It initially excludes missing values (NA) when computing the cutpoints, then adds them back as a separate bin for the calculation of the Information Value.
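
The idea described above can be sketched with the partykit package, which appears among the loaded dependencies in the example output. This is only an illustration under stated assumptions, not the package's actual source: the simulated data frame toy, its columns good and score, and the mapping of p to ctree_control(minbucket = ...) are all hypothetical choices made for the sketch.

```r
library(partykit)  # provides ctree(), ctree_control(), nodeapply(), split_node()

set.seed(1)
n <- 2000
score <- round(runif(n, 300, 850))       # hypothetical numeric characteristic
score[sample(n, 100)] <- NA              # inject some missing values

# Simulated binary response whose good rate rises with the score
prob_good <- plogis((ifelse(is.na(score), 575, score) - 500) / 80)
toy <- data.frame(good = rbinom(n, 1, prob_good), score = score)

p <- 0.05  # assumed to act as the minimum share of records per bin

# Step 1: drop missing values before growing the tree
complete <- toy[!is.na(toy$score), ]

# Step 2: fit a conditional inference tree on the complete records
fit <- ctree(good ~ score, data = complete,
             control = ctree_control(minbucket = ceiling(p * nrow(toy))))

# Step 3: collect the split points of the inner nodes
# (split_node() returns NULL for terminal nodes, so they contribute nothing)
breaks <- nodeapply(fit, ids = nodeids(fit),
                    FUN = function(node) split_node(node)$breaks)
cuts <- sort(unname(unlist(breaks)))
print(cuts)

# Step 4: missing values come back as their own bin when tabulating
n_missing <- sum(is.na(toy$score))
print(n_missing)
```

The cutpoints found this way define the bins for the non-missing records, while the missing records form one extra row in the tabulation.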

Usage

smbinning(df, y, x, p = 0.05)

Arguments

df

A data frame.

y

Binary response variable (0,1). Integer (int) is required. Name of y must not have a dot. Name "default" is not allowed.

x

Continuous characteristic. At least 5 different values. Value Inf is not allowed. Name of x must not have a dot.

p

Percentage of records per bin. Default 5% (0.05). This parameter only accepts values greater than 0.00 (0%) and lower than 0.50 (50%).
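
The constraints listed for these arguments can be checked before calling smbinning. The helper below is a hypothetical sketch in base R (check_binning_inputs is not part of the package), and it assumes that missing values are allowed in x but not in y, which the text above does not state.

```r
# Hypothetical helper, not part of smbinning: returns a character vector of
# problems found, or an empty vector when the documented constraints hold.
check_binning_inputs <- function(df, y, x, p = 0.05) {
  if (!is.data.frame(df)) return("df must be a data frame")
  if (!all(c(y, x) %in% names(df))) return("y and x must be column names of df")

  problems <- character(0)
  if (grepl(".", y, fixed = TRUE) || grepl(".", x, fixed = TRUE))
    problems <- c(problems, "names of y and x must not contain a dot")
  if (y == "default")
    problems <- c(problems, "y must not be named \"default\"")
  # Assumption: a missing response is rejected along with any non 0/1 value
  if (!is.integer(df[[y]]) || !all(df[[y]] %in% c(0L, 1L)))
    problems <- c(problems, "y must be an integer column of 0/1 values")
  if (!is.numeric(df[[x]])) {
    problems <- c(problems, "x must be numeric")
  } else {
    if (length(unique(na.omit(df[[x]]))) < 5)
      problems <- c(problems, "x needs at least 5 different values")
    if (any(is.infinite(df[[x]])))
      problems <- c(problems, "x must not contain Inf")
  }
  if (!is.numeric(p) || length(p) != 1 || p <= 0 || p >= 0.5)
    problems <- c(problems, "p must be greater than 0 and lower than 0.5")
  problems
}

# Usage on a small made-up data frame
demo <- data.frame(fgood = c(1L, 0L, 1L, 1L, 0L, 1L),
                   cbs1  = c(10, 20, 30, 40, 50, NA))
print(check_binning_inputs(demo, "fgood", "cbs1"))           # no problems
print(check_binning_inputs(demo, "fgood", "cbs1", p = 0.5))  # p out of range
```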

Value

The command smbinning generates an object containing the necessary information and utilities for binning. The user should save the returned object so it can be passed to smbinning.plot, smbinning.sql, and smbinning.gen.
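
A minimal sketch of that workflow follows. The argument names option, sub, and chrname, and the option value "dist", are assumptions about the signatures of these companion functions, so check each function's help page before relying on them. The new column name gcbs1 is a hypothetical choice.

```r
library(smbinning)

# Fit once and keep the returned object
result <- smbinning(df = smbsimdf1, y = "fgood", x = "cbs1", p = 0.05)

# Plot the share of records per bin (assumed arguments: option, sub)
smbinning.plot(result, option = "dist", sub = "cbs1")

# Generate SQL code that reproduces the binning
smbinning.sql(result)

# Add the binned characteristic to the data as a new column
# (assumed argument: chrname; "gcbs1" is a made-up column name)
scored <- smbinning.gen(smbsimdf1, result, chrname = "gcbs1")
print(table(scored$gcbs1))
```

Saving the result once avoids refitting the tree for every downstream step.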

Examples

# Load the package and its bundled example datasets
library(smbinning)

# Example: Optimal binning
result <- smbinning(df = smbsimdf1, y = "fgood", x = "cbs1") # Run and save result
result$ivtable # Tabulation and Information Value
result$iv # Information value
result$bands # Bins or bands
result$ctree # Decision tree

Example output

Loading required package: sqldf
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
Loading required package: partykit
Loading required package: grid
Loading required package: libcoin
Loading required package: mvtnorm
Loading required package: Formula
Warning message:
no DISPLAY variable so Tk is not available 
    Cutpoint CntRec CntGood CntBad CntCumRec CntCumGood CntCumBad PctRec
1   <= 36.44    245     137    108       245        137       108 0.0980
2 <= 51.7701    829     614    215      1074        751       323 0.3316
3    <= 59.5    520     436     84      1594       1187       407 0.2080
4     > 59.5    650     608     42      2244       1795       449 0.2600
5    Missing    256     205     51      2500       2000       500 0.1024
6      Total   2500    2000    500        NA         NA        NA 1.0000
  GoodRate BadRate    Odds LnOdds     WoE     IV
1   0.5592  0.4408  1.2685 0.2378 -1.1484 0.1694
2   0.7407  0.2593  2.8558 1.0494 -0.3369 0.0414
3   0.8385  0.1615  5.1905 1.6468  0.2605 0.0130
4   0.9354  0.0646 14.4762 2.6725  1.2862 0.2830
5   0.8008  0.1992  4.0196 1.3912  0.0049 0.0000
6   0.8000  0.2000  4.0000 1.3863  0.0000 0.5068
[1] 0.5068
[1] 11.0000 36.4400 51.7701 59.5000 90.9100

Model formula:
fgood ~ cbs1

Fitted party:
[1] root
|   [2] cbs1 <= 51.77
|   |   [3] cbs1 <= 36.44: 0.559 (n = 245, err = 60.4)
|   |   [4] cbs1 > 36.44: 0.741 (n = 829, err = 159.2)
|   [5] cbs1 > 51.77
|   |   [6] cbs1 <= 59.5: 0.838 (n = 520, err = 70.4)
|   |   [7] cbs1 > 59.5: 0.935 (n = 650, err = 39.3)

Number of inner nodes:    3
Number of terminal nodes: 4
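
The WoE and IV columns of the table in the example output can be reproduced from its CntGood and CntBad columns. The sketch below uses the usual definitions, WoE = ln(share of goods / share of bads) and IV = (share of goods - share of bads) * WoE. These are assumed to be the formulas behind the table, and they do agree with the printed figures to four decimals. The data frame bins and its column names are made up for the illustration.

```r
# Counts copied from the CntGood and CntBad columns of the example ivtable
bins <- data.frame(
  cutpoint = c("<= 36.44", "<= 51.7701", "<= 59.5", "> 59.5", "Missing"),
  good = c(137, 614, 436, 608, 205),
  bad  = c(108, 215,  84,  42,  51)
)

# Share of all goods and of all bads that falls in each bin
dist_good <- bins$good / sum(bins$good)
dist_bad  <- bins$bad  / sum(bins$bad)

# Weight of Evidence and each bin's contribution to the Information Value
bins$woe <- log(dist_good / dist_bad)
bins$iv  <- (dist_good - dist_bad) * bins$woe

# Total Information Value of the characteristic
total_iv <- sum(bins$iv)

print(round(bins[, c("woe", "iv")], 4))
print(round(total_iv, 4))
```

Note that the Missing bin is included in the totals, which is why missing values affect the Information Value even though they play no part in choosing the cutpoints.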

smbinning documentation built on May 1, 2019, 10:06 p.m.