Optimal Binning categorizes a numeric characteristic into bins for later use in scoring modeling.
This process, also known as supervised discretization, uses Recursive Partitioning to categorize the numeric characteristic.
The specific algorithm is Conditional Inference Trees, which initially excludes missing values (NA) to compute the cutpoints and adds them back later in the process to calculate the Information Value.
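As a minimal sketch of how the Information Value relates to the bins, the snippet below applies the conventional Weight of Evidence and IV definitions to the good/bad counts printed in the example output further down. It uses base R only and is not taken from the package source; the data frame `bins` and the variable names are illustrative assumptions.

```r
# Good/bad counts per bin, copied from the ivtable in the example output.
# The "Missing" row is included, since NAs are added back for the IV.
bins <- data.frame(
  Cutpoint = c("<= 36.44", "<= 51.7701", "<= 59.5", "> 59.5", "Missing"),
  CntGood  = c(137, 614, 436, 608, 205),
  CntBad   = c(108, 215, 84, 42, 51)
)

# Share of all goods and of all bads that fall in each bin
pct_good <- bins$CntGood / sum(bins$CntGood)
pct_bad  <- bins$CntBad / sum(bins$CntBad)

# Conventional definitions (assumed here, not read from the package code):
# WoE = ln(share of goods / share of bads), IV = sum((goods - bads) * WoE)
bins$WoE <- log(pct_good / pct_bad)
bins$IV  <- (pct_good - pct_bad) * bins$WoE

total_iv <- sum(bins$IV)
print(round(bins$WoE, 4))
print(round(total_iv, 4))
```

Under these definitions the rounded values agree with the WoE column and the total IV of 0.5068 shown in the example output.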
smbinning(df, y, x, p = 0.05)
df | A data frame.
y | Binary response variable (0,1). Integer (
x | Continuous characteristic. At least 5 different values. Value
p | Percentage of records per bin. Default 5% (0.05). This parameter only accepts values greater than 0.00 (0%) and lower than 0.50 (50%).
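To make the range of p concrete, here is a small base R sketch. It assumes p is read as a minimum share of records per bin; the helper `min_bin_size` is hypothetical and not part of smbinning, and the package's exact rounding may differ.

```r
# Hypothetical helper (not part of smbinning): checks that p lies strictly
# between 0 and 0.5 and turns it into an approximate minimum bin size.
min_bin_size <- function(n, p = 0.05) {
  stopifnot(is.numeric(p), length(p) == 1, p > 0, p < 0.5)
  # round() guards against floating-point noise before taking the ceiling
  ceiling(round(p * n, 8))
}

min_bin_size(2500)        # default p = 0.05
min_bin_size(2500, 0.10)  # a stricter minimum
```

With the 2500 records of the example data and the default p, this gives 125 records, and every bin in the example output is larger than that.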
The command smbinning generates an object containing the necessary information and utilities for binning.
The user should save the output so it can be used with smbinning.plot, smbinning.sql, and smbinning.gen.
# Load package and its data
library(smbinning)

# Example: Optimal binning
result <- smbinning(df = smbsimdf1, y = "fgood", x = "cbs1") # Run and save result
result$ivtable # Tabulation and Information Value
result$iv      # Information value
result$bands   # Bins or bands
result$ctree   # Decision tree
Loading required package: sqldf
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
Loading required package: partykit
Loading required package: grid
Loading required package: libcoin
Loading required package: mvtnorm
Loading required package: Formula
Warning message:
no DISPLAY variable so Tk is not available
    Cutpoint CntRec CntGood CntBad CntCumRec CntCumGood CntCumBad PctRec
1   <= 36.44    245     137    108       245        137       108 0.0980
2 <= 51.7701    829     614    215      1074        751       323 0.3316
3    <= 59.5    520     436     84      1594       1187       407 0.2080
4     > 59.5    650     608     42      2244       1795       449 0.2600
5    Missing    256     205     51      2500       2000       500 0.1024
6      Total   2500    2000    500        NA         NA        NA 1.0000
  GoodRate BadRate    Odds LnOdds     WoE     IV
1   0.5592  0.4408  1.2685 0.2378 -1.1484 0.1694
2   0.7407  0.2593  2.8558 1.0494 -0.3369 0.0414
3   0.8385  0.1615  5.1905 1.6468  0.2605 0.0130
4   0.9354  0.0646 14.4762 2.6725  1.2862 0.2830
5   0.8008  0.1992  4.0196 1.3912  0.0049 0.0000
6   0.8000  0.2000  4.0000 1.3863  0.0000 0.5068
[1] 0.5068
[1] 11.0000 36.4400 51.7701 59.5000 90.9100
Model formula:
fgood ~ cbs1
Fitted party:
[1] root
| [2] cbs1 <= 51.77
| | [3] cbs1 <= 36.44: 0.559 (n = 245, err = 60.4)
| | [4] cbs1 > 36.44: 0.741 (n = 829, err = 159.2)
| [5] cbs1 > 51.77
| | [6] cbs1 <= 59.5: 0.838 (n = 520, err = 70.4)
| | [7] cbs1 > 59.5: 0.935 (n = 650, err = 39.3)
Number of inner nodes: 3
Number of terminal nodes: 4
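The cutpoints printed by result$bands can be read as interval boundaries. The base R sketch below shows one way such cutpoints might be applied to new values, with NA routed to a separate "Missing" bin as in the ivtable. The helper `assign_bin` is hypothetical and is not how smbinning.gen is implemented.

```r
# Inner cutpoints copied from the printed result$bands
# (the first and last values there are the observed minimum and maximum).
cuts   <- c(36.44, 51.7701, 59.5)
labels <- c("<= 36.44", "<= 51.7701", "<= 59.5", "> 59.5")

# Hypothetical helper (not part of smbinning): right-closed intervals,
# with missing values assigned to their own "Missing" bin.
assign_bin <- function(x, cuts, labels) {
  bin <- as.character(
    cut(x, breaks = c(-Inf, cuts, Inf), labels = labels, right = TRUE)
  )
  bin[is.na(x)] <- "Missing"
  bin
}

scores <- c(20, 36.44, 40, 59.5, 75, NA)
print(assign_bin(scores, cuts, labels))
```

Values equal to a cutpoint fall in the lower bin, which matches the "<=" labels in the Cutpoint column.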