pct.bin: Monotonic binning based on percentiles
In monobin: Monotonic Binning for Credit Rating Models

Description

pct.bin implements percentile-based monotonic binning through iterative discretization.
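
To illustrate what iterative percentile-based discretization can look like, here is a minimal base R sketch. It is not the package's source code: `pct_bin_sketch` is a hypothetical helper that assumes `x` and `y` contain no special cases, ignores `force.trend` and `woe.trend`, and simply reduces the number of percentile groups from `g` until the average `y` per bin is strictly monotone. The layout of its summary table is likewise an assumption.

```r
# Hypothetical helper, NOT the monobin implementation.
# Assumes x and y are complete cases (no NA, NaN, Inf or other special values).
pct_bin_sketch <- function(x, y, g = 15) {
  stopifnot(g >= 2, length(x) == length(y))
  found <- FALSE
  for (k in g:2) {
    # Cut points at k + 1 equally spaced percentiles; duplicates are dropped
    cuts <- unique(quantile(x, probs = seq(0, 1, length.out = k + 1),
                            names = FALSE))
    if (length(cuts) < 3) next            # fewer than two bins possible
    bin <- droplevels(cut(x, breaks = cuts, include.lowest = TRUE))
    if (nlevels(bin) < 2) next
    # Bins are ordered by x, so a Spearman correlation of +1 or -1 between
    # the per-bin averages of x and y is equivalent to the per-bin average
    # of y being strictly monotone across the bins
    d <- diff(as.numeric(tapply(y, bin, mean)))
    if (all(d > 0) || all(d < 0)) {
      found <- TRUE
      break
    }
  }
  # Fall back to a single bin if no monotone split was found
  if (!found) bin <- factor(rep("all", length(x)))
  summary.tbl <- data.frame(
    bin   = levels(bin),
    n     = as.vector(table(bin)),
    x.avg = as.vector(tapply(x, bin, mean)),
    y.avg = as.vector(tapply(y, bin, mean))
  )
  list(summary.tbl = summary.tbl, x.trans = as.character(bin))
}

# Usage on simulated data (the data-generating process is made up)
set.seed(42)
x <- runif(1000, 18, 75)
y <- rbinom(1000, 1, plogis(-3 + 0.05 * x))
res <- pct_bin_sketch(x, y, g = 15)
res$summary.tbl
table(res$x.trans)
```

Checking monotonicity of the per-bin averages directly avoids comparing a floating-point correlation coefficient with exactly 1 or -1.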

Usage

pct.bin(
  x,
  y,
  sc = c(NA, NaN, Inf, -Inf),
  sc.method = "together",
  g = 15,
  y.type = NA,
  woe.trend = TRUE,
  force.trend = NA
)

Arguments

`x`	Numeric vector to be binned.
`y`	Numeric target vector (binary or continuous).
`sc`	Numeric vector of special case values. Default is `c(NA, NaN, Inf, -Inf)`. It is recommended to always keep the default values and to add new ones if needed. If any of these values occur in `x` but are not listed in `sc`, some statistics cannot be calculated properly.
`sc.method`	Defines how special cases are treated: all together or in separate bins. Possible values are `"together"` and `"separately"`.
`g`	Number of starting groups. Default is 15.
`y.type`	Type of `y`. Possible values are `"bina"` (binary) and `"cont"` (continuous). If the default value (`NA`) is passed, the algorithm determines whether `y` is a 0/1 or a continuous variable.
`woe.trend`	Applied only for a continuous target (`y`), as a weights of evidence (WoE) trend check. Default is `TRUE`.
`force.trend`	Whether the expected trend should be forced. Possible values: `"i"` for an increasing trend (`y` increases as `x` increases), `"d"` for a decreasing trend (`y` decreases as `x` increases). Default is `NA`. With the default value, the algorithm stops once a perfect positive or negative Spearman correlation is achieved between the average `y` and the average `x` per bin. Otherwise, it stops only once the forced trend is achieved.
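
The role of `sc` and the automatic detection of `y.type` can be sketched in base R as follows. Both helpers are hypothetical and are not part of monobin; in particular, the 0/1 check is only an assumption about how a binary target could be recognised.

```r
# Hypothetical helper: separate special cases from complete cases.
split_special <- function(x, sc = c(NA, NaN, Inf, -Inf)) {
  is.sc <- x %in% sc   # %in% matches NA with NA and NaN with NaN
  list(special = x[is.sc], complete = x[!is.sc])
}

# Hypothetical helper: guess the target type when y.type is left as NA.
guess_y_type <- function(y) {
  if (all(y %in% c(0, 1))) "bina" else "cont"
}

x <- c(25, NA, 40, Inf, 9999999999, NaN)

# With the default sc, the placeholder 9999999999 is treated as a regular
# value and would distort any statistic computed on the complete cases
parts.default <- split_special(x)
parts.default$complete

# Keeping the defaults and adding the placeholder isolates it as well
parts.custom <- split_special(x, sc = c(NA, NaN, Inf, -Inf, 9999999999))
mean(parts.custom$complete)   # 32.5

guess_y_type(c(0, 1, 1, 0))   # "bina"
guess_y_type(c(0.2, 1.7, 3))  # "cont"
```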

Value

pct.bin returns a list of two objects. The first, the data frame summary.tbl, is a summary table of the final binning; the second, x.trans, is a vector of discretized values. If x or y has only a single unique value among the complete cases (cases that are not special cases), it returns a data frame with info instead.

Examples

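suppressMessages(library(monobin))
# dplyr supplies %>%, group_by() and summarise() used in the last line
suppressMessages(library(dplyr))
data(gcd)
# binary target
mat.bin <- pct.bin(x = gcd$maturity, y = gcd$qual)
mat.bin[[1]]         # summary table of the final binning
table(mat.bin[[2]])  # distribution of the discretized values
# same binary target, special cases in separate bins, decreasing trend forced
set.seed(123)
gcd$age.d <- gcd$age
# inject two kinds of special cases: missing values and a placeholder code
gcd$age.d[sample(1:nrow(gcd), 10)] <- NA
gcd$age.d[sample(1:nrow(gcd), 3)] <- 9999999999
age.d.bin <- pct.bin(x = gcd$age.d,
                     y = gcd$qual,
                     sc = c(NA, NaN, Inf, -Inf, 9999999999),
                     sc.method = "separately",
                     force.trend = "d")
age.d.bin[[1]]
gcd$age.d.bin <- age.d.bin[[2]]
# bin size and average target per bin
gcd %>% group_by(age.d.bin) %>% summarise(n = n(), y.avg = mean(qual))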

monobin documentation built on July 21, 2022, 5:11 p.m.