iso.bin: Three-stage monotonic binning procedure

View source: R/02_ISO_BINNING.R

iso.bin    R Documentation

Description

iso.bin implements a three-stage monotonic binning procedure. The first stage uses isotonic regression to achieve monotonicity; the remaining two stages apply corrections, where needed, for the minimum percentage of observations and the minimum target rate per bin.
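
As a rough illustration of the first stage, the sketch below runs base R's isoreg() on made-up data (the toy vectors are assumptions, not package data). Adjacent values that violate monotonicity are pooled, and each distinct fitted level can be read as a candidate bin. This is not the package's internal code.

```r
# Toy data (hypothetical): a mostly increasing 0/1 target
x <- 1:6
y <- c(0, 1, 0, 1, 1, 1)

# Isotonic (non-decreasing) least-squares fit from the stats package
fit <- isoreg(x, y)
fit$yf  # fitted values: 0.0 0.5 0.5 1.0 1.0 1.0

# Each distinct fitted level is a candidate bin of x values
candidate.bins <- split(x, fit$yf)
candidate.bins
```

Here the second and third observations break the increasing pattern, so they are pooled into one level with an average of 0.5.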

Usage

iso.bin(
  x,
  y,
  sc = c(NA, NaN, Inf, -Inf),
  sc.method = "together",
  y.type = NA,
  min.pct.obs = 0.05,
  min.avg.rate = 0.01,
  force.trend = NA
)

Arguments

x

Numeric vector to be binned.

y

Numeric target vector (binary or continuous).

sc

Numeric vector of special case values. The default is c(NA, NaN, Inf, -Inf). It is recommended to always keep the default values and to add new ones if needed. If any of these values exist in x but are not listed in sc, the function reports an error.

sc.method

Defines how special cases are treated: all together in one bin or in separate bins. Possible values are "together" and "separately".

y.type

Type of y. Possible options are "bina" (binary) and "cont" (continuous). If the default value (NA) is passed, the algorithm identifies whether y is a 0/1 or a continuous variable.

min.pct.obs

Minimum percentage of observations per bin. Default is 0.05; the resulting threshold is never lower than 30 observations.

min.avg.rate

Minimum average rate of y per bin. Default is 0.01; for a 0/1 target the resulting threshold is never lower than 1 bad case.

force.trend

Whether the expected trend should be forced. Possible values: "i" for an increasing trend (y increases as x increases), "d" for a decreasing trend (y decreases as x increases). Default value is NA. If the default value is passed, the trend is identified from the sign of the Spearman correlation coefficient between x and y on complete cases.
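
The default trend detection described for force.trend can be sketched in base R as follows. The helper detect_trend is hypothetical and not part of monobin. Treating a zero correlation as increasing, and dropping missing y values, are assumptions of this sketch rather than documented package behaviour.

```r
# Hypothetical helper (not a monobin function): infer the trend direction
# from the sign of the Spearman correlation on complete cases.
detect_trend <- function(x, y, sc = c(NA, NaN, Inf, -Inf)) {
  # Complete cases: x is not a special case and y is not missing (assumption)
  keep <- !(x %in% sc) & !is.na(y)
  rho <- suppressWarnings(cor(x[keep], y[keep], method = "spearman"))
  if (is.na(rho)) return(NA_character_)  # e.g. constant x or y
  if (rho >= 0) "i" else "d"             # zero treated as "i" (arbitrary choice)
}

# Special cases (NA, Inf) are excluded before the correlation is computed
detect_trend(x = c(1, 2, 3, 4, 5, NA, Inf), y = c(0, 0, 1, 1, 1, 1, 0))
detect_trend(x = c(1, 2, 3, 4, 5), y = c(1, 1, 1, 0, 0))
```

The first call returns "i" and the second returns "d".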

Details

Correcting the results of isotonic regression is an important step in credit rating model development. The threshold for the minimum percentage of observations has a floor of 30 observations per bin, while the target rate threshold for a binary target has a floor of 1 bad case.
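
A minimal sketch of how these floors interact with the percentage arguments, mirroring the threshold arithmetic used in the stage-by-stage code of the Examples section for a binary target. The helper names min_obs_threshold and min_bad_threshold are hypothetical and do not exist in monobin.

```r
# Hypothetical helpers (not monobin functions) for a binary target.
# n is the number of complete cases.
min_obs_threshold <- function(n, min.pct.obs = 0.05) {
  ceiling(max(30, n * min.pct.obs))   # never below 30 observations
}
min_bad_threshold <- function(n, min.avg.rate = 0.01) {
  ceiling(max(1, n * min.avg.rate))   # never below 1 bad case
}

min_obs_threshold(1000)  # 50: 5% of 1000 exceeds the floor
min_obs_threshold(400)   # 30: 5% of 400 is 20, so the floor applies
min_bad_threshold(1000)  # 10
min_bad_threshold(50)    # 1: 1% of 50 is 0.5, so the floor applies
```

On small samples the floors dominate, so fewer bins can survive the second and third stages than the percentages alone would suggest.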

Value

iso.bin returns a list of two objects. The first object, the data frame summary.tbl, is a summary table of the final binning; the second, x.trans, is a vector of discretized values. If x or y has a single unique value among the complete cases (cases that are not special cases), the function returns an informational data frame instead.

Examples

suppressMessages(library(monobin))
data(gcd)
age.bin <- iso.bin(x = gcd$age, y = gcd$qual)
age.bin[[1]]
table(age.bin[[2]])
# force increasing trend
iso.bin(x = gcd$age, y = gcd$qual, force.trend = "i")[[1]]

# stage-by-stage example
suppressMessages(library(dplyr))  # provides %>%, group_by(), summarise(), n()
# inputs
x <- gcd$age          # risk factor
y <- gcd$qual         # binary dependent variable
min.pct.obs <- 0.05   # minimum percentage of observations per bin
min.avg.rate <- 0.01  # minimum percentage of defaults per bin
# stage 1: isotonic regression
db <- data.frame(x, y)
db <- db[order(db$x), ]
# sign of the Spearman correlation determines the trend direction
cc.sign <- sign(cor(db$y, db$x, method = "spearman", use = "complete.obs"))
iso.r <- isoreg(x = db$x, y = cc.sign * db$y)
db$y.hat <- iso.r$yf
# each distinct fitted value defines one bin
db.s0 <- db %>%
  group_by(bin = y.hat) %>%
  summarise(no = n(),
            y.sum = sum(y),
            y.avg = mean(y),
            x.avg = mean(x),
            x.min = min(x),
            x.max = max(x))
db.s0
# stage 2: merging based on minimum percentage of observations
db.s1 <- db.s0
thr.no <- ceiling(ifelse(nrow(db) * min.pct.obs < 30, 30, nrow(db) * min.pct.obs))
thr.no  # threshold for minimum number of observations per bin
repeat {
  if (nrow(db.s1) == 1) {break}
  values <- db.s1$no
  if (all(values >= thr.no)) {break}
  gap <- min(which(values < thr.no))  # first bin below the threshold
  if (gap == nrow(db.s1)) {
    # last bin: merge it with the previous bin
    db.s1$bin[(gap - 1):gap] <- db.s1$bin[gap - 1]
  } else {
    # otherwise: merge it with the next bin
    db.s1$bin[gap:(gap + 1)] <- db.s1$bin[gap + 1]
  }
  # recompute the summary for the merged bins
  db.s1 <- db.s1 %>%
    group_by(bin) %>%
    mutate(
      y.avg = weighted.mean(y.avg, no),
      x.avg = weighted.mean(x.avg, no)) %>%
    summarise(
      no = sum(no),
      y.sum = sum(y.sum),
      y.avg = unique(y.avg),
      x.avg = unique(x.avg),
      x.min = min(x.min),
      x.max = max(x.max))
}
db.s1
# stage 3: merging based on minimum percentage of bad cases
db.s2 <- db.s1
thr.nb <- ceiling(ifelse(nrow(db) * min.avg.rate < 1, 1, nrow(db) * min.avg.rate))
thr.nb  # threshold for minimum number of bad cases per bin
# each bin already has more bad cases than the threshold, so no further merging is needed
all(db.s2$y.sum > thr.nb)
# final result
db.s2
# result of the iso.bin function (formatting and certain metrics have been added)
iso.bin(x = gcd$age, y = gcd$qual)[[1]]


monobin documentation built on July 21, 2022, 5:11 p.m.