sts.bin: Four-stage monotonic binning procedure with statistical test...

View source: R/05_STS_BINNING.R

sts.bin    R Documentation

Four-stage monotonic binning procedure with statistical test correction

Description

sts.bin implements an extension of the three-stage monotonic binning procedure (iso.bin), adding a final step that iteratively merges adjacent bins based on a statistical test.

Usage

sts.bin(
  x,
  y,
  sc = c(NA, NaN, Inf, -Inf),
  sc.method = "together",
  y.type = NA,
  min.pct.obs = 0.05,
  min.avg.rate = 0.01,
  p.val = 0.05,
  force.trend = NA
)

Arguments

x

Numeric vector to be binned.

y

Numeric target vector (binary or continuous).

sc

Numeric vector with special case elements. Default values are c(NA, NaN, Inf, -Inf). The recommendation is to always keep the default values and add new ones if needed. If any of the default values exist in x and are not defined in the sc vector, the function will report an error.
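As an illustration of how special cases can be separated from complete cases, here is a minimal base R sketch. It does not reproduce the package internals; the vector x and the extra code 9999 are hypothetical.

```r
# Hypothetical input; sc keeps the documented defaults and adds one extra code.
x  <- c(12, 24, NA, 36, Inf, 9999, NaN)
sc <- c(NA, NaN, Inf, -Inf, 9999)

# %in% matches NA with NA and NaN with NaN, so every special case is flagged.
is.sc <- x %in% sc

# Complete cases are the values that are not special cases.
x.complete <- x[!is.sc]
```

Extending sc instead of replacing it keeps the default values in place, which is the usage the description recommends.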

sc.method

Defines how special cases are treated: all together in one bin or in separate bins. Possible values are "together" and "separately".

y.type

Type of y. Possible options are "bina" (binary) and "cont" (continuous). If the default value (NA) is passed, the algorithm identifies whether y is a 0/1 or a continuous variable.
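A check of this kind could look like the sketch below. The helper detect.y.type is hypothetical and is not part of monobin; it only assumes that a target whose non-missing values are all 0 or 1 counts as binary.

```r
# Hypothetical helper (not a monobin function): classify a target vector.
detect.y.type <- function(y) {
  y.cc <- y[!is.na(y)]                  # ignore missing values
  if (all(y.cc %in% c(0, 1))) "bina" else "cont"
}

detect.y.type(c(0, 1, 1, NA, 0))
detect.y.type(c(0.2, 1, 3.5))
```

Passing y.type explicitly avoids relying on automatic detection, for example when a 0/1 variable should be treated as continuous.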

min.pct.obs

Minimum percentage of observations per bin. Default is 0.05 or minimum 30 observations.
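One possible reading of this rule is that the required bin size is the larger of 30 and the given share of complete cases. The sketch below assumes that reading; the helper min.obs is hypothetical and the exact rounding used by the package is not confirmed here.

```r
# Hypothetical helper: minimum number of observations required per bin,
# assuming the floor of 30 applies whenever the percentage yields fewer.
min.obs <- function(n.complete, min.pct.obs = 0.05) {
  max(30, ceiling(min.pct.obs * n.complete))
}

min.obs(1000)  # percentage rule dominates
min.obs(400)   # floor of 30 dominates
```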

min.avg.rate

Minimum y average rate. Default is 0.01 or minimum 1 bad case for y 0/1.

p.val

Threshold for the p-value of the statistical test. Default is 0.05. For a binary target, a test of two proportions is applied, while for a continuous target, a two-sample independent t-test is applied.
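The two tests can be sketched in base R for a single pair of adjacent bins. All data below are hypothetical. The merge rule (merge when the p-value is not below p.val) and the two-sided alternative are assumptions made for illustration and may differ from what sts.bin does internally.

```r
p.val <- 0.05

# Binary target: hypothetical bad-case counts and bin sizes of two adjacent bins.
bad <- c(10, 40)
n   <- c(200, 220)
p.bina <- prop.test(x = bad, n = n, correct = FALSE)$p.value
merge.bina <- p.bina >= p.val   # assumed rule: merge if not significant

# Continuous target: hypothetical y values falling into two adjacent bins.
y.bin1 <- c(1.2, 0.8, 1.1, 0.9, 1.0, 1.3)
y.bin2 <- c(1.1, 0.9, 1.2, 1.0, 0.8, 1.2)
p.cont <- t.test(x = y.bin1, y = y.bin2)$p.value
merge.cont <- p.cont >= p.val   # assumed rule: merge if not significant
```

In this sketch the binary pair differs clearly in bad rate and would stay separate, while the continuous pair has nearly equal means and would be merged.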

force.trend

Whether the expected trend should be forced. Possible values are "i" for an increasing trend (y increases as x increases) and "d" for a decreasing trend (y decreases as x increases). Default value is NA. If the default value is passed, the trend is identified from the sign of the Spearman correlation coefficient between x and y on complete cases.
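Identifying the trend from the correlation sign can be sketched as follows. The data are hypothetical, and treating a coefficient of exactly zero as increasing is an assumption of this sketch, not documented behavior.

```r
# Hypothetical data with one missing x value.
x <- c(6, 12, 18, 24, NA, 36, 48, 60)
y <- c(0, 0, 0, 1, 1, 0, 1, 1)

# Spearman correlation on complete cases only.
rho <- cor(x, y, method = "spearman", use = "complete.obs")

# "i" for increasing, "d" for decreasing (zero treated as "i" by assumption).
trend <- if (rho >= 0) "i" else "d"
```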

Value

sts.bin returns a list of two objects. The first, the data frame summary.tbl, is a summary table of the final binning; the second, x.trans, is a vector of discretized values. If x or y has a single unique value among complete cases (cases other than special cases), the function returns a data frame with info.

See Also

iso.bin for the three-stage monotonic binning procedure.

Examples

suppressMessages(library(monobin))
data(gcd)
# Binary target
maturity.bin <- sts.bin(x = gcd$maturity, y = gcd$qual)
maturity.bin[[1]]
# Number of observations, sum of y and average of y per bin
tapply(gcd$qual, maturity.bin[[2]], function(x) c(length(x), sum(x), mean(x)))
# Test of two proportions for the first two bins
prop.test(x = c(sum(gcd$qual[maturity.bin[[2]] %in% "01 (-Inf,8)"]),
                sum(gcd$qual[maturity.bin[[2]] %in% "02 [8,16)"])),
          n = c(length(gcd$qual[maturity.bin[[2]] %in% "01 (-Inf,8)"]),
                length(gcd$qual[maturity.bin[[2]] %in% "02 [8,16)"])),
          alternative = "less",
          correct = FALSE)$p.value
# Continuous target (gcd$qual is treated as continuous via y.type = "cont")
age.bin <- sts.bin(x = gcd$age, y = gcd$qual, y.type = "cont")
age.bin[[1]]
# Two-sample t-test for the first two bins
t.test(x = gcd$qual[age.bin[[2]] %in% "01 (-Inf,26)"],
       y = gcd$qual[age.bin[[2]] %in% "02 [26,35)"],
       alternative = "greater")$p.value


monobin documentation built on July 21, 2022, 5:11 p.m.