View source: R/06_NDR_BINNING.R

ndr.bin | R Documentation |

`ndr.bin` implements an extension of the three-stage monotonic binning procedure (`iso.bin`), adding a regression with nested dummies as a fourth stage.
The first stage is an isotonic regression used to achieve monotonicity. The next two stages are possible corrections for
the minimum percentage of observations and the minimum target rate, while the final regression stage is used to identify
statistically significant cut points.
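As a rough illustration of the first stage, the sketch below fits an isotonic regression with base R's `stats::isoreg` on synthetic data. The data, the variable names, and the way candidate bins are read off the fit are assumptions made for illustration; this is not `ndr.bin`'s internal code.

```r
# Synthetic data (illustrative only): y tends to increase with x, plus noise.
set.seed(1)
x <- sort(runif(200, 18, 75))
y <- 0.01 * x + rnorm(200, sd = 0.3)

# Isotonic (non-decreasing) least-squares fit from the stats package.
fit <- isoreg(x, y)

# The fitted values form a non-decreasing step function of x.
stopifnot(all(diff(fit$yf) >= 0))

# Treat each distinct fitted level as a candidate bin and record
# the smallest x value that falls into it.
candidate.bins <- aggregate(x, by = list(level = fit$yf), FUN = min)
names(candidate.bins)[2] <- "x.min"
head(candidate.bins)
```

The number of distinct levels is usually larger than the number of bins one would want to keep, which is why corrections and a significance check follow the monotonic fit.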

ndr.bin(x, y, sc = c(NA, NaN, Inf, -Inf), sc.method = "together", y.type = NA, min.pct.obs = 0.05, min.avg.rate = 0.01, p.val = 0.05, force.trend = NA)

`x` |
Numeric vector to be binned. |

`y` |
Numeric target vector (binary or continuous). |

`sc` |
Numeric vector with special case elements. Default values are `c(NA, NaN, Inf, -Inf)`. |

`sc.method` |
Defines how special cases will be treated: all together or separately.
Default is `"together"`. |

`y.type` |
Type of `y`. Default is `NA`. |

`min.pct.obs` |
Minimum percentage of observations per bin. Default is 0.05 or 30 observations. |

`min.avg.rate` |
Minimum average target rate (`y`) per bin. Default is 0.01. |

`p.val` |
Threshold for the p-value of the regression coefficients. Default is 0.05. For a binary target, binary logistic regression is estimated, whereas for a continuous target, linear regression is used. |

`force.trend` |
Whether the expected trend should be forced. Default is `NA`. |
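The role of `p.val` can be illustrated with a small base-R sketch of a regression on nested dummies. The data, the candidate cut points `cuts`, and the simple "keep if significant" rule are assumptions for illustration only; how `ndr.bin` treats non-significant cut points internally is not shown here.

```r
# Synthetic continuous target with three underlying levels (illustrative only).
set.seed(2)
x <- runif(300, 0, 30)
y <- ifelse(x < 10, 1, ifelse(x < 20, 1.05, 3)) + rnorm(300, sd = 0.5)

cuts  <- c(10, 20)  # hypothetical candidate cut points
p.val <- 0.05       # significance threshold, as in the p.val argument

# Nested dummies: dummy k equals 1 for every observation at or above cut k.
nd <- sapply(cuts, function(k) as.integer(x >= k))
colnames(nd) <- paste0("dv_", seq_along(cuts) + 1)
db <- data.frame(y = y, nd)

# Linear regression for a continuous target.
fit   <- lm(y ~ ., data = db)
coefs <- summary(fit)$coefficients

# With nested dummies, the intercept is the mean of the first bin and each
# slope is the difference between the means of two adjacent bins.
bin       <- cut(x, breaks = c(-Inf, cuts, Inf), right = FALSE)
bin.means <- as.vector(tapply(y, bin, mean))
stopifnot(isTRUE(all.equal(unname(cumsum(coefs[, "Estimate"])), bin.means)))

# Cut points whose coefficient is significant at the chosen threshold.
keep <- coefs[-1, "Pr(>|t|)"] < p.val
cuts[keep]
```

Because each coefficient measures the jump between neighbouring bins, its p-value is a direct test of whether that cut point separates two distinguishable groups.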

The command `ndr.bin` generates a list of two objects. The first object, the data frame `summary.tbl`, presents a summary table of the final binning, while `x.trans` is a vector of discretized values.
If there is only a single unique value of `x` or `y` among the complete cases (cases that are not special cases), it returns a data frame with info.
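The distinction between special cases and complete cases can be sketched in base R as follows. This is only an illustration of the idea with a made-up vector; it does not reproduce how `ndr.bin` performs the split.

```r
# Made-up input containing ordinary values and special cases.
x  <- c(25, NA, 40, Inf, 33, NaN, -Inf, 51)
sc <- c(NA, NaN, Inf, -Inf)  # same values as the default of the sc argument

# %in% matches NA to NA and NaN to NaN, so all four kinds are flagged.
is.sc <- x %in% sc

# Complete cases are the values that are not special cases.
complete <- x[!is.sc]
complete

# A single unique value among complete cases leaves nothing to bin.
length(unique(complete)) > 1
```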

`iso.bin` for the three-stage monotonic binning procedure.

suppressMessages(library(monobin))
#dplyr supplies %>%, group_by() and summarise(); load it explicitly in case it is not already attached
suppressMessages(library(dplyr))
data(gcd)
age.bin <- ndr.bin(x = gcd$age, y = gcd$qual)
age.bin[[1]]
table(age.bin[[2]])
#linear regression example
amount.bin <- ndr.bin(x = gcd$amount, y = gcd$qual, y.type = "cont", p.val = 0.05)
#create nested dummies
db.reg <- gcd[, c("qual", "amount")]
db.reg$amount.bin <- amount.bin[[2]]
amt.s <- db.reg %>%
  group_by(amount.bin) %>%
  summarise(qual.mean = mean(qual), amt.min = min(amount))
mins <- amt.s$amt.min
#one dummy per bin from the second onwards: 1 if amount is at or above the bin's minimum
for (i in 2:length(mins)) {
  level.l <- mins[i]
  nd <- ifelse(db.reg$amount < level.l, 0, 1)
  db.reg <- cbind.data.frame(db.reg, nd)
  names(db.reg)[ncol(db.reg)] <- paste0("dv_", i)
}
#the formula uses dv_2 and dv_3 only, i.e. it assumes three bins
reg.f <- paste0("qual ~ dv_2 + dv_3")
lrm <- lm(as.formula(reg.f), data = db.reg)
lr.coef <- data.frame(summary(lrm)$coefficients)
lr.coef
cumsum(lr.coef$Estimate)
#check
as.data.frame(amt.s)
diff(amt.s$qual.mean)

