reduce_bins: Reduce the number of bins


Source: R/binning.R

Description

Sequentially reduces the number of bins in a list of bins by merging adjacent bins whose proportions of non-default and default observations are similar.

Usage

reduce_bins(
  list_of_bins = NULL,
  min_required_bins = NULL,
  confidence_level = NULL,
  test_type = "chisq.test"
)

Arguments

list_of_bins

A list of bins.

min_required_bins

An integer (minimum two). The minimum number of bins in the returned list.

confidence_level

A double between 0 and 1 giving the confidence level passed on to the homogeneity test.

test_type

The homogeneity test to use: "chisq.test" or "fisher.test". Defaults to "chisq.test".
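To illustrate the two options for test_type, the sketch below runs both base R tests on a hypothetical 2-by-2 table of counts for two adjacent bins (rows) by default status (columns). The counts are made up, and the assumption that reduce_bins tests exactly this kind of table for each adjacent pair is ours, not stated by the package. Note that chisq.test applies Yates' continuity correction to 2-by-2 tables by default.

```r
# Hypothetical counts: two adjacent bins (rows) by default status (columns)
counts <- matrix(c(180, 20,
                   170, 30),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(bin = c("bin_1", "bin_2"),
                                 status = c("non_default", "default")))

# Pearson's chi-squared test (large-sample approximation, Yates-corrected for 2x2)
p_chisq <- chisq.test(counts)$p.value

# Fisher's exact test (exact; preferable when expected counts are small)
p_fisher <- fisher.test(counts)$p.value

c(chisq = p_chisq, fisher = p_fisher)
```

A large p-value means the two bins' default proportions are not distinguishable, which makes them candidates for merging.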

Details

Similarity, or homogeneity, is assessed with a test of independence between bin membership and default status for each pair of adjacent bins. The list of bins is reduced by repeatedly merging the most similar pair of adjacent bins. The function terminates when the minimum required number of bins is reached or when all adjacent bins are statistically different (heterogeneous) at the given confidence level.
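The merging loop described above can be sketched in base R. This is a hypothetical illustration, not the binsmlr source: it works on plain count vectors instead of bin objects, the function reduce_counts and its arguments are invented for the example, and it assumes the stopping rule compares the largest adjacent-pair p-value with a significance threshold alpha (how reduce_bins maps confidence_level onto such a threshold is not stated here).

```r
# Hypothetical sketch, not the binsmlr implementation.
# good[i] / bad[i] are the non-default / default counts of bin i, in score order.
reduce_counts <- function(good, bad, min_bins = 2, alpha = 0.05,
                          test = c("chisq.test", "fisher.test")) {
  test <- match.arg(test)
  test_fun <- match.fun(test)
  stopifnot(length(good) == length(bad), min_bins >= 2)

  while (length(good) > min_bins) {
    # p-value of the homogeneity test for every pair of adjacent bins
    p_values <- vapply(seq_len(length(good) - 1), function(i) {
      tab <- rbind(c(good[i], bad[i]), c(good[i + 1], bad[i + 1]))
      suppressWarnings(test_fun(tab)$p.value)
    }, numeric(1))

    # An all-zero column gives NaN; both bins then share the same proportion
    p_values[is.na(p_values)] <- 1

    # The most similar adjacent pair has the largest p-value
    i <- which.max(p_values)

    # Stop when every adjacent pair differs significantly
    if (p_values[i] < alpha) break

    # Merge bin i + 1 into bin i
    good[i] <- good[i] + good[i + 1]
    bad[i] <- bad[i] + bad[i + 1]
    good <- good[-(i + 1)]
    bad <- bad[-(i + 1)]
  }
  list(good = good, bad = bad)
}

# Example with five made-up bins of 100 observations each
res <- reduce_counts(good = c(95, 90, 85, 70, 50),
                     bad  = c(5, 10, 15, 30, 50),
                     min_bins = 2, alpha = 0.05)
res
```

Merging by summing counts keeps the total number of observations fixed, so each merged bin's default rate lies between the rates of the two bins it replaces.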

The returned list of bins is not guaranteed to exhibit monotonic default rates, although the sequential reduction of bins is likely to mitigate the problem. If monotonicity is required, see merge_list_of_bins for how to apply manual binning as a final step.
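A monotonicity check of the kind mentioned above can be sketched as follows. default_rates_monotonic is a hypothetical helper, not binsmlr::is_monotonic; it assumes that monotonicity means the default rate bad / (good + bad) never changes direction across bins taken in score order. Whether is_monotonic uses exactly this definition is not stated here.

```r
# Hypothetical helper (not binsmlr::is_monotonic): TRUE when the default rate
# is entirely non-decreasing or entirely non-increasing across bins.
default_rates_monotonic <- function(good, bad) {
  rate <- bad / (good + bad)
  steps <- diff(rate)
  all(steps >= 0) || all(steps <= 0)
}

default_rates_monotonic(c(95, 90, 85), c(5, 10, 15))  # TRUE: rates rise steadily
default_rates_monotonic(c(95, 80, 90), c(5, 20, 10))  # FALSE: rates rise, then fall
```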

Value

A list of bins. Each component of the returned list is a bin (an object of class bin).

See Also

See create_initial_bins for creating the initial bins, merge_list_of_bins for manual binning, and autobin for automatic binning.

Examples

# example of interactive binning
library(binsmlr)  # provides create_initial_bins(), reduce_bins() and is_monotonic()

# create initial bins
bins <- create_initial_bins(bin_data, 30, "score", "default")
length(bins)
is_monotonic(bins)

# reduce bins by performing repeated homogeneity tests
new_bins <- reduce_bins(bins, min_required_bins = 7, confidence_level = 0.01)
length(new_bins)
is_monotonic(new_bins)

# plot initial and reduced bins
bins_df <- dplyr::bind_rows(bins)
plot(x = bins_df$mid_score, y = log(bins_df$odds), type = "p",
     col = "lightblue", cex = 1.5, pch = 20, ylab = "log(odds)",
     xlab = "score")

new_bins_df <- dplyr::bind_rows(new_bins)
points(x = new_bins_df$mid_score, y = log(new_bins_df$odds),
       col = "darkblue", cex = 1.5, pch = 20)
legend(x = "topright", legend = c("Initial bins", "Reduced bins"),
       col = c("lightblue", "darkblue"), pch = 20, pt.cex = 1.5, bty = "n")

rrunner/binsmlr documentation built on July 19, 2020, 12:41 a.m.