bin.knn: Bin numerical variables based on KNN
In JianhuaHuang/streamlineR: Streamline Routine Modeling Work


The numerical independent variable (x) is first divided into small buckets with approximately equal numbers of records. A univariate regression model is then built using the bucketed x and the dependent variable (y). Buckets with similar coefficients are then grouped together: the KNN algorithm bins the small buckets into bigger groups, taking into account both the order and the coefficients of the buckets. The resulting binning for the survival or logistic model is visualized based on the model coefficients.
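The three steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the code inside `bin.knn`: the data are simulated, the variable names (`n.bucket`, `features`, `group`) are hypothetical, and the grouping step uses hierarchical clustering (`hclust`) on the bucket order and coefficient as a simple stand-in for the KNN-based grouping, whose internals are not documented here.

```r
# Illustrative sketch only; this is NOT the streamlineR implementation.
set.seed(1)
n <- 1000
x <- runif(n, 20, 80)                      # simulated numeric predictor
y <- rbinom(n, 1, plogis(-3 + 0.05 * x))   # simulated binary outcome

# Step 1: equal-frequency buckets (20 buckets, i.e. about 5% of records each)
n.bucket <- 20
breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n.bucket + 1)))
bucket <- cut(x, breaks = breaks, include.lowest = TRUE)

# Step 2: univariate logistic regression on the bucketed predictor
fit <- glm(y ~ bucket, family = binomial)
# The first bucket is the reference level, so its coefficient is set to 0
coefs <- c(0, unname(coef(fit)[-1]))
names(coefs) <- levels(bucket)

# Step 3 (stand-in for the KNN step): cluster the buckets on their scaled
# order and coefficient, then cut the tree into n.group groups
n.group <- 4
features <- scale(cbind(order = seq_along(coefs), coef = coefs))
group <- cutree(hclust(dist(features)), k = n.group)

data.frame(bucket = names(coefs), coef = round(coefs, 2),
           group = group, row.names = NULL)
```

Including the bucket order as a clustering feature pulls neighbouring buckets into the same group, but this stand-in does not guarantee that every group is a contiguous range of x.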

bin.knn(formula, data, n.group = 5, min.bucket = 0.05)

`formula`	The formula for a logistic model (y ~ x) or a survival model (Surv(time, status) ~ x).
`data`	The data frame used for binning.
`n.group`	The number of binning groups.
`min.bucket`	The minimum proportion of the population in each bucket (a value between 0 and 1).

Shows a ggplot with the regression coefficients and the binned groups

data <- rpart::stagec
bin.knn(pgstat ~ age, data = data, n.group = 4, min.bucket = .1)
# can be combined with the manipulate::manipulate function to change the
# binning interactively
library(manipulate)
manipulate(bin.knn(pgstat ~ age, data = data, n.group, min.bucket),
  n.group = slider(1, 10, step = 1, initial = 5, label = 'Number of groups'),
  min.bucket = slider(0.01, 1, step = 0.01, initial = 0.05,
  label = 'Minimum Population Size (%)'))