bin.knn: Bin numerical variables based on KNN


Description

The numerical independent variable (x) is first divided into small buckets with approximately equal numbers of records. A univariate regression model (logistic or survival) is then built using the bucketed x and the dependent variable (y). The KNN algorithm is used to merge the small buckets with similar coefficients into bigger groups, taking into account both the order and the coefficients of the buckets. The resulting binning is visualized based on the model coefficients.
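The steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the source of bin.knn: the data are simulated, the variable names are hypothetical, and hierarchical clustering (hclust) is swapped in as a stand-in for the KNN grouping step, whose internals are not specified here.

```r
# Hypothetical illustration of the bucketing idea; NOT the bin.knn source.
set.seed(1)
n <- 500
x <- runif(n, 20, 80)                       # simulated numeric predictor
y <- rbinom(n, 1, plogis(-3 + 0.05 * x))    # simulated binary outcome

# Step 1: equal-frequency buckets, each holding about 10% of the records
min.bucket <- 0.1
breaks <- unique(quantile(x, probs = seq(0, 1, by = min.bucket)))
bucket <- cut(x, breaks = breaks, include.lowest = TRUE)

# Step 2: univariate logistic regression on the bucketed x
# (intercept dropped so that every bucket gets its own coefficient)
fit <- glm(y ~ bucket - 1, family = binomial)
coefs <- coef(fit)

# Step 3: group the buckets using both their order and their coefficient.
# Assumption: hclust/cutree stands in for the KNN step used by bin.knn.
features <- scale(cbind(order = seq_along(coefs), coef = coefs))
n.group <- 4
group <- cutree(hclust(dist(features)), k = n.group)

# One row per bucket: its range, its coefficient, and its assigned group
result <- data.frame(bucket = levels(bucket),
                     coef = round(unname(coefs), 2),
                     group = unname(group))
print(result)
```

Scaling the two features before clustering keeps the bucket order and the coefficient on comparable footing; without it, whichever feature has the larger numeric range would dominate the distance.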

Usage

bin.knn(formula, data, n.group = 5, min.bucket = 0.05)

Arguments

formula

The formula for logistic (y ~ x) or survival model (Surv(time, status) ~ x).

data

The data frame used for binning

n.group

Number of binning groups

min.bucket

The minimum proportion of population in the buckets (a value between 0 and 1)

Value

Shows a ggplot with the regression coefficients and the binned groups

Examples

data <- rpart::stagec
bin.knn(pgstat ~ age, data = data, n.group = 4, min.bucket = .1)
# can be combined with the manipulate::manipulate function to change the
# binning interactively
library(manipulate)
manipulate(bin.knn(pgstat ~ age, data = data, n.group, min.bucket),
  n.group = slider(1, 10, step = 1, initial = 5, label = 'Number of groups'),
  min.bucket = slider(0.01, 1, step = 0.01, initial = 0.05,
    label = 'Minimum Population Size (proportion)'))

JianhuaHuang/streamlineR documentation built on May 7, 2019, 10:40 a.m.