bin_data: Map a vector of numeric values into bins


Source: R/bin_data.R

Description

Takes a vector of values and bin parameters and maps each value to an ordered factor whose levels are a set of bins like [0,1), [1,2), [2,3).

Values may be provided either as a vector or as a pair of parameters: a data.table object and the name of the column to bin.
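For reference, base R's cut() produces the same kind of output: an ordered factor whose levels are bins such as [0,1), [1,2), [2,3]. The sketch below uses cut() only as an analogy with hand-picked boundaries. It does not call bin_data() and does not describe how bin_data() is implemented.

```r
# Base-R analogy (not mltools code): map values onto the bins [0,1), [1,2), [2,3]
vals <- c(0.5, 2.2, 1.0, 0.1)
binned <- cut(vals, breaks = c(0, 1, 2, 3), right = FALSE,
              include.lowest = TRUE, ordered_result = TRUE)
print(levels(binned))  # every bin, in ascending order
print(binned)          # the bin containing each value
# Because the factor is ordered, bins can be compared directly
print(binned[2] > binned[1])
```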

Usage

bin_data(
  x = NULL,
  binCol = NULL,
  bins = 10,
  binType = "explicit",
  boundaryType = "lcro]",
  returnDT = FALSE,
  roundbins = FALSE
)

Arguments

x

A vector of values or a data.table object

binCol

When x is a data.table, the name of the column of x containing the values to bin

bins
  • integer specifying the number of bins to generate

  • numeric vector of sequential bin boundaries x0, x1, ..., xn, which define the bins (x0, x1), (x1, x2), ..., (x(n-1), xn)

  • 2-column data.frame or data.table in which each row defines one bin
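The Examples section describes bins=10 as "10 equally spaced bins". One plausible way to derive such boundaries, and to pair them into the two-column form also accepted by bins, is sketched below. The helper equal_width_boundaries() and the column names left and right are hypothetical; the exact boundaries bin_data() generates may differ.

```r
# Hypothetical helper: n + 1 equally spaced boundaries spanning the data range
equal_width_boundaries <- function(vals, n) {
  seq(min(vals), max(vals), length.out = n + 1)
}

b <- equal_width_boundaries(c(0, 0, 1, 2), 2)
print(b)
# Consecutive boundaries pair up into bins (x0, x1), (x1, x2), ...
bins_df <- data.frame(left = head(b, -1), right = tail(b, -1))
print(bins_df)
```

The vector form and the two-column form carry the same information when bins are contiguous; the two-column form is needed only when bins are not defined by a single sequence of boundaries.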

binType
  • "explicit": interpret bins exactly as given

  • "quantile": interpret bins as quantiles (empty quantile bins are discarded)
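Quantile boundaries can repeat when many values are tied, and a repeated boundary would describe a zero-width, empty bin. The sketch below uses the data from the last example and assumes R's default (type 7) quantile definition; whether bin_data() uses the same definition, and drops bins in exactly this way, is an assumption.

```r
# Sketch: boundaries for 4 quantile bins, assuming stats::quantile() type 7
vals <- c(0, 0, 0, 0, 1, 2, 3, 4)
n_bins <- 4
q <- quantile(vals, probs = seq(0, 1, length.out = n_bins + 1), names = FALSE)
print(q)  # the lowest boundary 0 appears twice because of the ties
# A repeated boundary would create an empty bin, so keep unique values only
boundaries <- unique(q)
print(boundaries)
print(length(boundaries) - 1)  # number of bins that remain
```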

boundaryType
  • "lcro]": bins are [left-closed, right-open), except for the last bin, which is [left-closed, right-closed]

  • "lcro)": bins are [left-closed, right-open)

  • "[lorc": bins are (left-open, right-closed], except for the first bin, which is [left-closed, right-closed]

  • "(lorc": bins are (left-open, right-closed]
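Base R's cut() expresses the same four conventions through its right and include.lowest arguments. The lookup below is a hypothetical mapping offered for orientation; it shows how each convention assigns the values c(0, 0, 1, 2) to the boundaries 0, 1, 2 under cut(), and makes no claim about how bin_data() is implemented.

```r
# Hypothetical lookup: boundaryType -> equivalent cut() arguments
boundary_args <- list(
  "lcro]" = list(right = FALSE, include.lowest = TRUE),   # [a,b) ... [y,z]
  "lcro)" = list(right = FALSE, include.lowest = FALSE),  # [a,b) ... [y,z)
  "[lorc" = list(right = TRUE,  include.lowest = TRUE),   # [a,b] (b,c] ...
  "(lorc" = list(right = TRUE,  include.lowest = FALSE)   # (a,b] (b,c] ...
)

vals <- c(0, 0, 1, 2)
results <- lapply(boundary_args, function(args) {
  as.character(cut(vals, breaks = c(0, 1, 2), right = args$right,
                   include.lowest = args$include.lowest))
})
print(results)
```

With cut(), a value equal to an outer boundary that is left open (2 under "lcro)", 0 under "(lorc") falls outside every bin and becomes NA.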

returnDT

If FALSE, return an ordered factor giving the bin of each value. If TRUE, return a data.table object that includes all bins and all values (if x is a data.table, a copy of it is made rather than modifying it in place).

roundbins

Should bin values be rounded? (Only applicable for binType = "quantile")

  • FALSE: bin values are not rounded

  • TRUE: not yet implemented. Intended to round bin values to the fewest decimal places that leave the data-to-bin mapping unchanged

  • non-negative integer: bin values are rounded to this many decimal places
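Quantile boundaries often carry long decimal expansions. The sketch below assumes that a non-negative integer roundbins is applied by passing the boundaries through round(); that assumption is not confirmed by this page. In general, rounding can move a boundary past a data value and change which bin that value lands in, which is the situation the TRUE option is described as avoiding.

```r
# Sketch, assuming quantile boundaries are simply passed through round()
vals <- c(0, 1, 10)
q <- quantile(vals, probs = seq(0, 1, length.out = 4), names = FALSE)
print(q)                         # unrounded boundaries for 3 quantile bins
rounded <- round(q, digits = 2)  # corresponds to roundbins = 2 in this sketch
print(rounded)
```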

Details

This function can return two different types of output, depending on whether returnDT is TRUE or FALSE.

If returnDT=FALSE, the result is an ordered factor with one bin, such as [1, 2) or [-3,-2), for each value binned, and its levels are all of the generated bins. Empty bins may therefore appear as unused factor levels.

If returnDT=TRUE, the result is a data.table object containing all values and all bins, including empty bins. If x is a data.table rather than a vector, a full copy of it is created and merged with the set of generated bins.
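The role of empty bins in both output types can be seen with a plain ordered factor: unused levels persist, and table() reports them with a count of zero. The base-R data.frame below only approximates a bin summary; its column names Bin and Count are illustrative and are not the columns bin_data() returns.

```r
# Values 0, 0 and 3 leave the middle bin [1,2) empty
vals <- c(0, 0, 3)
f <- cut(vals, breaks = c(0, 1, 2, 3), right = FALSE,
         include.lowest = TRUE, ordered_result = TRUE)
print(levels(f))  # every bin is kept as a level, empty or not
counts <- as.data.frame(table(Bin = f), responseName = "Count")
print(counts)
# droplevels() removes the empty bin if it is not wanted
print(levels(droplevels(f)))
```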

Examples

library(data.table)
iris.dt <- data.table(iris)

# custom bins
bin_data(iris.dt, binCol="Sepal.Length", bins=c(4, 5, 6, 7, 8))

# 10 equally spaced bins
bin_data(iris$Petal.Length, bins=10, returnDT=TRUE)

# make the last bin [left-closed, right-open)
bin_data(c(0,0,1,2), bins=2, boundaryType="lcro)", returnDT=TRUE)

# bin values by quantile
bin_data(c(0,0,0,0,1,2,3,4), bins=4, binType="quantile", returnDT=TRUE)

ben519/mltools documentation built on Sept. 22, 2021, 4:30 p.m.