bin_data: Map a vector of numeric values into bins


Description

Takes a vector of values and bin parameters and maps each value to an ordered factor whose levels are a set of bins like [0,1), [1,2), [2,3).

Values may be provided as a vector or via a pair of parameters: a data.table object (x) and the name of the column to bin (binCol).

Usage

bin_data(x = NULL, binCol = NULL, bins = 10, binType = "explicit",
  boundaryType = "lcro]", returnDT = FALSE)

Arguments

x

A vector of values or a data.table object

binCol

The name of the column of x containing the values to bin (used when x is a data.table)

bins
  • integer specifying the number of bins to generate

  • numeric vector specifying sequential bin boundaries {(x0, x1), (x1, x2), ..., (xn-1, xn)}

  • 2-column data.frame/data.table in which each row defines a bin

binType
  • "explicit" interpret bins as they are given

  • "quantile" interpret bins as quantiles (empty quantile bins will be discarded)
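As a rough illustration of why a quantile bin can come out empty, here is a base R sketch. It is not the mltools implementation: it assumes the boundaries are computed with stats::quantile and that duplicated boundaries are collapsed with unique(), both of which are assumptions made only for this example.

```r
# Base R sketch of quantile binning; this is NOT how bin_data is implemented.
x <- c(0, 0, 0, 0, 1, 2, 3, 4)

# Boundaries for 4 quantile bins: the 0%, 25%, 50%, 75% and 100% quantiles.
q <- unname(quantile(x, probs = seq(0, 1, length.out = 5)))

# The first two boundaries coincide, so the first bin would be empty.
# Assumption: collapse duplicate boundaries to discard the empty bin.
breaks <- unique(q)

# Left-closed, right-open bins, with the last bin closed on the right.
quantile_bins <- cut(x, breaks = breaks, right = FALSE,
                     include.lowest = TRUE, ordered_result = TRUE)
table(quantile_bins)
```

Four bins were requested here, but only three remain once the duplicated boundary is dropped.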

boundaryType
  • "lcro]" bins are [left-closed, right-open) except for last bin which is [left-closed, right-closed]

  • "lcro)" bins are [left-closed, right-open)

  • "[lorc" bins are (left-open, right-closed] except for first bin which is [left-closed, right-closed]

  • "(lorc" bins are (left-open, right-closed]
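The four boundary types can be mimicked with base R's cut(), which may help when checking where a value on a boundary ends up. This is only an analogy built from the right and include.lowest arguments of cut(); it does not show how bin_data itself works or what it returns.

```r
# Base R analogy for the four boundary types; NOT bin_data's own code.
x <- c(0, 0, 1, 2)
breaks <- c(0, 1, 2)

lcro_closed <- cut(x, breaks, right = FALSE, include.lowest = TRUE)  # like "lcro]"
lcro_open   <- cut(x, breaks, right = FALSE)                         # like "lcro)"
lorc_closed <- cut(x, breaks, right = TRUE, include.lowest = TRUE)   # like "[lorc"
lorc_open   <- cut(x, breaks, right = TRUE)                          # like "(lorc"

# The value 2 sits on the last boundary: it is binned only when the last
# bin is right-closed. Likewise 0 is binned only when the first bin is
# left-closed. Values that fall in no bin become NA.
data.frame(x, lcro_closed, lcro_open, lorc_closed, lorc_open)
```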

returnDT

If FALSE, return an ordered factor of bins corresponding to the values given. If TRUE, return a data.table object that includes all bins and values (a copy is made if x is a data.table).

Details

This function can return two different types of output, depending on whether returnDT is TRUE or FALSE.

If returnDT=FALSE, returns an ordered factor vector of bins like [1,2), [-3,-2), ... corresponding to the values which were binned and whose levels correspond to all the generated bins. (Note that empty bins may be present as unused factor levels.)

If returnDT=TRUE, returns a data.table object with all values and all bins (including empty bins). If x is a data.table rather than a vector, a full copy of x is created and merged with the set of generated bins.
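To make the note about unused factor levels concrete, the following base R sketch builds an ordered factor in which one bin receives no values. It uses cut() rather than bin_data, so the exact labels are those of cut() and are only illustrative.

```r
# Base R sketch of an ordered factor with an empty bin; not bin_data output.
values <- c(0.5, 2.5)
binned <- cut(values, breaks = 0:3, right = FALSE, ordered_result = TRUE)

levels(binned)              # all three bins are levels, including the empty [1,2)
table(binned)               # the empty bin appears with a count of 0
levels(droplevels(binned))  # droplevels() removes the unused level
```

Keeping the empty level is useful when tabulating or plotting, since every bin then appears even if nothing fell into it.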

Examples

library(data.table)
iris.dt <- data.table(iris)

# custom bins
bin_data(iris.dt, binCol="Sepal.Length", bins=c(4, 5, 6, 7, 8))

# 10 equally spaced bins
bin_data(iris$Petal.Length, bins=10, returnDT=TRUE)

# make the last bin [left-closed, right-open)
bin_data(c(0,0,1,2), bins=2, boundaryType="lcro)", returnDT=TRUE)

# bin values by quantile
bin_data(c(0,0,0,0,1,2,3,4), bins=4, binType="quantile", returnDT=TRUE)

mltools documentation built on May 2, 2019, 5:22 a.m.