bins.greedy: Greedy binning algorithm.


View source: R/greedy.R

Description

bins.greedy - Wrapper around bins.greedy.impl. Walks the sorted values of x from left to right and fills each bin until it reaches approximately the target size.

bins.greedy.impl - Implementation of a single-pass binning algorithm that examines the sorted data from left to right and builds bins of the target size. The bins.greedy wrapper provides a simpler interface to this function. The algorithm is not symmetric with respect to direction: a symmetric distribution may not produce symmetric bins if multiple points share the same value. If a single value accounts for more than thresh * binsz points, it is placed in a new bin.
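The single pass described above can be sketched in a few lines of base R. This is an illustration only, not the package's code: the function name greedy_sketch and its argument cnt (the per-value counts, as in as.vector(table(x))) are hypothetical, and the sketch assumes that a bin is closed once it holds at least binsz points. It ignores the xstp stopping points and the nbins cap.

```r
# Hypothetical sketch of a greedy left-to-right binning pass.
# cnt:    number of points for each sorted unique value
# binsz:  target number of points per bin
# thresh: fraction of binsz above which a value is "heavy"
greedy_sketch <- function(cnt, binsz, thresh = 0.8) {
  bin <- integer(length(cnt))  # bin index assigned to each unique value
  cur <- 1L                    # index of the bin currently being filled
  n <- 0                       # points already in the current bin
  for (i in seq_along(cnt)) {
    m <- cnt[i]
    full <- n >= binsz                                  # assumption: full bins are closed
    heavy_overflow <- (n + m > binsz) && (m > thresh * binsz)
    if (n > 0 && (full || heavy_overflow)) {
      cur <- cur + 1L          # open a new bin for this value
      n <- 0
    }
    bin[i] <- cur
    n <- n + m
  }
  bin
}

# Twelve points over five unique values, target of 4 points per bin.
cnt <- c(3, 4, 1, 2, 2)
bin <- greedy_sketch(cnt, binsz = 4, thresh = 0.8)
print(bin)                                # 1 2 3 3 3
print(as.vector(tapply(cnt, bin, sum)))   # 3 4 5
```

In this run the second value (4 points) would overflow the first bin and exceeds thresh * binsz = 3.2, so it opens a new bin. The last value (2 points) would also overflow its bin but is below the threshold, so it is merged into the current bin, which ends up with 5 points.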

Usage

bins.greedy(x, nbins, minpts = floor(0.5 * length(x)/nbins), thresh = 0.8,
  naive = FALSE)

bins.greedy.impl(xval, xtbl, xstp, binsz, nbins, thresh, verbose = F)

Arguments

x

Vector of numbers.

nbins

Target number of bins.

minpts

Minimum number of points in a bin. Only used if naive = FALSE.

naive

When TRUE, simply calls bins.greedy.impl with data derived from x. Otherwise, takes the extra step of marking the values that fill a whole bin by themselves, which forces the algorithm to place each of these values in a separate bin.

xval

Sorted unique values of the data set x. This should be the numeric version of names(xtbl).

xtbl

Result of a call to table(x).

xstp

Stopping points; if xstp[i] == TRUE, the i-th value cannot be merged with the (i-1)-th one. The value of xstp[1] is ignored.

binsz

Target bin size, i.e., the number of points falling into each bin; for example, floor(length(x) / nbins).

thresh

Threshold fraction of bin size for the greedy algorithm. Suppose there are already n < binsz points in the current bin. Also suppose that the next value V is represented by m points, and m + n > binsz. Then the algorithm checks whether m > thresh * binsz and, if so, places the value V into a new bin. If m is below the threshold, the points having value V are added to the current bin.

verbose

When TRUE, prints the number of points falling into the bins.
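As a rough illustration of how the arguments of bins.greedy.impl relate to a raw vector x, the following base R snippet derives them by hand. It does not call the package. The all-FALSE xstp (no forced breaks) is an assumption made for the example, and the helper variable heavy is hypothetical; it simply flags the values whose count exceeds thresh * binsz.

```r
# Example data: twelve points over five unique values.
x <- c(1, 1, 1, 2, 2, 2, 2, 3, 4, 4, 5, 5)
nbins <- 3
thresh <- 0.8

xtbl <- table(x)                     # counts per unique value, sorted by value
xval <- as.numeric(names(xtbl))      # numeric version of names(xtbl)
binsz <- floor(length(x) / nbins)    # target number of points per bin
xstp <- rep(FALSE, length(xval))     # assumption: no forced stopping points

# Values that the threshold rule would move to a new bin
# whenever adding them overflows the current bin.
heavy <- as.vector(xtbl) > thresh * binsz

print(xval)    # 1 2 3 4 5
print(binsz)   # 4
print(heavy)   # FALSE TRUE FALSE FALSE FALSE
```

Here only the value 2, with 4 points against a threshold of 0.8 * 4 = 3.2, is large enough to be placed in a new bin when it does not fit into the current one.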

Value

A list with the following items:

See Also

binr, bins, bins.quantiles, bins.optimize


collectivemedia/binr documentation built on May 13, 2019, 9:54 p.m.