discretizer: Discretizer function


Description

Discretizer function

Arguments

column

an atomic vector. The variable to discretize.

name

character. Name of the column.

granularity

an integer. The suggested number of levels.

mode_freq_threshold

a real value between 0 and 1. If the relative frequency of the variable's mode exceeds this value, and the mode occurs more than mode_ratio_threshold (see next parameter) times as often as the second most frequent value, then the discretizer will attempt to give the mode its own bucket. For example, if the mode is 5, we would want buckets such as [2,4), 5, and (5, 7].

mode_ratio_threshold

a real value. See the mode_freq_threshold parameter.
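The two-part test described under mode_freq_threshold can be sketched in base R as follows. This is only an illustration, not the package's implementation: the helper name mode_is_dominant is hypothetical, the threshold values are arbitrary, and the sketch assumes that frequencies are computed over non-missing values and that a variable with a single distinct value fails the test.

```r
# Hypothetical helper, not part of the package: reports whether the mode is
# frequent enough, and far enough ahead of the runner-up, to get its own bucket.
mode_is_dominant <- function(column, mode_freq_threshold, mode_ratio_threshold) {
  # table() drops NAs by default (assumption: frequencies ignore missing values).
  counts <- sort(table(column), decreasing = TRUE)
  # Assumption: with fewer than two distinct values there is no ratio to compare.
  if (length(counts) < 2) return(FALSE)
  top_share <- counts[[1]] / sum(counts)   # relative frequency of the mode
  top_ratio <- counts[[1]] / counts[[2]]   # mode count over runner-up count
  top_share > mode_freq_threshold && top_ratio > mode_ratio_threshold
}

# The value 5 occurs 6 times out of 12; the runner-up (3) occurs twice.
x <- c(rep(5, 6), 2, 3, 3, 4, 6, 7)
dominant <- mode_is_dominant(x, mode_freq_threshold = 0.15, mode_ratio_threshold = 1.5)
```

Here the mode's share is 0.5 and its ratio to the runner-up is 3, so both conditions hold under the illustrative thresholds.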

category_range

The numbers of levels to consider when the discretization procedure described in the mode_freq_threshold parameter is employed. The default is min(granularity, 20):20.
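As a quick illustration of how the default expression evaluates in R, the snippet below uses an arbitrary granularity of 3; the value is chosen only for the example.

```r
# Illustrative value only; granularity is normally supplied by the caller.
granularity <- 3

# The default expression from the documentation: every level count from
# min(granularity, 20) up to 20.
category_range <- min(granularity, 20):20

# A granularity above 20 collapses the range to the single value 20.
large_range <- min(25, 20):20
```

With granularity set to 3, category_range covers the 18 candidate level counts from 3 through 20.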

lower_count_bound

an integer. Variables with this many unique values or fewer will not be discretized. Default is granularity.

upper_count_bound

an integer. Variables with this many unique values or more will not be discretized. Default is granularity.
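Read together, lower_count_bound and upper_count_bound describe a window on the number of unique values. A minimal sketch of that check is shown below; the helper name within_count_bounds is hypothetical, the bound values are arbitrary, and the sketch assumes that NA does not count as a unique value, which the documentation does not state.

```r
# Hypothetical helper, not part of the package: TRUE when the number of unique
# values lies strictly between the two bounds, i.e. the variable is eligible.
within_count_bounds <- function(column, lower_count_bound, upper_count_bound) {
  # Assumption: missing values are excluded before counting unique values.
  n_unique <- length(unique(column[!is.na(column)]))
  n_unique > lower_count_bound && n_unique < upper_count_bound
}

# Three unique values do not exceed a lower bound of 3, so this is skipped.
few_values <- within_count_bounds(c(1, 2, 2, 3), lower_count_bound = 3, upper_count_bound = 100)

# Ten unique values fall inside the window.
enough_values <- within_count_bounds(1:10, lower_count_bound = 3, upper_count_bound = 100)
```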

missing_level

character. Any values that were NA prior to discretization will be replaced with this level. If set to NULL, then the NAs will remain. The default is "Missing".
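The effect of missing_level can be sketched with a base R factor. The helper fill_missing_level below is hypothetical and stands in for whatever the package does internally; base R's cut() is used here only to produce an example factor and is not necessarily the discretization method the package uses.

```r
# Hypothetical helper, not part of the package: relabel NA entries of an
# already discretized factor with the given level.
fill_missing_level <- function(discretized, missing_level = "Missing") {
  # NULL means the NAs are left in place.
  if (is.null(missing_level)) return(discretized)
  # Register the new level first so the assignment below is valid.
  levels(discretized) <- c(levels(discretized), missing_level)
  discretized[is.na(discretized)] <- missing_level
  discretized
}

# Example factor with one NA, built with base R's cut() for illustration.
buckets <- cut(c(1, 4, NA, 9), breaks = c(0, 5, 10))
filled <- fill_missing_level(buckets)
untouched <- fill_missing_level(buckets, missing_level = NULL)
```

In this sketch the third element of filled carries the level "Missing", while untouched still contains the NA.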

...

additional arguments to pass to arules_discretize.


syberia/syberiaMungebits2 documentation built on May 30, 2019, 10:42 p.m.