discretizer_fn: Discretizer function

View source: R/discretizer.r

Description

Discretizer function

Usage

discretizer_fn(column, granularity = 3, mode_freq_threshold = 0.15,
  mode_ratio_threshold = 1.5, category_range = min(granularity, 20):20,
  lower_count_bound = granularity, upper_count_bound = NULL,
  missing_level = "Missing", ...)

Arguments

column

an atomic vector. The variable to discretize.

granularity

an integer. The suggested number of levels.

mode_freq_threshold

a real value between 0 and 1. If the relative frequency of the variable's mode exceeds this value, and the mode occurs more than mode_ratio_threshold (see next parameter) times as often as the second most frequent value, then the function attempts to discretize the variable so that the mode becomes its own bucket. For example, if the mode is 5, we would want buckets such as [2,4), 5, and (5, 7].

mode_ratio_threshold

a real value. See the mode_freq_threshold parameter.
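
The two-part mode test described for mode_freq_threshold and mode_ratio_threshold can be sketched in base R as follows. The helper mode_dominates is hypothetical and is not part of syberiaMungebits; it also assumes that frequencies are computed over non-missing values only, which the documentation does not state.

```r
# Hypothetical helper (not from the package): does the most frequent value
# dominate in the sense described for mode_freq_threshold and
# mode_ratio_threshold?
mode_dominates <- function(column, mode_freq_threshold = 0.15,
                           mode_ratio_threshold = 1.5) {
  # Assumption: NAs are ignored; table() drops them by default.
  counts <- sort(table(column), decreasing = TRUE)
  if (length(counts) < 2) return(FALSE)      # no runner-up to compare against
  mode_freq  <- counts[[1]] / sum(counts)    # share of the most frequent value
  mode_ratio <- counts[[1]] / counts[[2]]    # mode count over runner-up count
  mode_freq > mode_freq_threshold && mode_ratio > mode_ratio_threshold
}

# The value 5 makes up half of the data and occurs five times as often
# as the runner-up, so both conditions hold.
x <- c(rep(5, 10), 1, 2, 2, 3, 4, 6, 7, 7, 8, 9)
mode_dominates(x)
```

When both conditions hold, the documented behaviour is to try bucketings that isolate the mode; how the package searches over category_range for such a bucketing is not shown here.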

category_range

The numbers of levels to consider when the discretization procedure described in the mode_freq_threshold parameter is employed. The default is min(granularity, 20):20.

lower_count_bound

an integer. Variables with at most this many unique values will not be discretized. The default is granularity.

upper_count_bound

an integer. Variables with at least this many unique values will not be discretized. The default is NULL.
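
As a rough illustration of how lower_count_bound and upper_count_bound gate discretization, the sketch below uses a hypothetical helper, should_discretize, which is not part of the package. It assumes that a NULL upper bound means no upper limit and that NA is not counted as a unique value; neither point is confirmed by this page.

```r
# Hypothetical helper (not from the package) illustrating the documented
# unique-value bounds.
should_discretize <- function(column, lower_count_bound,
                              upper_count_bound = NULL) {
  # Assumption: missing values do not count as a distinct value.
  n_unique <- length(unique(column[!is.na(column)]))
  # Too few distinct values: leave the variable as it is.
  if (n_unique <= lower_count_bound) return(FALSE)
  # Assumption: NULL means there is no upper limit.
  if (!is.null(upper_count_bound) && n_unique >= upper_count_bound) {
    return(FALSE)
  }
  TRUE
}

should_discretize(c(1, 2, 3, NA), lower_count_bound = 3)   # 3 unique values
should_discretize(1:10, lower_count_bound = 3)             # 10 unique values
should_discretize(1:10, lower_count_bound = 3,
                  upper_count_bound = 10)
```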

missing_level

character. Any values that were NA prior to discretization will be replaced with this level. If set to NULL, then the NAs will remain. The default is "Missing".
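
The effect of missing_level can be imitated in base R. In this sketch, cut() stands in for the actual discretization step and the break points are made up for illustration; it shows the documented outcome, not the package's implementation.

```r
x <- c(1.2, NA, 3.4, 5.6, NA)

# Stand-in for the discretization step (illustrative breaks only).
buckets <- cut(x, breaks = c(0, 2, 4, 6))

missing_level <- "Missing"
if (!is.null(missing_level)) {
  # Turn NA into an explicit factor level, then rename that level.
  buckets <- addNA(buckets)
  levels(buckets)[is.na(levels(buckets))] <- missing_level
}

levels(buckets)
as.character(buckets)
```

With missing_level set to NULL the if block is skipped and the NA entries stay NA, matching the documented behaviour.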

...

additional arguments to pass to arules::discretize.


robertzk/syberiaMungebits documentation built on July 30, 2019, 3:37 p.m.