RUS: The Random Under-Sampling algorithm.


Description

RUS returns a more balanced version of a data set after application of the Random Under-Sampling algorithm.

Usage

RUS(data, perc_maj = 50, perc_under = NULL, classes = NULL)

Arguments

data

A data frame containing the predictors and the outcome. The outcome must be a binary-valued factor and must be the last column of data.

perc_maj

The desired size of the majority class, as a percentage of the whole data set after under-sampling. For instance, if perc_maj = 50, a balanced version of the input data set is returned. perc_maj is ignored if perc_under is specified.

perc_under

The percentage of majority-class examples to select. If specified, perc_maj is ignored.

classes

A named vector identifying the majority and the minority classes. The names must be "Majority" and "Minority". This argument is only useful if the function is called inside another sampling function.
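
The size arithmetic implied by perc_maj and perc_under can be sketched in base R. The helper names below are hypothetical and not part of bimba, the counts are made-up illustrative numbers, and the rounding is an assumption; the package may round differently.

```r
# Hypothetical helpers (not part of bimba) illustrating the size arithmetic.

# Majority examples to keep so that the majority class makes up perc_maj %
# of the resampled data: n_keep / (n_keep + n_min) = perc_maj / 100.
maj_size_from_perc_maj <- function(n_min, perc_maj) {
  stopifnot(perc_maj > 0, perc_maj < 100)
  round(n_min * perc_maj / (100 - perc_maj))  # rounding is an assumption
}

# Majority examples to keep when selecting perc_under % of the majority class.
maj_size_from_perc_under <- function(n_maj, perc_under) {
  stopifnot(perc_under >= 0, perc_under <= 100)
  round(n_maj * perc_under / 100)  # rounding is an assumption
}

# Illustrative counts: 160 majority and 40 minority examples.
keep_balanced <- maj_size_from_perc_maj(n_min = 40, perc_maj = 50)
keep_fifth    <- maj_size_from_perc_under(n_maj = 160, perc_under = 20)
print(c(keep_balanced, keep_fifth))
```

With perc_maj = 50 the number of majority examples kept equals the minority count, which is why that setting yields a balanced data set.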

Details

The Random Under-Sampling algorithm creates a new data set containing all examples from the minority class plus a random selection of examples from the majority class.
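
The procedure described above can be illustrated with a minimal base-R sketch. This is not the bimba implementation: rus_sketch and its n_keep argument are hypothetical, and the sketch assumes the outcome is a two-level factor in the last column and that the majority class is the more frequent level.

```r
# Minimal sketch of random under-sampling; not the bimba implementation.
# Assumes the outcome is a two-level factor stored in the last column.
rus_sketch <- function(data, n_keep) {
  y <- data[[ncol(data)]]
  counts <- table(y)
  maj_label <- names(counts)[which.max(counts)]

  maj_idx <- which(y == maj_label)
  min_idx <- which(y != maj_label)
  stopifnot(n_keep <= length(maj_idx))

  # Draw n_keep majority rows without replacement. Indexing into maj_idx
  # avoids sample()'s special case for a length-one vector.
  kept_maj <- maj_idx[sample.int(length(maj_idx), n_keep)]

  # Keep every minority row plus the sampled majority rows; sorting the
  # row indices preserves the original order of the examples.
  data[sort(c(min_idx, kept_maj)), , drop = FALSE]
}

set.seed(1)
toy <- data.frame(x = rnorm(12),
                  target = factor(rep(c("neg", "pos"), times = c(9, 3))))
balanced <- rus_sketch(toy, n_keep = 3)
print(table(balanced$target))
```

Sorting the combined indices is one simple way to obtain the order-preserving behaviour; the number of majority rows to keep would in practice be derived from perc_maj or perc_under.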

Value

A data frame containing a more balanced version of the input data set after application of the Random Under-Sampling algorithm. The original order of the examples is preserved.

Examples

imb_data <- generate_imbalanced_data(num_examples = 200, 
                                     num_features = 2,
                                     imbalance_ratio = 5,
                                     noise_maj = 0,
                                     noise_min = 0,
                                     seed = 42)
 
table(imb_data$target)
table(RUS(imb_data, perc_maj = 50)$target)    # Balance the classes
table(RUS(imb_data, perc_under = 20)$target)  # Select 20% of maj. class

RomeroBarata/bimba documentation built on May 17, 2019, 8:03 a.m.