MWMOTE: The MWMOTE algorithm.


Description

MWMOTE over-samples the minority class of the input data using the Majority Weighted Minority Over-sampling TEchnique.

Usage

MWMOTE(data, perc_min = 50, perc_over = NULL, k1 = 5, k2 = 3,
  k3 = round(nrow(data_min_filtered)/2), cut_off = 5, max_closeness = 2,
  cluster_complexity = 3, classes = NULL)

Arguments

data

A data frame containing the predictors and the outcome. The predictors must be numeric. The outcome must be a factor with exactly two levels and must be the last column of data.

perc_min

The desired size of the minority class as a percentage of the whole data set after over-sampling. For instance, if perc_min = 50 the returned data set is balanced. perc_min is ignored if perc_over is specified.

perc_over

Percentage of examples to append to the input data set, relative to the size of the minority class. For instance, if perc_over = 100 the minority class doubles in size. If specified, perc_min is ignored.

k1

Number of neighbours used to identify noisy minority examples.

k2

Number of neighbours used to identify the borderline majority examples.

k3

Number of neighbours used to identify the informative minority examples.

cut_off

Cut-off value used when computing the closeness factor (Cf(th) in the original description of MWMOTE).

max_closeness

Maximum value the closeness factor can take (CMAX in the original description of MWMOTE).

cluster_complexity

Value utilised to tune the trade-off between the number of clusters and their size. A large value leads to larger clusters but fewer of them, whereas a small value leads to smaller clusters but more of them.

classes

A named vector identifying the majority and the minority classes. The names must be "Majority" and "Minority". This argument is only useful if the function is called inside another sampling function.
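
The relationship between perc_min and perc_over can be illustrated with a little arithmetic. The following is a minimal sketch in base R, assuming the number of synthetic examples follows directly from the argument descriptions above. The helper n_synthetic is hypothetical and not part of the package, and the package's own rounding may differ.

```r
# Hypothetical helper, not part of bimba: number of synthetic minority
# examples implied by perc_min or perc_over, as described in Arguments.
n_synthetic <- function(n_maj, n_min, perc_min = 50, perc_over = NULL) {
  if (!is.null(perc_over)) {
    # perc_over is relative to the current size of the minority class
    return(round(n_min * perc_over / 100))
  }
  # Assumption: perc_min must lie strictly between 0 and 100
  stopifnot(perc_min > 0, perc_min < 100)
  p <- perc_min / 100
  n_total <- n_maj + n_min
  # Solve (n_min + n_new) / (n_total + n_new) = p for n_new
  n_new <- (p * n_total - n_min) / (1 - p)
  # Never request a negative number of examples
  max(0, round(n_new))
}

n_synthetic(n_maj = 90, n_min = 10, perc_min = 50)
n_synthetic(n_maj = 90, n_min = 10, perc_over = 100)
```

With 90 majority and 10 minority examples, perc_min = 50 implies 80 new minority examples (90 versus 90 after over-sampling), whereas perc_over = 100 implies 10 new examples.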

Details

MWMOTE is an over-sampling algorithm comprising three main phases. First, the hard-to-learn minority examples are identified. Second, an importance weight is assigned to each of the hard-to-learn examples. Finally, new examples are synthesised following a strategy similar to SMOTE.
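
The SMOTE-like synthesis in the last phase amounts to linear interpolation between two minority examples. The sketch below shows only that interpolation step in base R. The helper interpolate is hypothetical and is not the package's implementation, and the selection of the two examples (which in MWMOTE depends on the importance weights and on clustering of the minority class) is omitted.

```r
# Hypothetical helper: create one synthetic example on the line segment
# between two minority examples x and y (numeric vectors of equal length).
interpolate <- function(x, y, alpha = runif(1)) {
  stopifnot(length(x) == length(y), alpha >= 0, alpha <= 1)
  # alpha = 0 returns x, alpha = 1 returns y
  x + alpha * (y - x)
}

x <- c(1, 2)
y <- c(3, 6)
interpolate(x, y, alpha = 0.5)  # midpoint of x and y: 2 4
```

Because alpha is drawn from [0, 1], every synthetic example lies between the two original examples rather than outside them.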

For clarity, the hyperparameters Cf(th), CMAX, and Cp in the original description of MWMOTE were renamed here to cut_off, max_closeness, and cluster_complexity, respectively.
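
To show how cut_off and max_closeness interact, the sketch below computes a closeness factor between a majority example y and a minority example x. It follows one reading of the formula in Barua et al. (2014): the inverse of the feature-normalised Euclidean distance is capped at cut_off and then rescaled so that its maximum is max_closeness. This is an assumption about the original paper, not code taken from the package, and the helper closeness_factor is hypothetical.

```r
# Hypothetical helper based on one reading of Barua et al. (2014);
# not taken from the bimba source code.
closeness_factor <- function(y, x, cut_off = 5, max_closeness = 2) {
  stopifnot(length(x) == length(y))
  # Euclidean distance normalised by the number of features
  d_n <- sqrt(sum((y - x)^2)) / length(x)
  # Inverse distance, capped at cut_off
  f <- min(1 / d_n, cut_off)
  # Rescale so that the largest possible value is max_closeness
  (f / cut_off) * max_closeness
}

closeness_factor(c(0, 0), c(3, 4))    # distant pair: small value
closeness_factor(c(0, 0), c(0.1, 0))  # very close pair: capped at max_closeness
```

Under this reading, cut_off controls how close two examples must be before the factor saturates, and max_closeness sets the value at which it saturates.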

Value

A data frame containing a more balanced version of the input data after over-sampling with the MWMOTE algorithm.
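
A call on a small toy data set might look like the sketch below. The data frame toy and its column names are invented for illustration. The call is guarded so that the snippet still runs when the bimba package is not installed, and it assumes the outcome remains the last column of the returned data frame.

```r
set.seed(42)

# Toy imbalanced data: numeric predictors first, two-level factor outcome last.
toy <- data.frame(
  x1 = c(rnorm(90, mean = 0), rnorm(10, mean = 2)),
  x2 = c(rnorm(90, mean = 0), rnorm(10, mean = 2)),
  y  = factor(c(rep("neg", 90), rep("pos", 10)))
)
table(toy$y)

if (requireNamespace("bimba", quietly = TRUE)) {
  # Request a balanced data set; all other arguments keep their defaults.
  balanced <- bimba::MWMOTE(toy, perc_min = 50)
  # Assumption: the outcome is still the last column of the result.
  print(table(balanced[[ncol(balanced)]]))
}
```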

References

Barua, S., Islam, M. M., Yao, X., & Murase, K. (2014). MWMOTE–majority weighted minority oversampling technique for imbalanced data set learning. IEEE Transactions on Knowledge and Data Engineering, 26(2), 405-425.


RomeroBarata/bimba documentation built on May 17, 2019, 8:03 a.m.