MWMOTE: The MWMOTE algorithm.
In RomeroBarata/bimba: Sampling Algorithms for Two-Class Imbalanced Data Sets


MWMOTE over-samples the input data using the Majority Weighted Minority Oversampling TEchnique.


MWMOTE(data, perc_min = 50, perc_over = NULL, k1 = 5, k2 = 3,
  k3 = round(nrow(data_min_filtered)/2), cut_off = 5, max_closeness = 2,
  cluster_complexity = 3, classes = NULL)

`data`	A data frame containing the predictors and the outcome. The predictors must be numeric and the outcome must be both a binary valued factor and the last column of `data`.
`perc_min`	The desired % size of the minority class relative to the whole data set. For instance, if `perc_min` = 50 the returned data set is balanced. `perc_min` is ignored if `perc_over` is specified.
`perc_over`	% of examples to append to the input data set relative to the size of the minority class. For instance, if `perc_over` = 100 the minority class doubles in size. If specified, `perc_min` is ignored.
`k1`	Number of neighbours used to identify noisy minority examples.
`k2`	Number of neighbours used to identify the borderline majority examples.
`k3`	Number of neighbours used to identify the informative minority examples.
`cut_off`	Cut-off value to compute the closeness factor.
`max_closeness`	Maximum value for the closeness factor.
`cluster_complexity`	Value utilised to tune the trade-off between the number of clusters and their size. A large value leads to larger clusters but fewer of them, whereas a small value leads to smaller clusters but more of them.
`classes`	A named vector identifying the majority and the minority classes. The names must be "Majority" and "Minority". This argument is only useful if the function is called inside another sampling function.
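The relationship between `perc_min`, `perc_over`, and the number of synthetic examples can be worked through with a little arithmetic. The helper below, `n_synthetic`, is hypothetical and not part of bimba; it assumes the result is rounded to the nearest integer, which may differ from the package's internal rounding.

```r
# Hypothetical helper (not part of bimba): number of synthetic minority
# examples implied by perc_min or perc_over for given class sizes.
n_synthetic <- function(n_maj, n_min, perc_min = 50, perc_over = NULL) {
  if (!is.null(perc_over)) {
    # perc_over takes precedence, as described for the arguments above.
    return(round(n_min * perc_over / 100))
  }
  p <- perc_min / 100
  # Solve (n_min + s) / (n_maj + n_min + s) = p for s.
  s <- (p * (n_maj + n_min) - n_min) / (1 - p)
  max(0, round(s))
}

n_synthetic(n_maj = 90, n_min = 10)                   # 80: classes end up 90 vs 90
n_synthetic(n_maj = 90, n_min = 10, perc_over = 100)  # 10: minority doubles
n_synthetic(n_maj = 90, n_min = 10, perc_min = 25)    # 20: 30 of 120 is 25%
```

Note that the formula is undefined for `perc_min` = 100, since no finite number of synthetic examples can make the minority class the whole data set.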

MWMOTE is a complex over-sampling algorithm and comprises three main phases. First, the hard-to-learn minority examples are identified, then an importance weight is assigned to each of the hard-to-learn examples, and finally new examples are synthesised following a strategy similar to SMOTE.
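The SMOTE-like synthesis step can be illustrated with a minimal sketch. It is not the package's implementation: the weights are placeholders rather than values computed by the second phase, and the partner is drawn from all other minority examples, whereas MWMOTE restricts the choice to the seed's cluster. The function `synthesise_one` is a hypothetical name.

```r
set.seed(1)

# Toy minority examples (one per row) with two numeric predictors.
minority <- matrix(c(1, 1,
                     2, 1,
                     1, 2), ncol = 2, byrow = TRUE)

# Placeholder importance weights; in MWMOTE these come from the second phase.
weights <- c(0.5, 0.3, 0.2)

# Hypothetical helper: draw a seed example in proportion to its weight, pick a
# different minority example as partner, and interpolate between the two.
synthesise_one <- function(x_min, w) {
  seed <- sample.int(nrow(x_min), 1, prob = w)
  candidates <- setdiff(seq_len(nrow(x_min)), seed)
  partner <- candidates[sample.int(length(candidates), 1)]
  alpha <- runif(1)  # position along the segment from seed to partner
  x_min[seed, ] + alpha * (x_min[partner, ] - x_min[seed, ])
}

new_example <- synthesise_one(minority, weights)
```

Because the new example is a convex combination of two existing minority examples, it always lies on the line segment between them.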

For clarity, the hyperparameters Cf(th), CMAX, and Cp in the original description of MWMOTE were renamed here to `cut_off`, `max_closeness`, and `cluster_complexity`, respectively.

A data frame containing a more balanced version of the input data after over-sampling with the MWMOTE algorithm.
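A call might look like the following sketch. The toy data frame and its column names are invented for illustration, and the call is guarded so the snippet still runs when bimba is not installed. The package is assumed to be installed from GitHub, for example with `devtools::install_github("RomeroBarata/bimba")`.

```r
set.seed(42)

# Invented toy data: two numeric predictors and a binary factor outcome
# in the last column, with 90 majority and 10 minority examples.
toy <- data.frame(
  x1 = c(rnorm(90), rnorm(10, mean = 2)),
  x2 = c(rnorm(90), rnorm(10, mean = 2)),
  y  = factor(c(rep("neg", 90), rep("pos", 10)))
)

# Only call MWMOTE if bimba is available.
if (requireNamespace("bimba", quietly = TRUE)) {
  balanced <- bimba::MWMOTE(toy, perc_min = 50)
  # The outcome is the last column, so tabulate it by position.
  print(table(balanced[[ncol(balanced)]]))
}
```

With `perc_min` = 50, the class counts in the returned data frame should be roughly equal.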

Barua, S., Islam, M. M., Yao, X., & Murase, K. (2014). MWMOTE–majority weighted minority oversampling technique for imbalanced data set learning. IEEE Transactions on Knowledge and Data Engineering, 26(2), 405-425.

RomeroBarata/bimba documentation built on May 17, 2019, 8:03 a.m.