KMUS: The k-Means Under-Sampling algorithm.
In RomeroBarata/bimba: Sampling Algorithms for Two-Class Imbalanced Data Sets


KMUS returns a more balanced version of a data set after under-sampling the majority class using the k-Means algorithm.

KMUS(data, perc_maj = 50, perc_under = NULL, max_iter = 100L, nstart = 10L, classes = NULL)

`data`	A data frame containing the predictors and the outcome. The predictors must be numeric, and the outcome must be a binary-valued factor stored in the last column of `data`.
`perc_maj`	The desired size of the majority class, as a percentage of the whole returned data set. For instance, if `perc_maj = 50`, a balanced version of the input data set is returned. `perc_maj` is ignored if `perc_under` is specified.
`perc_under`	Percentage of examples to select from the majority class. If specified, `perc_maj` is ignored.
`max_iter`	Maximum number of iterations of the k-Means algorithm.
`nstart`	Number of random restarts of the k-Means algorithm.
`classes`	A named vector identifying the majority and the minority classes. The names must be "Majority" and "Minority". This argument is only useful if the function is called inside another sampling function.
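
The relationship between `perc_maj` and the number of majority examples kept follows from the definition above: if k majority examples are kept alongside n_min minority examples, then k / (k + n_min) = perc_maj / 100. The sketch below works through that arithmetic. `target_majority_size` is a hypothetical helper, not part of bimba; the rounding, the cap at the original majority size, and the reading of `perc_under` as a percentage of the original majority class are assumptions rather than documented behaviour.

```r
# Hypothetical helper (not part of bimba): number of majority examples to keep.
target_majority_size <- function(n_maj, n_min, perc_maj = 50, perc_under = NULL) {
  if (!is.null(perc_under)) {
    # Assumed reading: keep perc_under % of the original majority examples.
    return(round(n_maj * perc_under / 100))
  }
  # Solve k / (k + n_min) = perc_maj / 100 for k.
  k <- round(perc_maj * n_min / (100 - perc_maj))
  # Assumption: under-sampling never returns more majority examples than exist.
  min(k, n_maj)
}

target_majority_size(n_maj = 900, n_min = 100)                   # 100
target_majority_size(n_maj = 900, n_min = 100, perc_maj = 75)    # 300
target_majority_size(n_maj = 900, n_min = 100, perc_under = 20)  # 180
```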

KMUS adapts the k-Means algorithm to work as an under-sampling algorithm. It clusters the majority class with k-Means and uses the resulting centroids as the representatives of the majority class.
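
The idea can be illustrated with base R's `stats::kmeans`. This is a minimal sketch under stated assumptions, not the bimba source: `kmeans_undersample` and the toy data are hypothetical, the majority class is taken to be the more frequent level, and the number of clusters `k` is passed in directly instead of being derived from `perc_maj` or `perc_under`.

```r
# Hypothetical sketch (not the bimba implementation): k-means under-sampling.
kmeans_undersample <- function(data, k, max_iter = 100L, nstart = 10L) {
  outcome <- ncol(data)                        # outcome is the last column
  counts <- table(data[[outcome]])
  maj_label <- names(counts)[which.max(counts)]
  is_maj <- data[[outcome]] == maj_label

  # Cluster the majority predictors; each centroid becomes one representative.
  fit <- stats::kmeans(data[is_maj, -outcome, drop = FALSE],
                       centers = k, iter.max = max_iter, nstart = nstart)

  new_maj <- as.data.frame(fit$centers)
  new_maj[[names(data)[outcome]]] <- factor(rep(maj_label, k),
                                            levels = levels(data[[outcome]]))

  # Majority representatives first, then the untouched minority examples.
  out <- rbind(new_maj, data[!is_maj, , drop = FALSE])
  rownames(out) <- NULL
  out
}

set.seed(42)
toy <- data.frame(x1 = c(rnorm(90), rnorm(10, mean = 3)),
                  x2 = c(rnorm(90), rnorm(10, mean = 3)),
                  y  = factor(rep(c("neg", "pos"), times = c(90, 10))))
balanced <- kmeans_undersample(toy, k = 10)
table(balanced$y)
```

With `k = 10` the 90 majority rows are replaced by 10 centroids, so both classes end up with 10 rows. The centroid rows are averages of cluster members and generally do not coincide with any row of `toy`.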

A data frame containing a more balanced version of the input data set after under-sampling it with the KMUS algorithm. Because the majority examples returned by KMUS are cluster centroids and not examples of the input data set, the original order of the examples cannot be preserved. The returned data frame therefore contains all majority examples followed by all minority examples.

RomeroBarata/bimba documentation built on May 17, 2019, 8:03 a.m.