OSS: The One-Sided Selection algorithm.

Description

OSS under-samples the input data using the One-Sided Selection algorithm.

Usage

OSS(data, classes = NULL)

Arguments

data

A data frame containing the predictors and the outcome. The predictors must be numeric, and the outcome must be a binary-valued factor stored in the last column of data.

classes

A named vector identifying the majority and the minority classes. The names must be "Majority" and "Minority". This argument is only useful if the function is called inside another sampling function.
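The requirements on data can be checked before calling the function. The following is a minimal sketch on made-up data: the data frame toy, its column names, and the class labels "neg" and "pos" are hypothetical, and the guarded call assumes that OSS is exported by an installed copy of bimba and that the outcome stays in the last column of the result.

```r
# Hypothetical toy data: two numeric predictors and a binary factor outcome
# placed in the last column, as the `data` argument requires.
set.seed(42)
toy <- data.frame(
  x1 = c(rnorm(20, mean = 0), rnorm(5, mean = 3)),
  x2 = c(rnorm(20, mean = 0), rnorm(5, mean = 3)),
  y  = factor(c(rep("neg", 20), rep("pos", 5)))
)

# Checks mirroring the documented requirements on `data`.
predictors <- toy[-ncol(toy)]
outcome <- toy[[ncol(toy)]]
stopifnot(
  all(vapply(predictors, is.numeric, logical(1))),
  is.factor(outcome),
  nlevels(outcome) == 2
)

# Call OSS only when bimba is available; `classes` keeps its default (NULL).
if (requireNamespace("bimba", quietly = TRUE)) {
  balanced <- bimba::OSS(toy)
  # Assumption: the outcome is still the last column of the returned data frame.
  print(table(balanced[[ncol(balanced)]]))
}
```

Leaving classes at NULL matches the documented advice that the argument is only useful when OSS is called from inside another sampling function.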

Details

OSS first reduces the original data set to a consistent subset and then removes all majority examples that belong to Tomek links. A Tomek link is a pair of examples from opposite classes that are each other's nearest neighbours. To find a consistent subset, OSS starts from a subset containing one randomly chosen example from the majority class and all examples from the minority class, and then adds to it every majority example that this subset misclassifies under the 1-NN rule.
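The two stages described above can be sketched in base R as follows. This is an illustration only, not the bimba source: the function name oss_sketch, its arguments, and the toy data are hypothetical, Euclidean distance is assumed, and ties between equally near neighbours are broken by taking the first one.

```r
# Illustrative sketch of One-Sided Selection (not the bimba implementation).
# x: numeric matrix of predictors; y: two-level factor; minority: its label.
# Returns the row indices that are kept.
oss_sketch <- function(x, y, minority) {
  d <- as.matrix(dist(as.matrix(x)))  # pairwise Euclidean distances
  min_idx <- which(y == minority)
  maj_idx <- which(y != minority)

  # Stage 1a: start from all minority examples plus one random majority example.
  seed <- c(min_idx, maj_idx[sample.int(length(maj_idx), 1)])

  # Stage 1b: classify the remaining majority examples with 1-NN on the seed
  # subset and add those that are misclassified.
  rest <- setdiff(maj_idx, seed)
  nn <- vapply(rest, function(i) seed[which.min(d[i, seed])], integer(1))
  keep <- c(seed, rest[y[nn] != y[rest]])

  # Stage 2: find Tomek links inside the subset (mutual nearest neighbours
  # with different labels) and drop their majority members.
  ds <- d[keep, keep]
  diag(ds) <- Inf
  nn1 <- unname(apply(ds, 1, which.min))
  in_link <- nn1[nn1] == seq_along(nn1) & y[keep] != y[keep][nn1]
  keep[!(in_link & y[keep] != minority)]
}

# Hypothetical toy data: rows 1-3 are minority, rows 4-8 a distant majority
# cluster, and row 9 a majority example sitting next to a minority example.
x <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10), c(11, 11), c(10.5, 10.5),
           c(0.1, 0.1))
y <- factor(c(rep("pos", 3), rep("neg", 6)))
kept <- oss_sketch(x, y, minority = "pos")
```

On this toy data, row 9 forms a Tomek link with row 1 and is removed, while every minority row is retained. How many rows of the distant majority cluster survive depends on which majority example is drawn at random in the first stage.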

Value

A data frame containing a more balanced version of the input data after under-sampling it with OSS.

References

Kubat, M., & Matwin, S. (1997, July). Addressing the curse of imbalanced training sets: one-sided selection. In ICML (Vol. 97, pp. 179-186).


RomeroBarata/bimba documentation built on May 17, 2019, 8:03 a.m.