NRAS: The NRAS algorithm.
In RomeroBarata/bimba: Sampling Algorithms for Two-Class Imbalanced Data Sets


NRAS removes minority examples for which the proportion of minority examples among their k nearest neighbours is below a threshold.

NRAS(data, k = 5, threshold = 0.5, classes = NULL)

`data`	A data frame containing the predictors and the outcome. The predictors must be numeric and the outcome must be both a binary valued factor and the last column of `data`.
`k`	Number of nearest neighbours to compute for each example in the minority class.
`classes`	A named vector identifying the majority and the minority classes. The names must be "Majority" and "Minority". This argument is only useful if the function is called inside another sampling function.
`threshold`	All minority examples for which the proportion of minority neighbours is below the `threshold` are removed from `data`.

NRAS fits a logistic regression model to the data and uses it to predict the probability of each example belonging to the minority class. These probabilities are added to the data as a new feature, and then the minority examples whose proportion of minority examples among their `k` nearest neighbours is below `threshold` are removed.
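The steps described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the bimba source: `nras_sketch` is a hypothetical name, the minority class is assumed to be the less frequent factor level, neighbours are searched among all examples using Euclidean distance, and no feature scaling is applied. The package implementation may differ on any of these points.

```r
# Hypothetical helper illustrating the described steps; not the bimba implementation.
nras_sketch <- function(data, k = 5, threshold = 0.5) {
  p <- ncol(data)
  y <- data[[p]]                      # outcome is the last column
  x <- data[, -p, drop = FALSE]       # numeric predictors

  # Assumption: the minority class is the less frequent level.
  minority <- names(which.min(table(y)))
  is_min <- y == minority

  # Step 1: logistic regression predicting membership of the minority class.
  fit <- glm(is_minority ~ .,
             data = cbind(x, is_minority = as.integer(is_min)),
             family = binomial())
  prob <- predict(fit, type = "response")

  # Step 2: append the predicted probability as an extra feature.
  x_aug <- cbind(as.matrix(x), prob = prob)

  # Step 3: pairwise Euclidean distances (assumption: no scaling).
  d <- as.matrix(dist(x_aug))
  diag(d) <- Inf                      # an example is not its own neighbour

  # Step 4: proportion of minority examples among the k nearest neighbours
  # of each minority example.
  min_idx <- which(is_min)
  prop_min <- vapply(min_idx, function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    mean(is_min[nn])
  }, numeric(1))

  # Step 5: remove minority examples whose proportion is below the threshold.
  drop <- min_idx[prop_min < threshold]
  if (length(drop) > 0) data[-drop, , drop = FALSE] else data
}
```

In this sketch, majority examples are never removed, a `threshold` of 0 leaves the data unchanged, and a `threshold` above 1 removes every minority example.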

Note that the present implementation does not perform over-sampling with SMOTE as in the original article. Here, the cleaning step is decoupled from the over-sampling step so that NRAS can be used with any over-sampling algorithm. To over-sample after cleaning, use the `sampling_sequence` function to combine NRAS with the over-sampling algorithm of your choice.

A data frame containing a cleaned version of the input data after using the NRAS algorithm.
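A call on a small synthetic data set might look like the sketch below. The toy data and the column names `x1`, `x2`, and `class` are made up for illustration; the call itself uses only the signature shown in the usage above and assumes the bimba package has been installed (it is hosted on GitHub as RomeroBarata/bimba). The guard skips the call when the package is unavailable.

```r
# Toy imbalanced data: numeric predictors, binary factor outcome in the last column.
set.seed(42)
toy <- data.frame(
  x1 = c(rnorm(90), rnorm(10, mean = 1)),
  x2 = c(rnorm(90), rnorm(10, mean = 1)),
  class = factor(rep(c("majority", "minority"), times = c(90, 10)))
)

# Run only if bimba is available (assumption: installed from GitHub).
if (requireNamespace("bimba", quietly = TRUE)) {
  cleaned <- bimba::NRAS(toy, k = 5, threshold = 0.5)
  # Compare class counts before and after cleaning.
  print(table(toy$class))
  print(table(cleaned$class))
}
```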

Rivera, W. A. (2017). Noise Reduction A Priori Synthetic Over-Sampling for class imbalanced data sets. Information Sciences, 408, 146-161.

RomeroBarata/bimba documentation built on May 17, 2019, 8:03 a.m.