NRAS: The NRAS algorithm.

Description

NRAS removes minority examples for which the proportion of minority examples among their k nearest neighbours is below a threshold.

Usage

NRAS(data, k = 5, threshold = 0.5, classes = NULL)

Arguments

data

A data frame containing the predictors and the outcome. The predictors must be numeric, and the outcome must be a binary factor stored in the last column of data.

k

Number of nearest neighbours to compute for each example in the minority class.

classes

A named vector identifying the majority and the minority classes. The names must be "Majority" and "Minority". This argument is only useful if the function is called inside another sampling function.

threshold

Minority examples whose proportion of minority examples among their k nearest neighbours is below threshold are removed from data.

Details

NRAS fits a logistic regression model to the data and uses it to predict the probability that each example belongs to the minority class. These probabilities are added to the data as a new feature, and the minority examples whose proportion of minority examples among their k nearest neighbours falls below threshold are then removed.
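The steps described above can be sketched in base R as follows. This is a minimal illustration, not the bimba source: the function name nras_sketch is hypothetical, the minority class is assumed to be the less frequent factor level, distances are assumed to be Euclidean, and no feature scaling is applied.

```r
# Hypothetical helper illustrating the cleaning step; not the bimba implementation.
nras_sketch <- function(data, k = 5, threshold = 0.5) {
  p <- ncol(data)
  y <- data[[p]]

  # Assumption: the less frequent factor level is the minority class.
  counts <- table(y)
  minority <- names(counts)[which.min(counts)]
  is_min <- y == minority

  # Logistic regression for the probability of belonging to the minority class.
  fit_data <- data.frame(data[-p], .target = as.integer(is_min))
  fit <- glm(.target ~ ., data = fit_data, family = binomial())
  prob <- predict(fit, type = "response")

  # Append the predicted probability as an extra feature.
  x <- cbind(as.matrix(data[-p]), prob)

  # Assumption: Euclidean distance between all pairs of examples.
  d <- as.matrix(dist(x))
  diag(d) <- Inf  # an example is not its own neighbour

  # Proportion of minority examples among the k nearest neighbours
  # of each minority example.
  min_idx <- which(is_min)
  prop_min <- vapply(min_idx, function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    mean(is_min[nn])
  }, numeric(1))

  # Drop the minority examples that fall below the threshold.
  drop <- min_idx[prop_min < threshold]
  if (length(drop) > 0) data[-drop, , drop = FALSE] else data
}

# Toy data: 40 majority and 10 minority examples.
set.seed(1)
toy <- data.frame(
  x1 = c(rnorm(40, 0), rnorm(10, 2)),
  x2 = c(rnorm(40, 0), rnorm(10, 2)),
  y  = factor(rep(c("neg", "pos"), times = c(40, 10)))
)
cleaned <- nras_sketch(toy, k = 5, threshold = 0.5)
table(cleaned$y)
```

Only minority rows can be removed in this sketch, so the majority count stays the same, and a threshold of 0 leaves the data unchanged.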

Note that the present implementation does not perform over-sampling with SMOTE as in the original article. Here, the cleaning was decoupled from the over-sampling to make NRAS usable with any over-sampling algorithm. To combine the cleaning with over-sampling, use NRAS together with the sampling_sequence function.

Value

A data frame containing a cleaned version of the input data after using the NRAS algorithm.
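A call following the signature in the Usage section might look like the sketch below. The toy data frame and its column names are made up for illustration, and the call is guarded so that it only runs when the bimba package is installed.

```r
# Toy data: two numeric predictors and a binary factor outcome in the last column.
set.seed(42)
example_data <- data.frame(
  x1 = c(rnorm(30), rnorm(8, mean = 1.5)),
  x2 = c(rnorm(30), rnorm(8, mean = 1.5)),
  target = factor(rep(c("majority", "minority"), times = c(30, 8)))
)

# Assumption: the bimba package is installed; the call is skipped otherwise.
if (requireNamespace("bimba", quietly = TRUE)) {
  cleaned_data <- bimba::NRAS(example_data, k = 5, threshold = 0.5)
  print(table(cleaned_data$target))
}
```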

References

Rivera, W. A. (2017). Noise Reduction A Priori Synthetic Over-Sampling for class imbalanced data sets. Information Sciences, 408, 146-161.


RomeroBarata/bimba documentation built on May 17, 2019, 8:03 a.m.