SMOTE: Synthetic Minority Over-sampling TEchnique


SMOTE R Documentation

Synthetic Minority Over-sampling TEchnique

Description

Returns a balanced dataset generated with the Synthetic Minority Over-sampling TEchnique (SMOTE) algorithm.

Usage

SMOTE(data, outcome, perc_maj = 100, k = 5)

Arguments

data

A dataset containing the predictors and the outcome. The predictors can only be continuous (numeric or integer). The outcome must be binary.

outcome

The column number or the name of the outcome variable in the dataset.

perc_maj

The desired size of the minority class in the new dataset, expressed as a percentage of the size of the majority class. The default is 100.

k

The number of nearest neighbours that are used to generate the new samples of the minority class. The default is 5.
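
As a rough illustration of how perc_maj translates into a number of new samples, the sketch below assumes that the minority class is grown until it reaches perc_maj percent of the majority class size. The helper n_synthetic is hypothetical and is not part of the package; the rounding rule is also an assumption.

```r
# Hypothetical helper, not part of RSBID: number of synthetic minority
# samples needed so that the minority class reaches perc_maj percent of
# the majority class size (rounding is an assumption).
n_synthetic <- function(n_maj, n_min, perc_maj = 100) {
  target <- round(n_maj * perc_maj / 100)
  max(target - n_min, 0)  # never remove samples, only add
}

n_synthetic(n_maj = 1000, n_min = 100)                 # 900
n_synthetic(n_maj = 1000, n_min = 100, perc_maj = 50)  # 400
```

With 1000 majority and 100 minority samples, the default would call for 900 synthetic samples, while perc_maj = 50 would call for 400.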

Details

The synthetic minority over-sampling technique artificially generates new samples of the minority class using the nearest neighbours of these cases, in order to get a more balanced dataset.
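
The interpolation idea described above can be sketched in a few lines of base R. This is a minimal illustration and not the package's implementation: the function name smote_sketch and its arguments are hypothetical, and base samples are drawn at random rather than by cycling through every minority sample as in Chawla et al. (2002).

```r
# Illustrative sketch of the SMOTE interpolation step.
# smote_sketch is a hypothetical name, not the RSBID implementation.
smote_sketch <- function(X_min, n_new, k = 5) {
  X_min <- as.matrix(X_min)            # minority-class predictors only
  n <- nrow(X_min)
  stopifnot(n > k)                     # need at least k other samples

  # Pairwise Euclidean distances between minority samples
  d <- as.matrix(dist(X_min))
  diag(d) <- Inf                       # a sample is not its own neighbour

  synthetic <- matrix(NA_real_, nrow = n_new, ncol = ncol(X_min),
                      dimnames = list(NULL, colnames(X_min)))
  for (i in seq_len(n_new)) {
    base <- sample.int(n, 1)                 # random minority sample
    nn <- order(d[base, ])[seq_len(k)]       # its k nearest neighbours
    neighbour <- nn[sample.int(k, 1)]        # pick one neighbour at random
    gap <- runif(1)                          # position along the segment
    synthetic[i, ] <- X_min[base, ] +
      gap * (X_min[neighbour, ] - X_min[base, ])
  }
  synthetic
}

set.seed(1)
X_min <- matrix(rnorm(20), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
new_points <- smote_sketch(X_min, n_new = 15, k = 3)
dim(new_points)
```

Each synthetic point lies on the line segment between a minority sample and one of its k nearest minority neighbours, which is why the predictors need to be continuous.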

Value

A new, balanced dataset.

References

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321-357.

Examples

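# Class distribution of the original data
data(abalone)
table(abalone$Class)

# Default: grow the minority class to 100% of the majority class size
newdata1 <- SMOTE(abalone, 'Class')
table(newdata1$Class)

# Grow the minority class to 50% of the majority class size
newdata2 <- SMOTE(abalone, 'Class', perc_maj=50)
table(newdata2$Class)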

dongyuanwu/RSBID documentation built on May 20, 2024, 7:53 a.m.