SMOTE_NC: Synthetic Minority Over-sampling TEchnique-Nominal Continuous
In dongyuanwu/RSBID: Resampling Strategies for Binary Imbalanced Datasets

View source: R/SMOTE_NC.R

SMOTE_NC

R Documentation

Synthetic Minority Over-sampling TEchnique-Nominal Continuous

Description

Returns a balanced dataset generated with the Synthetic Minority Over-sampling TEchnique-Nominal Continuous (SMOTE-NC) algorithm.

Usage

SMOTE_NC(data, outcome, perc_maj = 100, k = 5)

Arguments

`data`	A dataset containing the predictors and the outcome. The predictors can be continuous (`numeric` or `integer`) or categorical (`character` or `factor`). There must be at least one continuous predictor and at least one categorical predictor. The outcome must be binary.
`outcome`	The column number or the name of the outcome variable in the dataset.
`perc_maj`	The desired size of the minority class in the new dataset, expressed as a percentage of the size of the majority class. The default is 100.
`k`	The number of nearest neighbours that are used to generate the new samples of the minority class. The default is 5.
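As a worked illustration of `perc_maj`, the sketch below computes the minority class size it implies. The helper `target_minority_size` and the class counts are hypothetical and not part of RSBID, and the use of simple rounding is an assumption; the package may round differently.

```r
# Hypothetical helper, not part of RSBID: minority class size implied by
# perc_maj, assuming simple rounding of perc_maj percent of the majority count.
target_minority_size <- function(n_majority, perc_maj = 100) {
  round(n_majority * perc_maj / 100)
}

# Made-up class counts, for illustration only
n_majority <- 1000
n_minority <- 200

target_100 <- target_minority_size(n_majority, 100)  # minority matches majority
target_50  <- target_minority_size(n_majority, 50)   # minority is half of majority

# Number of synthetic minority rows that would have to be generated
n_new_100 <- target_100 - n_minority
n_new_50  <- target_50 - n_minority
```

With these counts, `perc_maj = 100` implies 800 new minority rows and `perc_maj = 50` implies 300.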

Details

SMOTE-NC artificially generates new samples of the minority class from the nearest neighbours of existing minority cases, in order to obtain a more balanced dataset. The algorithm handles datasets that mix continuous and nominal features; it cannot handle datasets whose features are all nominal or all continuous.
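To make the neighbour-based generation concrete, here is a minimal sketch of how one synthetic sample can be built in the spirit of SMOTE-NC as described by Chawla et al. (2002). It is not the RSBID source code: the function `smote_nc_one` and the toy data are hypothetical, and details such as tie handling are assumptions.

```r
# Illustrative sketch only; not the RSBID implementation.
# Builds one synthetic sample from row i of a minority-class data frame that
# holds only predictors (at least one numeric and one character/factor column).
smote_nc_one <- function(minority, i, k = 5) {
  is_num <- vapply(minority, is.numeric, logical(1))
  num <- as.matrix(minority[, is_num, drop = FALSE])
  nom <- minority[, !is_num, drop = FALSE]

  # Penalty for each mismatching nominal feature: the median of the standard
  # deviations of the continuous features within the minority class
  med_sd <- median(apply(num, 2, sd))

  # Squared distance from row i to every minority row
  num_part <- rowSums(sweep(num, 2, num[i, ])^2)
  mismatch <- vapply(nom, function(col) {
    as.character(col) != as.character(col[i])
  }, logical(nrow(minority)))
  d2 <- num_part + rowSums(mismatch) * med_sd^2
  d2[i] <- Inf  # never pick the sample itself as a neighbour

  # Indices of the k nearest neighbours, then one of them chosen at random
  nn <- order(d2)[seq_len(k)]
  neighbour <- nn[sample.int(length(nn), 1)]

  # Continuous features: random point on the segment between the two samples
  gap <- runif(1)
  new_num <- num[i, ] + gap * (num[neighbour, ] - num[i, ])
  names(new_num) <- colnames(num)

  # Nominal features: most frequent value among the k nearest neighbours
  # (ties resolved by taking the first level, an arbitrary choice)
  new_nom <- lapply(nom[nn, , drop = FALSE], function(col) {
    names(which.max(table(as.character(col))))
  })

  data.frame(as.list(new_num), new_nom, stringsAsFactors = FALSE)
}

# Toy minority-class data, made up for illustration
set.seed(1)
toy <- data.frame(
  age = c(25, 30, 35, 40, 45, 50),
  balance = c(100, 150, 200, 250, 300, 350),
  job = c("admin", "admin", "tech", "admin", "tech", "admin"),
  stringsAsFactors = FALSE
)
synthetic <- smote_nc_one(toy, i = 1, k = 3)
```

The nominal penalty keeps categorical mismatches on a scale comparable to the continuous features, which is why the technique needs at least one continuous predictor to compute it.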

Value

A new dataset that has been balanced.

References

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.

Examples

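library(RSBID)

# Class distribution before resampling
data(bank)
table(bank$deposit)

# Default perc_maj = 100: minority class grows to the size of the majority class
newdata1 <- SMOTE_NC(bank, 'deposit')
table(newdata1$deposit)

# perc_maj = 50: minority class grows to 50% of the majority class size
newdata2 <- SMOTE_NC(bank, 'deposit', perc_maj=50)
table(newdata2$deposit)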

dongyuanwu/RSBID documentation built on May 20, 2024, 7:53 a.m.