SMOTE_NC: Synthetic Minority Over-sampling TEchnique-Nominal Continuous

View source: R/SMOTE_NC.R

SMOTE_NC R Documentation

Synthetic Minority Over-sampling TEchnique-Nominal Continuous

Description

Returns a balanced dataset produced by the Synthetic Minority Over-sampling TEchnique-Nominal Continuous (SMOTE-NC) algorithm.

Usage

SMOTE_NC(data, outcome, perc_maj = 100, k = 5)

Arguments

data

A dataset containing the predictors and the outcome. The predictors can be continuous (numeric or integer) or categorical (character or factor). There must be at least one continuous predictor and at least one categorical predictor. The outcome must be binary.

outcome

The column number or the name of the outcome variable in the dataset.

perc_maj

The desired size of the minority class in the new dataset, as a percentage of the size of the majority class. The default is 100.

k

The number of nearest neighbours that are used to generate the new samples of the minority class. The default is 5.
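As a rough illustration of how perc_maj translates into a number of synthetic samples, the arithmetic can be sketched as below. The helper n_synthetic is hypothetical and not part of the package, and the rounding rule is an assumption; the package may round differently.

```r
# Hypothetical helper (not part of the package): number of synthetic
# minority samples needed so that the minority class reaches
# perc_maj percent of the majority class size.
n_synthetic <- function(n_min, n_maj, perc_maj = 100) {
  target <- round(n_maj * perc_maj / 100)  # assumed rounding rule
  max(target - n_min, 0)                   # never remove existing samples
}

n_synthetic(n_min = 200, n_maj = 1000)                 # 800
n_synthetic(n_min = 200, n_maj = 1000, perc_maj = 50)  # 300
```

With 200 minority and 1000 majority samples, the default setting asks for 800 new minority samples, while perc_maj = 50 asks for 300.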

Details

The synthetic minority over-sampling technique-nominal continuous (SMOTE-NC) artificially generates new samples of the minority class from the nearest neighbours of existing minority cases, in order to obtain a more balanced dataset. The algorithm handles mixed datasets of continuous and nominal features, but it cannot handle datasets in which all features are nominal or all features are continuous.
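The generation step can be sketched in a few lines of base R, following the description of SMOTE-NC in Chawla et al. (2002). This is a minimal sketch and not the package's implementation. The toy data and the names minority, num_cols, cat_cols, smote_nc_dist and smote_nc_one are made up for illustration. Each nominal mismatch adds the squared median of the continuous features' standard deviations to the squared distance. Continuous values are interpolated towards one randomly chosen neighbour, and nominal values take the most frequent level among the k nearest neighbours.

```r
# Toy minority-class data with made-up values:
# two continuous features and one nominal feature.
minority <- data.frame(
  age     = c(25, 30, 35, 40, 28, 33),
  balance = c(100, 250, 180, 400, 120, 300),
  job     = c("admin", "admin", "tech", "tech", "admin", "tech"),
  stringsAsFactors = FALSE
)
num_cols <- c("age", "balance")
cat_cols <- "job"

# Median of the standard deviations of the continuous features,
# used as the penalty for each nominal mismatch.
med_sd <- median(sapply(minority[num_cols], sd))

# Distance from row i to every row of df.
smote_nc_dist <- function(df, i) {
  num <- as.matrix(df[num_cols])
  cont_part <- rowSums(sweep(num, 2, num[i, ])^2)
  mismatches <- Reduce(`+`, lapply(cat_cols, function(col) {
    df[[col]] != df[[col]][i]
  }))
  sqrt(cont_part + med_sd^2 * mismatches)
}

# Generate one synthetic sample from row i.
smote_nc_one <- function(df, i, k = 3) {
  d <- smote_nc_dist(df, i)
  d[i] <- Inf                    # exclude the sample itself
  nn <- order(d)[seq_len(k)]     # indices of the k nearest neighbours
  j <- nn[sample.int(k, 1)]      # pick one neighbour at random
  gap <- runif(1)                # interpolation weight in [0, 1]

  new_row <- df[i, , drop = FALSE]
  x_i <- unlist(df[i, num_cols])
  x_j <- unlist(df[j, num_cols])
  # Continuous features: a point on the segment between x_i and x_j.
  new_row[num_cols] <- as.list(x_i + gap * (x_j - x_i))
  # Nominal features: most frequent level among the k neighbours
  # (ties go to the level that sorts first).
  for (col in cat_cols) {
    new_row[[col]] <- names(which.max(table(df[[col]][nn])))
  }
  rownames(new_row) <- NULL
  new_row
}

set.seed(42)
smote_nc_one(minority, i = 1, k = 3)
```

Repeating smote_nc_one over randomly chosen minority rows until the desired class size is reached gives the over-sampled minority class in this sketch.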

Value

A new, balanced dataset.

References

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.

Examples

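library(RSBID)  # assumed package name, taken from the repository dongyuanwu/RSBID

data(bank)
table(bank$deposit)  # class counts before over-sampling

# Default: grow the minority class to 100% of the majority class size
newdata1 <- SMOTE_NC(bank, 'deposit')
table(newdata1$deposit)

# Grow the minority class to 50% of the majority class size
newdata2 <- SMOTE_NC(bank, 'deposit', perc_maj = 50)
table(newdata2$deposit)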

dongyuanwu/RSBID documentation built on May 20, 2024, 7:53 a.m.