# ROSE: Random Over-Sampling Examples

### Description

The package provides functions to deal with binary classification problems in the presence of imbalanced classes. Synthetic balanced samples are generated according to ROSE (Menardi and Torelli, 2014). Functions that implement more traditional remedies to the class imbalance are also provided, as well as different metrics to evaluate a learner accuracy. These are estimated by holdout, bootrstrap or cross-validation methods.

### Details

The package pivots on function `ROSE`

which generates synthetic balanced
samples and thus allows to strenghten the subsequent estimation of any binary classifier.
ROSE (Random Over-Sampling Examples) is a bootstrap-based technique
which aids the task of binary classification in the presence of rare classes.
It handles both continuous and categorical data by generating synthetic examples from
a conditional density estimate of the two classes.
Different metrics to evaluate a learner accuracy are supplied by
functions `roc.curve`

and `accuracy.meas`

.
Holdout, bootstrap or cross-validation estimators of these accuracy metrics are
computed by means of ROSE and provided by function `ROSE.eval`

, to be used in
conjuction with virtually any binary classifier.
Additionally, function `ovun.sample`

implements more traditional remedies
to the class imbalance, such as over-sampling the minority class, under-sampling the majority
class, or a combination of over- and under- sampling.

### Author(s)

Nicola Lunardon, Giovanna Menardi, Nicola Torelli

Maintainer: Nicola Lunardon <lunardon@stat.unipd.it>

### References

Lunardon, N., Menardi, G., and Torelli, N. (2014). ROSE: a Package for Binary Imbalanced Learning. *R Jorunal*, 6:82–92.

Menardi, G. and Torelli, N. (2014). Training and assessing classification rules with imbalanced data. *Data Mining and Knowledge Discovery*, 28:92–122.

### See Also

`DMwR-package`

,
`nnet`

, `pROC-package`

,
`rpart`

### Examples

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | ```
# loading data
data(hacide)
# check imbalance
table(hacide.train$cls)
# train logistic regression on imbalanced data
log.reg.imb <- glm(cls ~ ., data=hacide.train, family=binomial)
# use the trained model to predict test data
pred.log.reg.imb <- predict(log.reg.imb, newdata=hacide.test,
type="response")
# generate new balanced data by ROSE
hacide.rose <- ROSE(cls ~ ., data=hacide.train, seed=123)$data
# check (im)balance of new data
table(hacide.rose$cls)
# train logistic regression on balanced data
log.reg.bal <- glm(cls ~ ., data=hacide.rose, family=binomial)
# use the trained model to predict test data
pred.log.reg.bal <- predict(log.reg.bal, newdata=hacide.test,
type="response")
# check accuracy of the two learners by measuring auc
roc.curve(hacide.test$cls, pred.log.reg.imb)
roc.curve(hacide.test$cls, pred.log.reg.bal, add.roc=TRUE, col=2)
# determine bootstrap distribution of the AUC of logit models
# trained on ROSE balanced samples
# B has been reduced from 100 to 10 for time saving solely
boot.auc.bal <- ROSE.eval(cls ~ ., data=hacide.train, learner= glm,
method.assess = "BOOT",
control.learner=list(family=binomial),
trace=TRUE, B=10)
summary(boot.auc.bal)
``` |