
The function implements SMOTEBagging for binary classification. It returns a list of weak learners, each built on a training set manipulated by SMOTE and random over-sampling. Together they constitute the ensemble model.


| Argument | Description |
| --- | --- |
| `formula` | A formula specifying the predictors and the target variable. The target variable should be a factor with levels 0 and 1. Predictors can be numerical or categorical. |
| `data` | A data frame used to train the model, i.e. the training set. |
| `size` | Ensemble size, i.e. the number of weak learners in the ensemble model. |
| `alg` | The learning algorithm used to train the weak learners in the ensemble model. |
| `smote.k` | Number of nearest neighbours *k* used by the SMOTE algorithm. Default is 5. |
| `rf.ntree` | Number of decision trees in each forest of the ensemble model when using `rf` (Random Forest) as the base algorithm. |
| `svm.ker` | Kernel function to use when `svm` is the base algorithm. Four options are available, including `"sigmoid"`. |

SMOTEBagging uses both SMOTE (Synthetic Minority Over-sampling TEchnique) and random over-sampling to increase the number of minority instances in each bag of Bagging and so rebalance the class distribution. Each manipulated training set contains equal numbers of majority and minority instances, but the proportions of minority instances produced by SMOTE and by random over-sampling vary from bag to bag, as determined by an assigned re-sampling rate *a*. The re-sampling rate *a* is always a multiple of 10, and the function generates the vector of *a* values automatically, so users do not need to specify it.
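The bag construction described above can be illustrated with a minimal base-R sketch. It is not the code used inside `sbag()`. The helper names `smote_points` and `make_bag` are hypothetical, the sketch handles numeric predictors only, and it assumes that *a*% of the majority count is drawn from the minority class with replacement, that SMOTE supplies the remainder, and that the majority class is bootstrapped.

```r
# Hypothetical helper: create n_new synthetic minority rows by SMOTE-style
# interpolation between a random seed row and one of its k nearest neighbours.
smote_points <- function(X, n_new, k = 5) {
  k <- min(k, nrow(X) - 1)
  d <- as.matrix(dist(X))                  # pairwise Euclidean distances
  new_rows <- matrix(NA_real_, nrow = n_new, ncol = ncol(X))
  for (i in seq_len(n_new)) {
    seed <- sample.int(nrow(X), 1)
    # k nearest minority neighbours of the seed, excluding the seed itself
    nn <- setdiff(order(d[seed, ]), seed)[seq_len(k)]
    mate <- nn[sample.int(k, 1)]
    gap <- runif(1)                        # position on the seed-mate segment
    new_rows[i, ] <- X[seed, ] + gap * (X[mate, ] - X[seed, ])
  }
  colnames(new_rows) <- colnames(X)
  new_rows
}

# Hypothetical helper: assemble one balanced bag for re-sampling rate a (in %).
make_bag <- function(X_maj, X_min, a, k = 5) {
  n_maj <- nrow(X_maj)
  n_over <- round(n_maj * a / 100)         # minority rows from random over-sampling
  n_smote <- n_maj - n_over                # minority rows from SMOTE
  # Assumption: the majority class is bootstrapped at its original size
  maj <- X_maj[sample.int(n_maj, n_maj, replace = TRUE), , drop = FALSE]
  over <- X_min[sample.int(nrow(X_min), n_over, replace = TRUE), , drop = FALSE]
  synth <- smote_points(X_min, n_smote, k)
  list(
    x = rbind(maj, over, synth),
    y = factor(rep(c("0", "1"), times = c(n_maj, n_over + n_smote)),
               levels = c("0", "1")),
    n_over = n_over,
    n_smote = n_smote
  )
}

set.seed(1)
X <- as.matrix(iris[1:70, 1:4])
X_maj <- X[1:50, ]                         # setosa, coded 0
X_min <- X[51:70, ]                        # versicolor, coded 1
rates <- seq(10, 100, by = 10)             # one re-sampling rate per bag
bags <- lapply(rates, function(a) make_bag(X_maj, X_min, a))
sapply(bags, function(b) b$n_smote)        # SMOTE share shrinks as a grows
```

Every bag ends up with as many minority as majority rows. Only the split between copied and synthetic minority rows changes with *a*, which is what makes the bags differ from one another.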

The function requires the target variable to be a factor with levels 0 and 1, where 1 indicates minority instances and 0 indicates majority instances. Only binary classification is implemented in this version.
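If a target variable uses other labels, it can be recoded before calling the function. The helper below is a hypothetical sketch in base R, not part of the package. It assumes the rarer of the two classes should become 1, and when the classes are tied it picks the first level.

```r
# Hypothetical helper: recode a two-class target so the rarer class is "1".
recode_minority <- function(y) {
  y <- droplevels(factor(y))
  stopifnot(nlevels(y) == 2)               # binary targets only
  counts <- table(y)
  minority <- names(counts)[which.min(counts)]
  factor(ifelse(y == minority, "1", "0"), levels = c("0", "1"))
}

y <- recode_minority(iris$Species[1:70])   # 50 setosa, 20 versicolor
table(y)
```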

Argument *alg* specifies the learning algorithm used to train the weak learners within the ensemble model. Five algorithms are implemented: **cart** (Classification and Regression Tree), **c50** (C5.0 Decision Tree), **rf** (Random Forest), **nb** (Naive Bayes), and **svm** (Support Vector Machine). When Random Forest is the base learner, the ensemble model consists of forests, and each forest contains a number of trees.

The returned list has class *modelBag* and can be passed directly to predict() to predict test instances.
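A hedged sketch of that workflow follows. It assumes `sbag()` comes from the ebmc package and that the package and its dependencies are installed, so the model step is skipped otherwise. The hold-out split is only for illustration, and the shape of the value returned by predict() is not assumed here; inspect it before using it.

```r
set.seed(42)
dat <- iris[1:70, ]
dat$Species <- factor(dat$Species, levels = c("setosa", "versicolor"),
                      labels = c("0", "1"))

# Hold out 10 majority and 4 minority rows as a small stratified test set
test_idx <- c(sample(1:50, 10), sample(51:70, 4))
train <- dat[-test_idx, ]
test <- dat[test_idx, ]

# Assumption: sbag() is provided by the ebmc package
if (requireNamespace("ebmc", quietly = TRUE)) {
  model <- ebmc::sbag(Species ~ ., data = train, size = 10, alg = "cart")
  pred <- predict(model, newdata = test)   # dispatches on class modelBag
  print(head(pred))
}
```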

Wang, S. and Yao, X. 2009. Diversity Analysis on Imbalanced Data Sets by Using Ensemble Models. IEEE Symposium on Computational Intelligence and Data Mining, CIDM '09.

Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., and Herrera, F. 2012. A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 42(4), pp. 463-484.

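```
library(ebmc)  # package that provides sbag()

data("iris")
# Keep 50 setosa and 20 versicolor rows to get an imbalanced two-class set
iris <- iris[1:70, ]
# Recode the target: setosa = 0 (majority), versicolor = 1 (minority)
iris$Species <- factor(iris$Species, levels = c("setosa", "versicolor"), labels = c("0", "1"))
# Ensembles of 10 C5.0 trees, 20 random forests (100 trees each), and 40 sigmoid-kernel SVMs
model1 <- sbag(Species ~ ., data = iris, size = 10, alg = "c50")
model2 <- sbag(Species ~ ., data = iris, size = 20, alg = "rf", rf.ntree = 100)
model3 <- sbag(Species ~ ., data = iris, size = 40, alg = "svm", svm.ker = "sigmoid")
```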
