The function implements the Racing algorithm [2] for selecting the best technique to re-balance or remove noisy instances in unbalanced datasets [1].
formula |
formula describing the model to be fitted. |
data |
the unbalanced dataset |
algo |
the classification algorithm to use with the mlr package. |
positive |
label of the positive (minority) class. |
ncore |
the number of cores to use in the Race. The Race is performed with parallel execution when ncore > 1. |
nFold |
number of folds in the cross-validation that provides the subset of data to the Race |
maxFold |
maximum number of folds to use in the Race |
maxExp |
maximum number of experiments to use in the Race |
stat.test |
statistical test to use to remove candidates which perform significantly worse than the best. |
metric |
metric used to assess the classification. |
ubConf |
configuration of the balancing techniques used in the Race. |
verbose |
print extra information (TRUE/FALSE) |
... |
additional arguments passed to the train function in the mlr package. |
The argument metric can take the following values: "gmean", "f1" (F-score or F-measure), "auc" (Area Under ROC curve). The argument stat.test defines the statistical test used to remove candidates during the Race. It can take the following values: "friedman" (Friedman test), "t.bonferroni" (t-test with Bonferroni correction), "t.holm" (t-test with Holm correction), "t.none" (t-test without correction), "no" (no test, the Race continues as long as the cross-validation provides new subsets of data). The argument ubConf is a list passed to the function ubBalance to configure the balancing techniques.
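As a rough illustration of these options, the base-R sketch below shows the conventional definitions of the G-mean and F-score from confusion-matrix counts, and the standard tests that the stat.test values refer to. The helper names (gmean, f1, paired_p) and the score matrix are hypothetical, made up for illustration; this is not the package's internal code, and ubRacing may compute these quantities differently.

```r
# Hypothetical helpers (not part of the 'unbalanced' package).
# tp, fp, tn, fn are confusion-matrix counts for the positive class.
gmean <- function(tp, fp, tn, fn) {
  sensitivity <- tp / (tp + fn)   # true positive rate
  specificity <- tn / (tn + fp)   # true negative rate
  sqrt(sensitivity * specificity)
}

f1 <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

# Made-up scores: one row per data subset, one column per candidate.
scores <- cbind(
  ubSMOTE = c(0.80, 0.82, 0.79, 0.84, 0.81, 0.83),
  ubUnder = c(0.78, 0.79, 0.80, 0.80, 0.77, 0.82),
  ubOver  = c(0.60, 0.58, 0.65, 0.61, 0.59, 0.62)
)

# "friedman": rows are treated as blocks, columns as groups.
friedman_p <- friedman.test(scores)$p.value

# "t.bonferroni", "t.holm", "t.none": paired t-tests between candidates,
# differing only in how the p-values are adjusted.
values <- as.vector(scores)   # column-major: all ubSMOTE scores first
groups <- factor(rep(colnames(scores), each = nrow(scores)),
                 levels = colnames(scores))
paired_p <- function(adjust) {
  pairwise.t.test(values, groups,
                  p.adjust.method = adjust, paired = TRUE)$p.value
}
p_bonferroni <- paired_p("bonferroni")
p_holm       <- paired_p("holm")
p_none       <- paired_p("none")
```

A small p-value suggests that at least one candidate performs significantly worse than the others; the stricter the correction, the more evidence is needed before a candidate is dropped from the Race.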
The function returns a list:
Race |
matrix containing accuracy results for each technique in the Race. |
best |
best technique selected in the Race. |
avg |
average of the metric used in the Race for the technique selected. |
sd |
standard deviation of the metric used in the Race for the technique selected. |
N.test |
number of experiments used in the Race. |
Gain |
percentage of computational gain with respect to the maximum number of experiments given by the cross-validation. |
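One plausible reading of the Gain value is the share of experiments saved relative to the maximum. The sketch below rests on that assumption; the helper race_gain and its formula are hypothetical and are not taken from the package source.

```r
# Hypothetical helper: assumes Gain = 100 * (1 - N.test / maximum experiments).
race_gain <- function(n_test, max_experiments) {
  100 * (1 - n_test / max_experiments)
}

# A Race that stops after 50 of at most 100 experiments.
gain <- race_gain(n_test = 50, max_experiments = 100)
```

Under this reading, a Race that uses every available experiment has a gain of 0.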
The function ubRacing is a modified version of the race function available in the race package: http://cran.r-project.org/package=race.
1. Dal Pozzolo, Andrea, et al. "Racing for unbalanced methods selection." Intelligent Data Engineering and Automated Learning - IDEAL 2013. Springer Berlin Heidelberg, 2013. 24-31.
2. Birattari, Mauro, et al. "A Racing Algorithm for Configuring Metaheuristics." GECCO. Vol. 2. 2002.
ubBalance, ubOver, ubUnder, ubSMOTE, ubOSS, ubCNN, ubENN, ubNCL, ubTomek
#use Racing to select the best technique for an unbalanced dataset
library(unbalanced)
data(ubIonosphere)
#configure sampling parameters
ubConf <- list(type="ubUnder", percOver=200, percUnder=200, k=2, perc=50, method="percPos", w=NULL)
#load the classification algorithm that you intend to use inside the Race
#see 'mlr' package for supported algorithms
library(randomForest)
#use only 5 trees
results <- ubRacing(Class ~., ubIonosphere, "randomForest", positive=1, ubConf=ubConf, ntree=5)
# try with 500 trees
# results <- ubRacing(Class ~., ubIonosphere, "randomForest", positive=1, ubConf=ubConf, ntree=500)
# let's try with a different algorithm
# library(e1071)
# results <- ubRacing(Class ~., ubIonosphere, "svm", positive=1, ubConf=ubConf)
# library(rpart)
# results <- ubRacing(Class ~., ubIonosphere, "rpart", positive=1, ubConf=ubConf)