ubRacing | R Documentation |
The function implements the Racing algorithm [2] to select the best technique for re-balancing or removing noisy instances in unbalanced datasets [1].
ubRacing(formula, data, algo, positive=1, ncore=1, nFold=10, maxFold=10, maxExp=100, stat.test="friedman", metric="f1", ubConf, threshold=NULL, verbose=FALSE, ...)
formula |
formula describing the model to be fitted. |
data |
the unbalanced dataset |
algo |
the classification algorithm to use with the mlr package. |
positive |
label of the positive (minority) class. |
ncore |
the number of cores to use in the Race. The Race is performed with parallel execution when ncore > 1. |
nFold |
number of folds in the cross-validation that provides the subset of data to the Race |
maxFold |
maximum number of folds to use in the Race |
maxExp |
maximum number of experiments to use in the Race |
stat.test |
statistical test to use to remove candidates which perform significantly worse than the best. |
metric |
metric used to assess the classification (f1, auc or gmean). |
ubConf |
configuration of the balancing techniques used in the Race. |
threshold |
threshold used to classify instances. If NULL, the default value of the mlr package is used. |
verbose |
print extra information (TRUE/FALSE) |
... |
additional arguments passed to the train function in the mlr package. |
The argument metric can take the following values: "gmean", "f1" (F-score or F-measure), "auc" (Area Under the ROC Curve). The argument stat.test defines the statistical test used to remove candidates during the Race. It can take the following values: "friedman" (Friedman test), "t.bonferroni" (t-test with Bonferroni correction), "t.holm" (t-test with Holm correction), "t.none" (t-test without correction), "no" (no test; the Race continues as long as the cross-validation provides new subsets of data). The argument ubConf is a list of configuration parameters that is passed to the function ubBalance.
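To make the three metric values concrete, here is a minimal base-R sketch of their textbook definitions. The helper names f1_score, gmean_score and auc_score are hypothetical and are not part of unbalanced or mlr; the sketch assumes 0/1 labels with 1 as the positive (minority) class, and G-mean taken as the geometric mean of sensitivity and specificity.

```r
# Textbook definitions, written for illustration only; these are NOT the
# implementations that ubRacing or mlr use internally.

# F1: harmonic mean of precision and recall on the positive class.
f1_score <- function(truth, pred, positive = 1) {
  tp <- sum(pred == positive & truth == positive)
  fp <- sum(pred == positive & truth != positive)
  fn <- sum(pred != positive & truth == positive)
  2 * tp / (2 * tp + fp + fn)
}

# G-mean: geometric mean of sensitivity and specificity.
gmean_score <- function(truth, pred, positive = 1) {
  sensitivity <- sum(pred == positive & truth == positive) / sum(truth == positive)
  specificity <- sum(pred != positive & truth != positive) / sum(truth != positive)
  sqrt(sensitivity * specificity)
}

# AUC through the rank-sum (Mann-Whitney) identity; prob is the predicted
# score for the positive class, ties receive average ranks.
auc_score <- function(truth, prob, positive = 1) {
  is_pos <- truth == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(rank(prob)[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up toy data: 2 positives, 6 negatives.
truth <- c(1, 1, 0, 0, 0, 0, 0, 0)
pred  <- c(1, 0, 1, 0, 0, 0, 0, 0)
prob  <- c(0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1)

f1    <- f1_score(truth, pred)     # 1 TP, 1 FP, 1 FN
gmean <- gmean_score(truth, pred)  # sensitivity 1/2, specificity 5/6
auc   <- auc_score(truth, prob)    # 11 of 12 positive/negative pairs ordered correctly
```

On unbalanced data, all three metrics weight the minority class more heavily than plain accuracy does, which is presumably why they are the options offered here.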
The function returns a list:
Race |
matrix containing accuracy results for each technique in the Race. |
best |
best technique selected in the Race. |
avg |
average of the metric used in the Race for the technique selected. |
sd |
standard deviation of the metric used in the Race for the technique selected. |
N.test |
number of experiments used in the Race. |
Gain |
% of computational gain with respect to the maximum number of experiments given by the cross validation. |
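One plausible reading of Gain is the share of experiments that the Race did not have to run. The sketch below works under that assumption; race_gain is a hypothetical helper and the numbers are made up, so the package may count or round differently.

```r
# Assumed definition: percentage of experiments saved relative to the
# maximum that a full cross-validation would have run.
race_gain <- function(n_used, n_max) {
  stopifnot(n_max > 0, n_used >= 0, n_used <= n_max)
  100 * (1 - n_used / n_max)
}

# Hypothetical case: 8 candidate techniques x 10 folds = 80 possible
# experiments, of which the Race ran 44.
gain <- race_gain(n_used = 44, n_max = 80)
```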
The function ubRacing is a modified version of the race function available in the race package: http://cran.r-project.org/package=race.
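The general racing idea can be illustrated with a deliberately simplified loop in base R. This is a sketch under stated assumptions, not the code of ubRacing or of the race package: race_sketch, its arguments alpha and first_test, and the score matrix are all hypothetical, and dropping only the worst-ranked candidate after a significant Friedman test is a crude stand-in for the pairwise comparisons a real race performs.

```r
# scores: one row per data subset (fold), one column per candidate technique;
# higher is better. Rows are revealed one at a time, as in a race.
race_sketch <- function(scores, alpha = 0.05, first_test = 4) {
  alive <- colnames(scores)
  n_experiments <- 0
  for (i in seq_len(nrow(scores))) {
    if (length(alive) == 1) break                 # a single survivor wins
    # Only the surviving candidates are evaluated on subset i.
    n_experiments <- n_experiments + length(alive)
    if (i < first_test) next                      # collect evidence first
    seen <- scores[seq_len(i), alive, drop = FALSE]
    p <- friedman.test(seen)$p.value
    if (!is.na(p) && p < alpha) {
      # Rank the candidates within each subset, then drop the one with
      # the lowest mean rank.
      mean_rank <- colMeans(t(apply(seen, 1, rank)))
      alive <- setdiff(alive, names(which.min(mean_rank)))
    }
  }
  mean_score <- colMeans(scores[, alive, drop = FALSE])
  list(best = names(which.max(mean_score)),
       alive = alive,
       n_experiments = n_experiments)
}

# Made-up scores for three hypothetical techniques over 8 subsets.
scores <- cbind(
  A = c(0.90, 0.91, 0.89, 0.92, 0.90, 0.93, 0.91, 0.90),
  B = c(0.85, 0.86, 0.84, 0.87, 0.85, 0.88, 0.86, 0.85),
  C = c(0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.51, 0.50)
)
result <- race_sketch(scores)
```

In this toy run the weakest candidate is eliminated as soon as testing starts and the race ends well before all 24 possible experiments are used, which is the saving that the Gain value reports.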
1. Dal Pozzolo, Andrea, et al. "Racing for unbalanced methods selection." Intelligent Data Engineering and Automated Learning - IDEAL 2013. Springer Berlin Heidelberg, 2013. 24-31.
2. Birattari, Mauro, et al. "A Racing Algorithm for Configuring Metaheuristics." GECCO. Vol. 2. 2002.
ubBalance, ubOver, ubUnder, ubSMOTE, ubOSS, ubCNN, ubENN, ubNCL, ubTomek
# use Racing to select the best technique for an unbalanced dataset
library(unbalanced)
data(ubIonosphere)

# configure sampling parameters
ubConf <- list(percOver=200, percUnder=200, k=2, perc=50, method="percPos", w=NULL)

# load the classification algorithm that you intend to use inside the Race
# see 'mlr' package for supported algorithms
library(randomForest)

# use only 5 trees
results <- ubRacing(Class ~., ubIonosphere, "randomForest", positive=1, ubConf=ubConf, ntree=5)

# try with 500 trees
# results <- ubRacing(Class ~., ubIonosphere, "randomForest", positive=1, ubConf=ubConf, ntree=500)

# let's try with a different algorithm
# library(e1071)
# results <- ubRacing(Class ~., ubIonosphere, "svm", positive=1, ubConf=ubConf)
# library(rpart)
# results <- ubRacing(Class ~., ubIonosphere, "rpart", positive=1, ubConf=ubConf)