ubRacing: Racing

View source: R/ubRacing.R

ubRacing R Documentation

Racing

Description

The function implements the Racing algorithm [2] for selecting the best technique to re-balance or remove noisy instances in unbalanced datasets [1].

Usage

ubRacing(formula, data, algo, positive=1, ncore=1, nFold=10, maxFold=10, maxExp=100, 
          stat.test="friedman", metric="f1", ubConf, threshold=NULL, verbose=FALSE, ...)

Arguments

formula

formula describing the model to be fitted.

data

the unbalanced dataset

algo

the classification algorithm to use with the mlr package.

positive

label of the positive (minority) class.

ncore

the number of cores to use in the Race. The Race runs in parallel when ncore > 1.

nFold

number of folds in the cross-validation that provides the subset of data to the Race

maxFold

maximum number of folds to use in the Race

maxExp

maximum number of experiments to use in the Race

stat.test

statistical test to use to remove candidates which perform significantly worse than the best.

metric

metric used to assess the classification (f1, auc or gmean).

ubConf

configuration of the balancing techniques used in the Race.

threshold

threshold used to classify instances. If NULL, the default values of the mlr package are used.

verbose

print extra information (TRUE/FALSE)

...

additional arguments passed to the train function in the mlr package.

Details

The argument metric can take the following values: "gmean", "f1" (F-score or F-measure), "auc" (area under the ROC curve).

The argument stat.test defines the statistical test used to remove candidates during the Race. It can take the following values: "friedman" (Friedman test), "t.bonferroni" (t-test with Bonferroni correction), "t.holm" (t-test with Holm correction), "t.none" (t-test without correction), "no" (no test; the Race continues for as long as the cross-validation provides new subsets of data).

The argument ubConf is a list of configuration parameters passed to the function ubBalance.
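To give a feel for what the stat.test options compare, here is a minimal sketch in base R. It is not the package's internal code: the score matrix is made up, and friedman.test and pairwise.t.test from the stats package stand in for the tests the Race applies. Rows are assumed to be cross-validation folds and columns candidate techniques.

```r
# Hypothetical metric values (e.g. F1): one row per fold, one column per
# candidate technique. These numbers are invented for illustration.
scores <- cbind(
  ubOver  = c(0.80, 0.82, 0.79, 0.81, 0.83),
  ubSMOTE = c(0.75, 0.78, 0.73, 0.77, 0.76),
  ubUnder = c(0.60, 0.65, 0.62, 0.61, 0.64)
)

# Analogue of stat.test = "friedman": a rank-based test across all
# candidates, with folds as blocks and techniques as groups.
ft <- friedman.test(scores)

# Analogue of stat.test = "t.holm": paired t-tests between candidates,
# with Holm-adjusted p-values. Use "bonferroni" or "none" for the
# other t-test variants.
long <- data.frame(
  score     = as.vector(scores),
  technique = factor(rep(colnames(scores), each = nrow(scores)))
)
pt <- pairwise.t.test(long$score, long$technique,
                      p.adjust.method = "holm", paired = TRUE)

print(ft$p.value)
print(pt$p.value)
```

A small p-value suggests that at least one candidate performs differently from the others on the folds seen so far, which is the kind of evidence a race uses to drop candidates early.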

Value

The function returns a list:

Race

matrix containing accuracy results for each technique in the Race.

best

best technique selected in the Race.

avg

average of the metric used in the Race for the technique selected.

sd

standard deviation of the metric used in the Race for the technique selected.

N.test

number of experiments used in the Race.

Gain

percentage of computational gain with respect to the maximum number of experiments given by the cross validation.
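As a rough illustration of how Gain relates to N.test, the sketch below assumes that the gain is the share of the maximum number of experiments that the Race did not have to run, and that this maximum is the number of candidate techniques times the number of folds. Both assumptions are one reading of the description above and are not taken from the package source; all numbers are hypothetical.

```r
# Hypothetical values, not output from ubRacing.
n_candidates <- 7   # candidate techniques in the Race (assumed)
max_fold     <- 10  # folds available to the Race (maxFold)
n_test       <- 42  # experiments actually run (N.test)

# Assumed definition: experiments saved, as a percentage of the maximum.
max_experiments <- n_candidates * max_fold
gain <- 100 * (1 - n_test / max_experiments)
print(gain)
```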

Note

The function ubRacing is a modified version of the race function available in the race package: http://cran.r-project.org/package=race.

References

1. Dal Pozzolo, Andrea, et al. "Racing for unbalanced methods selection." Intelligent Data Engineering and Automated Learning - IDEAL 2013. Springer Berlin Heidelberg, 2013. 24-31.
2. Birattari, Mauro, et al. "A Racing Algorithm for Configuring Metaheuristics." GECCO. Vol. 2. 2002.

See Also

ubBalance, ubOver, ubUnder, ubSMOTE, ubOSS, ubCNN, ubENN, ubNCL, ubTomek

Examples

#use Racing to select the best technique for an unbalanced dataset
library(unbalanced)
data(ubIonosphere)

#configure sampling parameters
ubConf <- list(percOver=200, percUnder=200, k=2, perc=50, method="percPos", w=NULL)

#load the classification algorithm that you intend to use inside the Race
#see 'mlr' package for supported algorithms
library(randomForest)
#use only 5 trees
results <- ubRacing(Class ~., ubIonosphere, "randomForest", positive=1, ubConf=ubConf, ntree=5)

# try with 500 trees
# results <- ubRacing(Class ~., ubIonosphere, "randomForest", positive=1, ubConf=ubConf, ntree=500)
# let's try with a different algorithm
# library(e1071)
# results <- ubRacing(Class ~., ubIonosphere, "svm", positive=1, ubConf=ubConf)
# library(rpart)
# results <- ubRacing(Class ~., ubIonosphere, "rpart", positive=1, ubConf=ubConf)

dalpozz/unbalanced documentation built on June 3, 2022, 2:42 a.m.