ubRacing: Racing
In dalpozz/unbalanced: Racing for Unbalanced Methods Selection


Racing

Description

The function implements the Racing algorithm [2] for selecting the best technique to rebalance unbalanced datasets or to remove noisy instances from them [1].
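To make the elimination idea concrete, here is a minimal base-R sketch of a race. It is not the ubRacing source. It assumes a precomputed folds × candidates matrix of a higher-is-better metric (the scores below are synthetic), reveals one fold at a time, and, after a hypothetical minimum of three folds, drops every candidate that a one-sided paired t-test with Holm correction finds significantly worse than the current best. This is in the spirit of stat.test = "t.holm". The helper race_sketch and its arguments are illustrative names.

```r
# Simplified race (illustrative sketch, NOT the ubRacing implementation).
# scores: folds x candidates matrix of a higher-is-better metric; rows are
# processed one at a time, as if each fold were delivered by cross-validation.
race_sketch <- function(scores, alpha = 0.05, min_folds = 3) {
  alive <- colnames(scores)
  n_exp <- 0
  for (i in seq_len(nrow(scores))) {
    n_exp <- n_exp + length(alive)  # each surviving candidate is run on fold i
    if (i < min_folds || length(alive) < 2) next
    seen <- scores[seq_len(i), alive, drop = FALSE]
    best <- alive[which.max(colMeans(seen))]
    others <- setdiff(alive, best)
    # one-sided paired t-test: is the current best better than each rival?
    p <- sapply(others, function(cand)
      t.test(seen[, best], seen[, cand], paired = TRUE,
             alternative = "greater")$p.value)
    p <- p.adjust(p, method = "holm")
    alive <- c(best, others[p >= alpha])  # drop significantly worse candidates
  }
  final <- colMeans(scores[, alive, drop = FALSE])
  list(best = names(which.max(final)), survivors = alive, n_exp = n_exp)
}

# Synthetic scores for three hypothetical candidates over 10 folds.
scores <- cbind(
  A = c(0.80, 0.82, 0.78, 0.81, 0.79, 0.83, 0.80, 0.82, 0.79, 0.81),
  B = c(0.79, 0.83, 0.77, 0.80, 0.80, 0.82, 0.81, 0.80, 0.78, 0.82),
  C = c(0.60, 0.63, 0.58, 0.62, 0.61, 0.59, 0.60, 0.64, 0.57, 0.62)
)
res <- race_sketch(scores)
res$best       # "A"
res$survivors  # "A" "B"
res$n_exp      # 23
```

With these scores candidate C is eliminated after the third fold, so only 3 × 3 + 7 × 2 = 23 of the 30 possible experiments are run. A and B survive to the end, and A has the higher mean.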

Usage

ubRacing(formula, data, algo, positive=1, ncore=1, nFold=10, maxFold=10, maxExp=100, 
          stat.test="friedman", metric="f1", ubConf, threshold=NULL, verbose=FALSE, ...)

Arguments

`formula`	formula describing the model to be fitted.
`data`	the unbalanced dataset.
`algo`	the classification algorithm to use with the mlr package.
`positive`	label of the positive (minority) class.
`ncore`	the number of cores to use in the Race. The Race is executed in parallel when ncore > 1.
`nFold`	number of folds in the cross-validation that provides the data subsets to the Race.
`maxFold`	maximum number of folds to use in the Race.
`maxExp`	maximum number of experiments to use in the Race.
`stat.test`	statistical test used to remove candidates that perform significantly worse than the best.
`metric`	metric used to assess the classification ("f1", "auc" or "gmean").
`ubConf`	configuration of the balancing techniques used in the Race.
`threshold`	threshold used to classify instances. If NULL, the default of the mlr package is used.
`verbose`	print extra information (TRUE/FALSE).
`...`	additional arguments passed to the train function of the mlr package.
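The threshold and metric arguments interact: a probabilistic classifier's output is turned into class labels with a threshold, and "f1" and "gmean" are then computed from the resulting confusion matrix. The sketch below uses the usual definitions (F1 as the harmonic mean of precision and recall, G-mean as the geometric mean of the true positive and true negative rates). The helper names, the 0/1 labels and the example probabilities are assumptions, 0.5 is used only as an example threshold (not as mlr's default), and "auc" is omitted because it does not depend on a threshold.

```r
# Illustrative helpers (hypothetical names): label instances with a threshold,
# then compute F1 and G-mean with class `positive` as the minority class.
classify <- function(prob, threshold = 0.5, positive = 1, negative = 0) {
  ifelse(prob >= threshold, positive, negative)
}

f1_gmean <- function(pred, truth, positive = 1) {
  tp <- sum(pred == positive & truth == positive)
  fp <- sum(pred == positive & truth != positive)
  fn <- sum(pred != positive & truth == positive)
  tn <- sum(pred != positive & truth != positive)
  precision   <- tp / (tp + fp)  # NaN if nothing is predicted positive
  recall      <- tp / (tp + fn)  # true positive rate
  specificity <- tn / (tn + fp)  # true negative rate
  c(f1    = 2 * precision * recall / (precision + recall),
    gmean = sqrt(recall * specificity))
}

# Made-up example: 3 positives (1) and 7 negatives (0).
truth <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
prob  <- c(0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05, 0.3, 0.2)
m1 <- f1_gmean(classify(prob, 0.5), truth)   # f1 = 2/3, gmean = sqrt(4/7)
m2 <- f1_gmean(classify(prob, 0.35), truth)  # f1 = 4/7, gmean = sqrt(10/21)
```

Lowering the threshold from 0.5 to 0.35 here adds one false positive without recovering any true positive, so both F1 and G-mean drop.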

Details

The argument metric can take the following values: "gmean" (G-mean), "f1" (F-score or F-measure) and "auc" (area under the ROC curve).

Argument stat.test defines the statistical test used to remove candidates during the Race. It can take the following values: "friedman" (Friedman test), "t.bonferroni" (t-test with Bonferroni correction), "t.holm" (t-test with Holm correction), "t.none" (t-test without correction) and "no" (no test: no candidate is removed, and the Race continues as long as the cross-validation provides new subsets of data).

Argument ubConf is a list of configuration parameters passed to the function ubBalance.
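With stat.test = "friedman", the scores collected so far can be viewed as a blocks × treatments table (folds × candidates). The sketch below runs only the omnibus step, using base R's stats::friedman.test on synthetic scores, and reports the mean rank of each candidate (1 = best, since higher metric values are better). The pairwise post-hoc comparisons that a race needs to decide which candidates to drop are deliberately left out, and the numbers are illustrative.

```r
# Friedman step only (illustrative): rows = folds (blocks), columns = candidates.
scores <- cbind(
  A = c(0.81, 0.79, 0.84, 0.80, 0.82, 0.78),
  B = c(0.78, 0.77, 0.80, 0.79, 0.80, 0.76),
  C = c(0.70, 0.72, 0.69, 0.74, 0.71, 0.73)
)
ft <- friedman.test(scores)  # H0: all candidates perform alike
# Mean rank per candidate; rank(-x) gives rank 1 to the highest (best) score.
mean_rank <- rowMeans(apply(scores, 1, function(x) rank(-x)))
ft$statistic  # Friedman chi-squared = 12
ft$p.value    # about 0.0025
mean_rank     # A = 1, B = 2, C = 3
```

Because A beats B and B beats C on every fold, the statistic reaches its maximum for 6 folds and 3 candidates, and the null hypothesis is rejected at the usual levels.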

Value

The function returns a list:

`Race`	matrix containing accuracy results for each technique in the Race.
`best`	best technique selected in the Race.
`avg`	average of the metric used in the Race for the technique selected.
`sd`	standard deviation of the metric used in the Race for the technique selected.
`N.test`	number of experiments used in the Race.
`Gain`	percentage of computational gain with respect to the maximum number of experiments given by the cross-validation.
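The Gain component can be read as the share of the experiment budget that the Race did not need. The hypothetical helper below assumes that this budget is the number of candidate techniques times the number of cross-validation folds (one experiment per candidate per fold); the exact formula used by ubRacing is not stated on this page.

```r
# Hypothetical helper: percentage of the experiment budget saved by a race,
# ASSUMING the budget is n_candidates * n_folds (one run per candidate per fold).
race_gain <- function(n_test, n_candidates, n_folds) {
  max_exp <- n_candidates * n_folds
  100 * (1 - n_test / max_exp)
}

race_gain(n_test = 23, n_candidates = 3, n_folds = 10)  # about 23.33
race_gain(n_test = 30, n_candidates = 3, n_folds = 10)  # 0: nothing was eliminated
```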

Note

The function ubRacing is a modified version of the race function available in the race package: http://cran.r-project.org/package=race.

References

1. Dal Pozzolo, Andrea, et al. "Racing for unbalanced methods selection." Intelligent Data Engineering and Automated Learning - IDEAL 2013. Springer Berlin Heidelberg, 2013. 24-31.
2. Birattari, Mauro, et al. "A Racing Algorithm for Configuring Metaheuristics." GECCO. Vol. 2. 2002.

Examples

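# use Racing to select the best technique for an unbalanced dataset
library(unbalanced)
data(ubIonosphere)

# configure the balancing techniques (the list is passed to ubBalance):
# percOver/percUnder are the SMOTE over- and under-sampling percentages,
# k is the number of nearest neighbours used by the neighbour-based techniques,
# perc/method/w configure random undersampling
ubConf <- list(percOver=200, percUnder=200, k=2, perc=50, method="percPos", w=NULL)

# load the classification algorithm that you intend to use inside the Race
# (see the 'mlr' package for supported algorithms)
library(randomForest)
# use only 5 trees (ntree is forwarded to the learner through ...)
results <- ubRacing(Class ~., ubIonosphere, "randomForest", positive=1, ubConf=ubConf, ntree=5)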

# try with 500 trees
# results <- ubRacing(Class ~., ubIonosphere, "randomForest", positive=1, ubConf=ubConf, ntree=500)
# let's try with a different algorithm
# library(e1071)
# results <- ubRacing(Class ~., ubIonosphere, "svm", positive=1, ubConf=ubConf)
# library(rpart)
# results <- ubRacing(Class ~., ubIonosphere, "rpart", positive=1, ubConf=ubConf)

dalpozz/unbalanced documentation built on June 3, 2022, 2:42 a.m.