mc.comp.1: Multiple Classifier Predictions Comparison


View source: R/mcfct.r

Description

Function to test for significant differences between predictions made by various classifiers (methods and/or settings) built on the same partitioning schema. It requires that the explicit class predictions for each fold and iteration are stored in the classifier objects of type accest. For each pairwise comparison, the mean of the differences, the associated variance, the Student t-statistic and the corresponding p-value are reported in a table. Multiple testing correction is then applied if more than two classifiers are involved. Note that the column DisId is used to sort the classifiers by discrimination task, and that DisId and AlgId are used to report the results. It is also assumed that the partitioning is identical for the models built with the different classifiers.
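The pairwise statistics described above can be illustrated with a minimal base R sketch. It is not the code of mc.comp.1: the accuracy matrix is simulated, the names acc, clsA, clsB and clsC are hypothetical, and a plain paired t-test is assumed, without any of the variance corrections discussed in the references.

```r
# Hypothetical per-partition accuracies for three classifiers evaluated on
# the same 50 partitions (e.g. 5 repetitions of 10-fold CV).
# Simulated data, not output from accest.
set.seed(1)
n.part <- 50
acc <- cbind(
  clsA = rnorm(n.part, mean = 0.93, sd = 0.03),
  clsB = rnorm(n.part, mean = 0.94, sd = 0.03),
  clsC = rnorm(n.part, mean = 0.60, sd = 0.05)
)

# All unordered pairs of classifiers (2 x 3 character matrix)
pairs <- combn(colnames(acc), 2)

res <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(k) {
  d     <- acc[, pairs[1, k]] - acc[, pairs[2, k]]  # paired differences
  md    <- mean(d)                                  # mean of the differences
  vd    <- var(d)                                   # variance of the differences
  tstat <- md / sqrt(vd / length(d))                # Student t-statistic
  pval  <- 2 * pt(-abs(tstat), df = length(d) - 1)  # two-sided p-value
  data.frame(comparison = paste(pairs[1, k], pairs[2, k], sep = " vs "),
             mean.diff = md, var.diff = vd, t = tstat, p = pval)
}))

# Multiple testing correction across the pairwise comparisons
res$p.adj <- p.adjust(res$p, method = "holm")
print(res, digits = 4)
```

Each row corresponds to one pair of classifiers. Because all classifiers share the same partitions, the differences are paired, which is why identical partitioning is required.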

Usage

mc.comp.1(mc.obj,lmod=NULL,p.adjust.method="holm")

## Default S3 method:
mc.comp.1(mc.obj,lmod=NULL,p.adjust.method="holm")

Arguments

mc.obj

mc.agg object. See mc.agg for details.

lmod

List of models to be considered. Default: all of them.

p.adjust.method

Multiple testing correction method. See p.adjust for details.
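As a small illustration of what the p.adjust.method argument controls, the sketch below applies two of the methods accepted by stats::p.adjust to three made-up p-values. The values in p.raw are hypothetical and are not produced by mc.comp.1.

```r
# Three hypothetical raw p-values, one per pairwise comparison
p.raw <- c(0.01, 0.04, 0.03)

# Holm step-down correction (the default used by mc.comp.1)
p.holm <- p.adjust(p.raw, method = "holm")

# Bonferroni correction, more conservative
p.bonf <- p.adjust(p.raw, method = "bonferroni")

print(rbind(raw = p.raw, holm = p.holm, bonferroni = p.bonf))
```

Holm never yields larger adjusted p-values than Bonferroni, which is one reason it is a common default.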

Value

mc.comp.1 object:

res

Summary of classifier pairwise comparisons for each discrimination task

cltask

Discrimination task(s).

title

Title for printing function.

Note

See publications mentioned below.

Author(s)

David Enot

References

Berrar, D., Bradbury, I. and Dubitzky, W. (2006). Avoiding model selection bias in small-sample genomic datasets. Bioinformatics. Vol.22, No.10, 1245-125.

Bouckaert, R.R. and Frank, E. (2004). Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms. Proc 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Vol.3054, 3-12.

See Also

mc.agg

Examples

data(iris)
x <- as.matrix(subset(iris, select = -Species))
y <- iris$Species
pars   <- valipars(sampling = "cv",niter = 10, nreps=5, strat=TRUE)
tr.idx <- trainind(y,pars=pars)
## RF model based on one tree
acc1   <- accest(x, y, clmeth = "randomForest", pars = pars, tr.idx = tr.idx, ntree = 1)
## RF model based on 100 trees
acc2   <- accest(x, y, clmeth = "randomForest", pars = pars, tr.idx = tr.idx, ntree = 100)
## RF model where the minimum size of terminal nodes is set to a value greater
## than the maximum number of samples per class (oops!)
acc3   <- accest(x, y, clmeth = "randomForest", pars = pars, tr.idx = tr.idx, ntree = 1, nodesize = 80)

## Aggregate the three classifiers and compare them pairwise
clas <- mc.agg(acc1, acc2, acc3)
res.comp <- mc.comp.1(clas, p.adjust.method = "holm")

## No significant differences between 1 and 2
## Of course classifiers 1 and 2 perform significantly better than 3
## Print with the default settings
res.comp

## with a few more decimals...
print(res.comp,digits=4)

## Print results in a file
## Not run: print(res.comp,digits=2,file="tmp.csv")

wilsontom/FIEmspro documentation built on Feb. 19, 2018, 9:03 a.m.