GenerateLearningsets: Repeated Divisions into Learning and Test Sets


View source: R/GenerateLearningsets.r

Description

Due to very small sample sizes, the classical division learnset/testset does not give accurate information about the classification performance. Therefore, several different divisions should be used and aggregated. The implemented methods are discussed in Braga-Neto and Dougherty (2003) and Molinaro et al. (2005) whose terminology is adopted.

This function is usually the basis for all deeper analyses.
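The idea of aggregating over repeated divisions can be sketched in plain R, independently of CMA. The toy data and the trivial threshold classifier below are made-up placeholders, not part of the package:

```r
# Sketch: average the misclassification rate over several random
# learn/test divisions (a hand-rolled Monte-Carlo CV) instead of
# trusting a single split. Toy data; not CMA code.
set.seed(1)
n <- 40
x <- rnorm(n)
y <- factor(ifelse(x + rnorm(n, sd = 0.5) > 0, "A", "B"))

niter <- 25; ntrain <- 30
errors <- numeric(niter)
for (i in seq_len(niter)) {
  learn <- sample(n, ntrain)           # random learning set
  test  <- setdiff(seq_len(n), learn)  # remaining observations
  # trivial threshold classifier fitted on the learning set only
  cutoff <- mean(x[learn])
  pred <- factor(ifelse(x[test] > cutoff, "A", "B"), levels = levels(y))
  errors[i] <- mean(pred != y[test])
}
mean(errors)  # aggregated error estimate over all divisions
```

With small samples, the spread of `errors` across iterations shows how unstable a single learn/test division would be.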

Usage

GenerateLearningsets(n, y, method = c("LOOCV", "CV", "MCCV", "bootstrap"),
                     fold = NULL, niter = NULL, ntrain = NULL, strat = FALSE)

Arguments

n

The total number of observations in the available data set. May be missing if y is provided instead.

y

A vector of class labels, either numeric or a factor. Must be given if strat=TRUE or n is not specified.

method

Which scheme should be used to generate the divisions into learning sets and test sets? Can be one of the following:

"LOOCV"

Leaving-One-Out Cross Validation.

"CV"

(Ordinary) Cross-Validation. Note that fold must be specified as well.

"MCCV"

Monte-Carlo Cross-Validation, i.e. random divisions into learning sets with ntrain observations (see below) and test sets with the remaining n - ntrain observations.

"bootstrap"

Learning sets are generated by drawing n times with replacement from all observations. The observations not drawn form the test set.

fold

Gives the number of CV groups. Used only when method="CV".

niter

Number of iterations (see Details).

ntrain

Number of observations in the learning sets. Used only when method="MCCV".

strat

Logical. Should stratified sampling be performed, i.e. should the proportion of observations from each class in the learning sets be the same as in the whole data set?

Does not apply for method = "LOOCV".
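The effect of stratification can be illustrated in plain R. This is only a sketch of the idea, not CMA's internal code: the learning set is sampled within each class so that class proportions are preserved.

```r
# Sketch of stratified division (not CMA's implementation):
# sample the learning set separately within each class.
set.seed(2)
y <- factor(sample(c("A", "B"), 50, replace = TRUE, prob = c(0.7, 0.3)))
ntrain <- 30
learn <- unlist(lapply(levels(y), function(cl) {
  idx <- which(y == cl)
  # take a share of each class proportional to its overall frequency
  sample(idx, round(ntrain * length(idx) / length(y)))
}))
prop.table(table(y))         # class proportions in the full data set
prop.table(table(y[learn]))  # approximately the same in the learning set
```

Without stratification, a rare class can by chance be absent from a learning set, which makes the fitted classifier useless for that class.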

Details

Value

An object of class learningsets.

Author(s)

Martin Slawski ms@cs.uni-sb.de

Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de

Christoph Bernau bernau@ibe.med.uni-muenchen.de

References

Braga-Neto, U.M., Dougherty, E.R. (2003). Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3), 374-380.

Molinaro, A.M., Simon, R., Pfeiffer, R.M. (2005). Prediction error estimation: a comparison of resampling methods. Bioinformatics, 21(15), 3301-3307.

Slawski, M., Daumer, M., Boulesteix, A.-L. (2008). CMA - a comprehensive Bioconductor package for supervised classification with high dimensional data. BMC Bioinformatics, 9: 439.

See Also

learningsets, GeneSelection, tune, classification

Examples

# LOOCV
loo <- GenerateLearningsets(n=40, method="LOOCV")
show(loo)
# five-fold-CV
CV5 <- GenerateLearningsets(n=40, method="CV", fold=5)
show(CV5)
# MCCV
mccv <- GenerateLearningsets(n=40, method = "MCCV", niter=3, ntrain=30)
show(mccv)
# Bootstrap
boot <- GenerateLearningsets(n=40, method="bootstrap", niter=3)
# stratified five-fold-CV
set.seed(113)
classlabels <- sample(1:3, size = 50, replace = TRUE, prob = c(0.3, 0.5, 0.2))
CV5strat <- GenerateLearningsets(y = classlabels, method="CV", fold=5, strat = TRUE)
show(CV5strat)
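# The objects above can also be inspected beyond show(). The lines
# below assume that a learningsets object stores one learning set per
# row of a 'learnmatrix' slot, as in current CMA versions; treat the
# slot name as an assumption if your version differs.
dim(loo@learnmatrix)
# For LOOCV, each row should omit exactly one observation:
setdiff(1:40, loo@learnmatrix[1, ])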

chbernau/CMA documentation built on May 17, 2019, 12:04 p.m.