GenerateLearningsets: Repeated divisions into learning and test sets
In CMA: Synthesis of microarray-based classification


Because sample sizes are typically very small, a single division into a learning set and a test set does not give an accurate estimate of classification performance. Several different divisions should therefore be generated and their results aggregated. The implemented methods are discussed in Braga-Neto and Dougherty (2003) and Molinaro et al. (2005), whose terminology is adopted.

The output of this function is usually the starting point for all subsequent analyses.

GenerateLearningsets(n, y, method = c("LOOCV", "CV", "MCCV", "bootstrap"),
                     fold = NULL, niter = NULL, ntrain = NULL, strat = FALSE)

`n`	The total number of observations in the available data set. May be `missing` if `y` is provided instead.
`y`	A vector of class labels, either `numeric` or a `factor`. Must be given if `strat=TRUE` or `n` is not specified.
`method`	The scheme used to generate divisions into learning sets and test sets. One of the following: "LOOCV": leave-one-out cross-validation. "CV": (ordinary) cross-validation; `fold` must also be specified. "MCCV": Monte Carlo cross-validation, i.e. random divisions into learning sets with `ntrain` (see below) observations and test sets with `n - ntrain` observations. "bootstrap": learning sets are generated by drawing `n` times with replacement from all observations; those not drawn at all form the test set.
`fold`	Number of CV groups. Used only when `method="CV"`.
`niter`	Number of iterations (see Details).
`ntrain`	Number of observations in the learning sets. Used only when `method="MCCV"`.
`strat`	Logical. Should stratified sampling be performed, i.e. should the proportion of observations from each class in the learning sets match that in the whole data set? Does not apply to `method = "LOOCV"`.
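The schemes named under `method` and `strat` can be illustrated in a few lines of base R. This is only a sketch of the underlying ideas, not the code CMA runs internally; every object name below (`loocv_sets`, `cv_sets`, `mccv_sets`, `boot_sets`, `strat_groups`, `test_of`) is hypothetical and chosen for illustration.

```r
# Illustrative base-R sketch; all names are hypothetical, not CMA functions.
set.seed(1)
n <- 10
all_obs <- seq_len(n)

# LOOCV: each learning set omits exactly one observation.
loocv_sets <- lapply(all_obs, function(i) all_obs[-i])

# CV: shuffle the observations into `fold` groups; each group is the test set once.
fold <- 5
groups <- sample(rep(seq_len(fold), length.out = n))
cv_sets <- lapply(seq_len(fold), function(k) all_obs[groups != k])

# MCCV: draw `ntrain` observations without replacement, `niter` times.
niter <- 3
ntrain <- 7
mccv_sets <- replicate(niter, sort(sample(all_obs, ntrain)), simplify = FALSE)

# Bootstrap: draw n observations with replacement.
boot_sets <- replicate(niter, sample(all_obs, n, replace = TRUE), simplify = FALSE)

# In every scheme the test set is the set of observations not in the learning set.
test_of <- function(learn) setdiff(all_obs, learn)

# Stratified CV (sketch): assign folds separately within each class so that
# class proportions are roughly preserved in every learning set.
y <- factor(rep(c("a", "b"), times = c(6, 4)))
strat_fold <- 2
strat_groups <- integer(n)
for (cl in levels(y)) {
  idx <- which(y == cl)
  strat_groups[idx] <- sample(rep(seq_len(strat_fold), length.out = length(idx)))
}
```

For MCCV, `test_of()` returns `n - ntrain` observations per iteration; for the bootstrap, the test set size varies because it depends on which observations happened never to be drawn.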

When `method="CV"`, `niter` gives the number of times the whole CV procedure is repeated; the output matrix then has `fold` x `niter` rows. When `method="MCCV"` or `method="bootstrap"`, `niter` is simply the number of learning sets generated.
Note that `method="CV", fold=n` is equivalent to `method="LOOCV"`.
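Both statements can be checked with a small base-R sketch. The helper `one_cv` below is hypothetical and only mimics the idea of a single CV pass; it is not how CMA stores or generates its learning sets.

```r
# Illustrative base-R sketch; `one_cv` is a hypothetical helper, not part of CMA.
set.seed(2)
n <- 12
fold <- 4
niter <- 3

# One CV pass: randomly assign observations to `fold` groups and return,
# for each group, the indices of the observations outside that group.
one_cv <- function(n, fold) {
  groups <- sample(rep(seq_len(fold), length.out = n))
  lapply(seq_len(fold), function(k) which(groups != k))
}

# Repeating the whole procedure `niter` times yields fold * niter learning sets.
repeated_cv <- do.call(c, replicate(niter, one_cv(n, fold), simplify = FALSE))
length(repeated_cv)

# With fold = n every group holds a single observation, so each learning set
# leaves out exactly one observation, which is the LOOCV scheme.
cv_n <- one_cv(n, fold = n)
left_out <- sapply(cv_n, function(s) setdiff(seq_len(n), s))
sort(left_out)
```

The order in which observations are left out is random in this sketch, but the collection of learning sets is the same as for leave-one-out.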

An object of class `learningsets`.

Martin Slawski ms@cs.uni-sb.de

Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de

Christoph Bernau bernau@ibe.med.uni-muenchen.de

Braga-Neto, U.M., Dougherty, E.R. (2003). Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3), 374-380.

Molinaro, A.M., Simon, R., Pfeiffer, R.M. (2005). Prediction error estimation: a comparison of resampling methods. Bioinformatics, 21(15), 3301-3307.

Slawski, M., Daumer, M., Boulesteix, A.-L. (2008). CMA - A comprehensive Bioconductor package for supervised classification with high dimensional data. BMC Bioinformatics, 9, 439.

learningsets, GeneSelection, tune, classification

library(CMA)
# LOOCV
loo <- GenerateLearningsets(n=40, method="LOOCV")
show(loo)
# five-fold CV
CV5 <- GenerateLearningsets(n=40, method="CV", fold=5)
show(CV5)
# MCCV
mccv <- GenerateLearningsets(n=40, method = "MCCV", niter=3, ntrain=30)
show(mccv)
# Bootstrap
boot <- GenerateLearningsets(n=40, method="bootstrap", niter=3)
# stratified five-fold-CV
set.seed(113)
classlabels <- sample(1:3, size = 50, replace = TRUE, prob = c(0.3, 0.5, 0.2))
CV5strat <- GenerateLearningsets(y = classlabels, method="CV", fold=5, strat = TRUE)
show(CV5strat)

learningset mode:  LOOCV 
number of learningsets:  40 
(maximum) number of observations per learning set:  39 
learningset mode:  LOOCV 
number of learningsets:  40 
(maximum) number of observations per learning set:  39 
learningset mode:  MCCV 
number of learningsets:  3 
(maximum) number of observations per learning set:  30 
learningset mode:  stratified CV 
number of learningsets:  5 
(maximum) number of observations per learning set:  41