View source: R/GenerateLearningsets.r
Due to very small sample sizes, a single division into learning set and test set does not give accurate information about classification performance. Therefore, several different divisions should be generated and their results aggregated. The implemented methods are discussed in Braga-Neto and Dougherty (2003) and Molinaro et al. (2005), whose terminology is adopted.
The learning sets generated by this function are usually the basis for all further analyses.
n |
The total number of observations in the available data set. May be |
y |
A vector of class labels, either |
method |
Which scheme should be used to generate divisions into learning sets and test sets? Can be one of the following:
|
fold |
Gives the number of CV-groups. Used only when |
niter |
Number of iterations (see |
ntrain |
Number of observations in the learning sets. Used only when |
strat |
Logical. Should stratified sampling be performed, i.e. should the proportion of observations from each class in the learning sets be the same as in the whole data set? Does not apply for |
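The idea behind strat can be illustrated with a small base-R sketch. It is not the CMA implementation: the helper name stratified_folds is hypothetical, and the sketch simply assigns CV group labels separately within each class so that class proportions are roughly preserved in every group.

```r
# Hypothetical helper (not part of CMA): assign each observation to one of
# `fold` CV groups, drawing the assignment separately within each class.
stratified_folds <- function(y, fold) {
  y <- as.factor(y)
  groups <- integer(length(y))
  for (cl in levels(y)) {
    idx <- which(y == cl)
    # Group labels 1..fold, recycled to the class size and then shuffled
    groups[idx] <- sample(rep_len(seq_len(fold), length(idx)))
  }
  groups
}

set.seed(113)
y <- sample(1:3, size = 50, replace = TRUE, prob = c(0.3, 0.5, 0.2))
groups <- stratified_folds(y, fold = 5)

# Class counts per group; within each class the counts differ by at most one
print(table(class = y, fold = groups))
```

The learning set for group k would then be all observations with groups != k.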
When method="CV", niter gives the number of times the whole CV procedure is repeated. The output matrix then has fold x niter rows.
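The row count can be checked with a base-R sketch of repeated CV. This is an illustration under stated assumptions, not the package's code: repeated_cv is a hypothetical helper, each row is assumed to hold the indices of one learning set, and shorter rows are assumed to be padded with zeros.

```r
# Hypothetical helper (not part of CMA): build a matrix whose rows are the
# learning-set indices of `niter` repetitions of `fold`-fold CV.
repeated_cv <- function(n, fold, niter) {
  rows <- list()
  for (it in seq_len(niter)) {
    # Fresh random partition of 1..n into `fold` groups for each repetition
    groups <- sample(rep_len(seq_len(fold), n))
    for (k in seq_len(fold)) {
      # Learning set k: everything outside group k
      rows[[length(rows) + 1]] <- which(groups != k)
    }
  }
  # Pad shorter learning sets with 0 so that all rows have equal length
  width <- max(lengths(rows))
  t(vapply(rows, function(r) c(r, rep(0L, width - length(r))), integer(width)))
}

set.seed(1)
m <- repeated_cv(n = 40, fold = 5, niter = 3)
dim(m)  # 15 rows (fold x niter), 32 columns
```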
When method="MCCV" or method="bootstrap", niter is simply the number of considered learning sets.
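The difference between the two schemes can be sketched in base R. The sketch assumes the usual conventions (MCCV draws ntrain observations without replacement, the bootstrap draws n observations with replacement, and observations not drawn form the test set); it does not reproduce the package's internal code.

```r
n <- 40
ntrain <- 30
niter <- 3
set.seed(2)

# MCCV: each learning set is a random subset of size ntrain (no replacement)
mccv <- t(replicate(niter, sort(sample(n, ntrain))))

# Bootstrap: each learning set is n draws with replacement (assumed size n)
boot <- t(replicate(niter, sort(sample(n, n, replace = TRUE))))

dim(mccv)  # 3 rows, 30 columns
dim(boot)  # 3 rows, 40 columns

# Observations that were never drawn serve as the test set of an iteration
test1 <- setdiff(seq_len(n), boot[1, ])
```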
Note that method="CV", fold=n is equivalent to method="LOOCV".
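The equivalence can be verified on a toy example in base R. This is a sketch of the underlying idea only, with a small n chosen for readability: with fold equal to n, every CV group contains exactly one observation, so the learning sets coincide with those of leave-one-out CV (possibly in a different order).

```r
n <- 6
set.seed(3)

# CV with fold = n: a random partition into n groups of size one
groups <- sample(rep_len(seq_len(n), n))
cv_sets <- lapply(seq_len(n), function(k) which(groups != k))

# LOOCV: leave out observation i in turn
loo_sets <- lapply(seq_len(n), function(i) setdiff(seq_len(n), i))

# Compare the two collections of learning sets, ignoring their order
key <- function(s) paste(s, collapse = ",")
setequal(vapply(cv_sets, key, ""), vapply(loo_sets, key, ""))  # TRUE
```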
An object of class learningsets
Martin Slawski ms@cs.uni-sb.de
Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de
Christoph Bernau bernau@ibe.med.uni-muenchen.de
Braga-Neto, U.M., Dougherty, E.R. (2003).
Is cross-validation valid for small-sample microarray classification?
Bioinformatics, 20(3), 374-380
Molinaro, A.M., Simon, R., Pfeiffer, R.M. (2005).
Prediction error estimation: a comparison of resampling methods.
Bioinformatics, 21(15), 3301-3307
Slawski, M., Daumer, M., Boulesteix, A.-L. (2008).
CMA - A comprehensive Bioconductor package for supervised classification with high dimensional data.
BMC Bioinformatics, 9, 439
learningsets, GeneSelection, tune, classification
# Load the package that provides GenerateLearningsets
library(CMA)
# LOOCV
loo <- GenerateLearningsets(n=40, method="LOOCV")
show(loo)
# five-fold-CV
CV5 <- GenerateLearningsets(n=40, method="CV", fold=5)
show(CV5)
# MCCV
mccv <- GenerateLearningsets(n=40, method = "MCCV", niter=3, ntrain=30)
show(mccv)
# Bootstrap
boot <- GenerateLearningsets(n=40, method="bootstrap", niter=3)
# stratified five-fold-CV
set.seed(113)
classlabels <- sample(1:3, size = 50, replace = TRUE, prob = c(0.3, 0.5, 0.2))
CV5strat <- GenerateLearningsets(y = classlabels, method="CV", fold=5, strat = TRUE)
show(CV5strat)
learningset mode: LOOCV
number of learningsets: 40
(maximum) number of observations per learning set: 39
learningset mode: LOOCV
number of learningsets: 40
(maximum) number of observations per learning set: 39
learningset mode: MCCV
number of learningsets: 3
(maximum) number of observations per learning set: 30
learningset mode: stratified CV
number of learningsets: 5
(maximum) number of observations per learning set: 41