Feature selection and class prediction in a multiple random validation protocol. Misclassification rates are calculated for different sizes of the training set.
eset |
Bioconductor ExpressionSet |
class |
Specification of the column in |
ngenes |
Numerical vector specifying the numbers of features that are used for classification. |
dist |
Character string specifying the method for calculating the distance between test samples and the centroids. Possible values are "euclidean", "angle", "cor", "center". |
method |
Character string specifying the feature selection method. Possible values are "cor", "student.test", "welch.test", "wilcoxon.test", "foldchange", "copa", "os", "ort", "shift", "throw". |
ntrain |
One of the strings "balanced" or "prevalence", or a numeric matrix that contains the numbers of training samples of the first class in the first row and the numbers of training samples of the second class in the second row. |
nrep |
The number of repeated training-test splits for each training set size. |
hparam |
Hyperparameter needed for some of the feature selection methods. For methods copa, ort and os: quantile (e.g. 0.75, 0.9, 0.95) used to detect outliers. For methods shift and throw: the minimum number of samples in each class after applying shift or throw. |
The matrix exprs(eset) contains the expression signatures of the patients in the columns.
The character vector pData(eset)[[class]] contains the class membership of each sample or patient. Only two-class problems are supported.
For shift and throw, the hyperparameter hparam is the minimum number of samples in each class after applying shift/throw.
For copa, ort and os, the hyperparameter specifies the quantile that has to be exceeded in order to consider a sample an outlier. Typical values are 0.75 (default), 0.9, 0.95.
Validation is implemented in a multiple random validation protocol [1]. For each training set size, nrep training sets are randomly drawn from the patients. Features are selected and the centroid is calculated for each of the two classes in feature space. The test samples are assigned to the class with the nearest centroid.
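The protocol can be illustrated with a small base R sketch. This is not the cancerclass implementation: the function mrv_sketch, its arguments and the simulated data are hypothetical, feature selection is fixed to a Welch t-test ranking, and the distance is fixed to Euclidean. It only shows the order of the steps (random split, selection on the training set, centroids, nearest-centroid assignment) for a single training set size.

```r
# Hypothetical sketch of a multiple random validation loop (base R only).
# x: numeric matrix, features in rows and samples in columns
# y: factor with exactly two levels, one entry per sample
mrv_sketch <- function(x, y, n_per_class, ngenes, nrep) {
  cls <- levels(y)
  errs <- numeric(nrep)
  for (r in seq_len(nrep)) {
    # draw a random training set with n_per_class samples from each class
    train <- unlist(lapply(cls, function(k) sample(which(y == k), n_per_class)))
    test <- setdiff(seq_along(y), train)
    in1 <- y[train] == cls[1]
    # select features on the training set only: rank by absolute Welch t statistic
    tstat <- apply(x[, train, drop = FALSE], 1, function(g)
      t.test(g[in1], g[!in1])$statistic)
    sel <- order(abs(tstat), decreasing = TRUE)[seq_len(ngenes)]
    # class centroids in the selected feature space
    c1 <- rowMeans(x[sel, train[in1], drop = FALSE])
    c2 <- rowMeans(x[sel, train[!in1], drop = FALSE])
    # squared Euclidean distance of every test sample to each centroid
    xt <- x[sel, test, drop = FALSE]
    d1 <- colSums((xt - c1)^2)
    d2 <- colSums((xt - c2)^2)
    pred <- ifelse(d1 <= d2, cls[1], cls[2])
    errs[r] <- mean(pred != as.character(y[test]))
  }
  mean(errs)  # average misclassification rate over the nrep splits
}

# Simulated data (an assumption for illustration): 50 features, 40 samples,
# the first five features are shifted in class "B".
set.seed(1)
x <- matrix(rnorm(50 * 40), nrow = 50)
y <- factor(rep(c("A", "B"), each = 20))
x[1:5, y == "B"] <- x[1:5, y == "B"] + 3
err <- mrv_sketch(x, y, n_per_class = 10, ngenes = 5, nrep = 20)
```

Repeating the call for several values of n_per_class would give one averaged misclassification rate per training set size, which is the quantity the protocol reports.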
Four methods are available for calculating the distance between test samples and the centroids: euclidean distance, centered euclidean distance, angle and Pearson correlation. Distances are calculated using the internal function get.d.
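As a rough illustration of the four distance options, the sketch below uses common textbook definitions. The helper dist_sketch is hypothetical and is not get.d; in particular, treating "center" as the Euclidean distance after subtracting each vector's own mean, and "cor" as one minus the Pearson correlation, are assumptions that may differ from the package.

```r
# Hypothetical helper: distance between one sample s and one centroid m,
# both numeric vectors over the selected features.
dist_sketch <- function(s, m, method = c("euclidean", "center", "angle", "cor")) {
  method <- match.arg(method)
  switch(method,
    # plain Euclidean distance
    euclidean = sqrt(sum((s - m)^2)),
    # assumption: Euclidean distance after centering each vector at its own mean
    center = sqrt(sum(((s - mean(s)) - (m - mean(m)))^2)),
    # angle between the two vectors; the cosine is clamped to [-1, 1]
    # to guard against rounding error
    angle = acos(max(-1, min(1, sum(s * m) / sqrt(sum(s^2) * sum(m^2))))),
    # assumption: one minus the Pearson correlation
    cor = 1 - cor(s, m))
}

s <- c(1, 2, 3)
m <- c(2, 4, 6)
d_euc <- dist_sketch(s, m, "euclidean")
d_cen <- dist_sketch(s, m, "center")
d_ang <- dist_sketch(s, m, "angle")
d_cor <- dist_sketch(s, m, "cor")
```

In this toy example m is a multiple of s, so the angle and correlation distances are (numerically) zero while the two Euclidean variants are not. Scale-invariant measures of this kind ignore overall intensity differences between samples.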
The parameter ntrain should be equal to one of the strings "balanced" or "prevalence" or a numeric matrix with two rows. For ntrain = "balanced", a balanced layout is used, i.e. half of the training set is chosen from each of the two classes. For ntrain = "prevalence", the training sets are balanced according to the prevalence of the two classes in the entire data set. Further, the user can manually specify the sizes of the training sets.
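The two-row matrix form of ntrain can be built by hand. The sketch below shows one way to derive such a matrix for a balanced and for a prevalence-based layout. The helper ntrain_sketch, the chosen total sizes and the rounding rule are assumptions for illustration; the source does not state which sizes or rounding the package uses internally.

```r
# Hypothetical helper: per-class training set sizes for given total sizes.
# Returns a matrix with the first class in row 1 and the second class in row 2.
ntrain_sketch <- function(y, sizes, layout = c("balanced", "prevalence")) {
  layout <- match.arg(layout)
  p1 <- mean(y == levels(y)[1])  # prevalence of the first class
  n1 <- if (layout == "balanced") floor(sizes / 2) else round(sizes * p1)
  rbind(class1 = n1, class2 = sizes - n1)
}

# Toy class labels (an assumption): 30 samples of class "A", 10 of class "B"
y <- factor(rep(c("A", "B"), times = c(30, 10)))
bal <- ntrain_sketch(y, c(8, 16, 24), "balanced")
prev <- ntrain_sketch(y, c(8, 16, 24), "prevalence")
```

Here bal has rows 4, 8, 12 and 4, 8, 12, while prev has rows 6, 12, 18 and 2, 4, 6, reflecting the 3:1 class ratio in the toy labels.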
A validation object, see validation.object for details. Objects of this class have a method for the function plot.
[1] Michiels S, Koscielny S, Hill C (2005), Prediction of cancer outcome with microarrays: a multiple random validation strategy, Lancet 365:488-92.
### see: help(GOLUB);