For different learning data sets as defined by the argument learningsets, this method ranks the genes from the most relevant to the least relevant using one of various 'filter' criteria, or provides a sparse collection of variables (Lasso, ElasticNet, Boosting). The results are typically used for variable selection for the classification procedure that follows.
For S4 class information, see GeneSelection-methods.
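To make the idea of a per-learning-set filter ranking concrete, here is a minimal base-R sketch. It is not the CMA implementation: the simulated data, the helper rank_by_ttest and the representation of learning sets as a list of row indices are all assumptions made for illustration only.

```r
# Hypothetical illustration in base R; this is NOT the CMA implementation.
set.seed(1)
n <- 40
p <- 50
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # rows = samples, columns = genes
y <- rep(c(0, 1), each = n / 2)                # binary class labels
X[y == 1, 1:3] <- X[y == 1, 1:3] + 3           # genes 1-3 carry a class signal

# Hypothetical helper: rank genes by the absolute two-sample t-statistic
# (equal variances assumed), using only the samples listed in `idx`.
rank_by_ttest <- function(X, y, idx) {
  Xl <- X[idx, , drop = FALSE]
  yl <- y[idx]
  stat <- apply(Xl, 2, function(g) {
    abs(t.test(g[yl == 1], g[yl == 0], var.equal = TRUE)$statistic)
  })
  order(stat, decreasing = TRUE)  # gene indices, most relevant first
}

# Two hypothetical learning sets, each leaving out five samples per class.
learn_idx <- list(
  setdiff(seq_len(n), c(1:5, 21:25)),
  setdiff(seq_len(n), c(6:10, 26:30))
)

# One ranking per learning set, as in the description above.
rankings <- lapply(learn_idx, function(idx) rank_by_ttest(X, y, idx))
top3 <- lapply(rankings, head, 3)
print(top3)
```

Because the ranking is recomputed on each learning set, the top genes may differ between iterations; this is why the accessor functions in the Examples take an iter argument.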
GeneSelection(X, y, f, learningsets,
              method = c("t.test", "welch.test", "wilcox.test", "f.test",
                         "kruskal.test", "limma", "rfe", "rf", "lasso",
                         "elasticnet", "boosting", "golub", "shrinkcat"),
              scheme, trace = TRUE, ...)
X |
Gene expression data. Can be one of the following:
|
y |
Class labels. Can be one of the following:
|
f |
A two-sided formula, if |
learningsets |
An object of class |
method |
A character specifying the method to be used:
|
scheme |
The scheme to be used in the case of a non-binary response. Must be one
of |
trace |
Should the progress be traced? Default is TRUE. |
... |
Further arguments passed to the function performing variable selection, s. |
An object of class genesel.
Most of the methods described above are suitable only for binary classification. The only ones that can be used without restriction in the multiclass case are:
f.test
kruskal.test
rf
boosting
For the rest, pairwise or one-vs-all schemes are used.
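As a rough illustration of what the two schemes mean, the base-R sketch below splits a three-class problem into binary subproblems. It is not taken from the CMA source: the simulated data and the helper abs_t are hypothetical, and how CMA combines the per-subproblem statistics into a single ranking is not shown here.

```r
# Hypothetical illustration in base R; not taken from the CMA source.
set.seed(2)
y <- rep(c("A", "B", "C"), each = 10)          # three-class labels
X <- matrix(rnorm(length(y) * 20), ncol = 20)  # 30 samples x 20 genes
X[y == "C", 5] <- X[y == "C", 5] + 4           # gene 5 separates class C

# Hypothetical helper: absolute Welch t-statistic per gene for the binary
# split defined by the logical vector `grp`.
abs_t <- function(X, grp) {
  apply(X, 2, function(g) abs(t.test(g[grp], g[!grp])$statistic))
}

classes <- sort(unique(y))

# One-vs-all: one binary problem per class (class k against all others).
one_vs_all <- lapply(classes, function(k) abs_t(X, y == k))
names(one_vs_all) <- classes

# Pairwise: one binary problem per pair of classes, using only the
# samples that belong to that pair.
pairs <- combn(classes, 2, simplify = FALSE)
pairwise <- lapply(pairs, function(pr) {
  keep <- y %in% pr
  abs_t(X[keep, , drop = FALSE], y[keep] == pr[1])
})
names(pairwise) <- sapply(pairs, paste, collapse = "-")
```

With K classes, one-vs-all produces K binary problems and pairwise produces K(K-1)/2, so the pairwise scheme becomes more expensive as the number of classes grows.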
Martin Slawski ms@cs.uni-sb.de
Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de
Christoph Bernau bernau@ibe.med.uni-muenchen.de
Smyth, G. K., Yang, Y.-H., Speed, T. P. (2003).
Statistical issues in microarray data analysis.
Methods in Molecular Biology 224, 111-136.
Guyon, I., Weston, J., Barnhill, S., Vapnik, V. (2002).
Gene Selection for Cancer Classification using Support Vector Machines.
Machine Learning, 46, 389-422.
Zou, H., Hastie, T. (2005).
Regularization and variable selection via the elastic net.
Journal of the Royal Statistical Society B, 67(2), 301-320.
Buehlmann, P., Yu, B. (2003).
Boosting with the L2 loss: Regression and Classification.
Journal of the American Statistical Association, 98, 324-339.
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R. (2004).
Least Angle Regression.
Annals of Statistics, 32:407-499
Buehlmann, P., Yu, B. (2006).
Sparse Boosting.
Journal of Machine Learning Research, 7, 1001-1024.
Slawski, M., Daumer, M., Boulesteix, A.-L. (2008).
CMA - A comprehensive Bioconductor package for supervised classification with high dimensional data.
BMC Bioinformatics, 9, 439.
filter, GenerateLearningsets, tune, classification
### load the CMA package and the Golub AML/ALL data
library(CMA)
data(golub)
### extract class labels (first column)
golubY <- golub[,1]
### extract gene expression data (all remaining columns)
golubX <- as.matrix(golub[,-1])
### generate five different learningsets (stratified 5-fold CV)
set.seed(111)
five <- GenerateLearningsets(y = golubY, method = "CV", fold = 5, strat = TRUE)
### simple t-test:
selttest <- GeneSelection(golubX, golubY, learningsets = five, method = "t.test")
### show result:
show(selttest)
### ten top-ranked genes and importance plot for the first learningset
toplist(selttest, k = 10, iter = 1)
plot(selttest, iter = 1)
GeneSelection: iteration 1
GeneSelection: iteration 2
GeneSelection: iteration 3
GeneSelection: iteration 4
GeneSelection: iteration 5
gene selection performed with 't.test'
scheme used :'pairwise'
number of genes: 3051
number of different learningsets: 5
top 10 genes for iteration 1
   index importance
1    829   9.195902
2   2670   8.502019
3    378   8.178456
4   1009   7.792680
5   2124   7.695132
6    896   7.659007
7    515   7.522447
8    808   7.163428
9   1448   7.013725
10   394   6.937388