classification: General method for classification with various methods
In CMA: Synthesis of microarray-based classification


Most general function in the package, providing an interface to perform variable selection, hyperparameter tuning and classification in one step. Alternatively, the first two steps can be performed separately and can then be plugged into this function.
For S4 method information, see classification-methods.

classification(X, y, f, learningsets, genesel, genesellist = list(), nbgene, classifier, tuneres, tuninglist = list(), trace = TRUE, models = FALSE, ...)

`X`	Gene expression data. Can be one of the following: A `matrix`. Rows correspond to observations, columns to variables. A `data.frame`, when `f` is not missing (see below). An object of class `ExpressionSet`.
`y`	Class labels. Can be one of the following: A `numeric` vector. A `factor`. A `character` if `X` is an `ExpressionSet` that specifies the phenotype variable. `missing`, if `X` is a `data.frame` and a proper formula `f` is provided. WARNING: The class labels will be re-coded to range from `0` to `K-1`, where `K` is the total number of different classes in the learning set.
`f`	A two-sided formula, if `X` is a `data.frame`. The left part corresponds to class labels, the right to variables.
`learningsets`	An object of class `learningsets`. May be missing, in which case the complete dataset is used as learning set.
`genesel`	Optional (but usually recommended) object of class `genesel` containing variable importance information for the argument `learningsets`.
`genesellist`	In the case that the argument `genesel` is missing, this is an argument list passed to `GeneSelection`. If both `genesel` and `genesellist` are missing, no variable selection is performed.
`nbgene`	Number of best genes to be kept for classification, based on either `genesel` or the call to `GeneSelection` using `genesellist`. If both are missing, this argument is not needed. Note: If the gene selection method is one of `"lasso", "elasticnet", "boosting"`, `nbgene` is reset to `min(s, nbgene)`, where `s` is the number of nonzero coefficients. If the gene selection scheme is `"one-vs-all"` or `"pairwise"` in the multiclass case, several rankings exist. The top `nbgene` genes of each ranking are kept, so the number of genes effectively used can be much larger than `nbgene`.
`classifier`	Name of function ending with `CMA` indicating the classifier to be used.
`tuneres`	Analogous to the argument `genesel` - object of class `tuningresult` containing information about the best hyperparameter choice for the argument `learningsets`.
`tuninglist`	Analogous to the argument `genesellist`. If the argument `tuneres` is missing, this is an argument list passed to `tune`. If both `tuneres` and `tuninglist` are missing, no hyperparameter tuning is performed. Warning: If a user-defined hyperparameter grid is passed, this results in a list within a list: `tuninglist = list(grids = list(argname = c()))`, see the example. Warning: Contrary to `tune`, if `tuninglist` is an empty list (default), no hyperparameter tuning is performed at all. To use pre-defined hyperparameter grids, pass `tuninglist = list(grids = list())`.
`trace`	Should progress be traced? Default is `TRUE`.
`models`	A logical value indicating whether the model object shall be returned. Default is `FALSE`.
`...`	Further arguments passed to the function `classifier`.
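
Two of the notes in the argument table (the recoding of class labels for `y` and the effective number of genes for `nbgene`) can be illustrated in base R. This is only a sketch under stated assumptions, not the package's internal code: the labels and rankings are made up, and numbering the classes in factor-level order is an assumption.

```r
# --- Label recoding (argument y) ---
# Hypothetical labels, not taken from a real data set.
y <- factor(c("ALL", "AML", "ALL", "AML", "ALL"))
K <- nlevels(y)

# Map the K factor levels to 0, ..., K-1.
# Assumption: classes are numbered in factor-level order.
y_recoded <- as.integer(y) - 1L

# Lookup table for translating recoded predictions back to labels.
label_map <- setNames(levels(y), seq_len(K) - 1L)

# --- Effective number of genes (argument nbgene) ---
# Hypothetical rankings: one ordering of gene indices per binary
# sub-problem, e.g. from a "one-vs-all" scheme with three classes.
rankings <- list(
  c(5, 2, 9, 1, 7),
  c(2, 8, 5, 3, 6),
  c(4, 5, 10, 2, 1)
)
nbgene <- 3

# Keep the top nbgene genes of each ranking and pool them.
selected <- sort(unique(unlist(lapply(rankings, head, n = nbgene))))
n_effective <- length(selected)
```

In this toy case `n_effective` exceeds `nbgene` because the three rankings only partly overlap, which is the effect described in the note on `nbgene`. Keeping a table such as `label_map` makes it easy to translate recoded predictions back to the original labels.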

For details about hyperparameter tuning, consult tune.
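
The three forms of `tuninglist` described under Arguments are easy to confuse, so the following base R sketch spells them out. The helper `describe_tuninglist` is hypothetical and only mirrors the documented behaviour; it is not part of CMA, and `k` is used as the grid argument only because the k-nearest-neighbour example does so.

```r
# (a) Default: an empty list, so no hyperparameter tuning is performed.
tl_none <- list()

# (b) Pre-defined grids: 'grids' is present but empty.
tl_predefined <- list(grids = list())

# (c) User-defined grid: a list within a list.
tl_custom <- list(grids = list(k = 1:8))

# Hypothetical helper (not a CMA function) that names the case a given
# tuninglist falls into, following the rules stated in the argument table.
describe_tuninglist <- function(tl) {
  if (length(tl) == 0) {
    return("no tuning")
  }
  if (length(tl$grids) == 0) {
    return("pre-defined grids")
  }
  "user-defined grid"
}

modes <- vapply(
  list(tl_none, tl_predefined, tl_custom),
  describe_tuninglist,
  character(1)
)
```

Any of the three lists would be passed as `tuninglist = ...` in a call to `classification`; only forms (b) and (c) trigger a call to `tune`.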

A list of objects of class cloutput and clvarseloutput, respectively; its length equals the number of different learningsets. The single elements of the list can conveniently be combined using the join function. The results can be analyzed and evaluated by various measures using the method evaluation.

Martin Slawski ms@cs.uni-sb.de

Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de

Christoph Bernau bernau@ibe.med.uni-muenchen.de

Slawski, M., Daumer, M., Boulesteix, A.-L. (2008). CMA - A comprehensive Bioconductor package for supervised classification with high dimensional data. BMC Bioinformatics 9: 439.

GeneSelection, tune, evaluation, compBoostCMA, dldaCMA, ElasticNetCMA, fdaCMA, flexdaCMA, gbmCMA, knnCMA, ldaCMA, LassoCMA, nnetCMA, pknnCMA, plrCMA, pls_ldaCMA, pls_lrCMA, pls_rfCMA, pnnCMA, qdaCMA, rfCMA, scdaCMA, shrinkldaCMA, svmCMA

### a simple k-nearest neighbour example
### datasets
## Not run: 
data(golub)
golubY <- golub[,1]
golubX <- as.matrix(golub[,-1])
### learningsets
set.seed(111)
lset <- GenerateLearningsets(y=golubY, method = "CV", fold=5, strat =TRUE)
### 1. GeneSelection
selttest <- GeneSelection(golubX, golubY, learningsets = lset, method = "t.test")
### 2. tuning
tunek <- tune(golubX, golubY, learningsets = lset, genesel = selttest, nbgene = 20, classifier = knnCMA)
### 3. classification
knn1 <- classification(golubX, golubY, learningsets = lset, genesel = selttest,
                       tuneres = tunek, nbgene = 20, classifier = knnCMA)
### steps 1.-3. combined into one step:
knn2 <- classification(golubX, golubY, learningsets = lset,
                       genesellist = list(method  = "t.test"), classifier = knnCMA,
                       tuninglist = list(grids = list(k = c(1:8))), nbgene = 20)
### show and analyze results:
knnjoin <- join(knn2)
show(knn2)
### (the result is not named 'eval' to avoid masking base::eval)
knneval <- evaluation(knn2, measure = "misclassification")
show(knneval)
summary(knneval)
boxplot(knneval)

## End(Not run)