testclass: Evaluating a classification method based on several learning...
In MAclinical: Class prediction based on microarray data and clinical parameters

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/testclass.r

This function evaluates classifiers built using microarray data and/or clinical predictors, based on several pairs of learning and test data sets.

1 2	testclass(x=NULL,y,z=NULL,learningsets,classifier,ncomp=0:3, varsel=NULL,nbgene=NULL,fold=10,...)

`x`	A n x p matrix giving the gene expression levels of p genes (columns) for n patients (rows).
`y`	A numeric vector of length n giving the class membership of the n patients, coded as 0,...,K-1 (where K is the number of classes).
`z`	A n x q data frame giving the q clinical predictors for the n patients. Nominal variables should be given as factors, variables with an at least ordinal scale should be given as numeric.
`learningsets`	A matrix with niter rows giving the indices of the arrays to be included in the learning sets for the niter iterations, as generated by the function `generate.learningsets`. The i-th row gives the indices of the arrays included in the learning set for the i-th iteration. For instance, in LOOCV, the i-th row of the matrix `learningsets` contains all the integers from 1 to n except i. Note that an observation may be included twice or more in the same learning set (for instance in bootstrap sampling).
`classifier`	The function used to construct a classifier. The function must have the same structure as `plsrf_xz_pv`.
`ncomp`	The candidate numbers of PLS components (if PLS dimension reduction is used).
`varsel`	A niter x p matrix giving the indices of the genes ordered by the chosen gene selection criterion. For example, the element in the first row and the first column is the index of the gene that is ranked best using the first learning set.
`nbgene`	The number of genes to use for classifier construction. Default is `nbgene=NULL`, corresponding to all genes.
`fold`	The number of folds for the pre-validation step. See Boulesteix et al (2008) for more details. Default is `fold=10`.
`...`	Other arguments to be passed to the function `cforest_control` from the `party` package or to the function `svm` from the package `e1071`, depending on the specified classifier.

For an overview of different methods used to generate the learning sets defined by generate.learningsets, see Boulesteix et al (2007). These methods include (repeated) cross-validation, subsampling, bootstrap sampling.

`error`	A numeric vector of length `niter` giving the misclassification rate for each iteration.
`bestncomp`	A numeric vector of length `niter` giving the best number of (pre-validated) PLS components, as obtained using the model selection method based on the out-of-bag error by Boulesteix et al (returned only for the classifiers `plsrf_xz_pv`, `plsrf_xz`, `plsrf_x_pv`, `plsrf_x`).
`OOB`	A list of length `niter`, whose elements are numeric vectors of the same length as `ncomp` giving the out-of-bag error of the forest constructed with the corresponding number of (pre-validated) PLS components (returned only for the classifiers `plsrf_xz_pv`, `plsrf_xz`, `plsrf_x_pv`, `plsrf_x`, `rf_z`. For `rf_z`, no model selection is performed: `OOB` is just the out-of-bag error of the constructed forest.)

Anne-Laure Boulesteix (http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/eng.html)

Boulesteix AL, Porzelius C, Daumer M, 2008. Microarray-based classification and clinical predictors: On combined classifiers and additional predictive value. Bioinformatics 24:1698-1706.

Boulesteix AL, Strobl C, Augustin T, Daumer D, 2008. Evaluating microarray-based classifiers: an overview. Cancer Informatics 6:77-97.

testclass_simul, simulate, generate.learningsets, plsrf_xz_pv, plsrf_x_pv, plsrf_xz, plsrf_x, rf_z, svm_x, logistic_z.

# load MAclinical library
# library(MAclinical)

# Generate data
x<-matrix(rnorm(20000),100,200)
z<-matrix(rnorm(500),100,5)
y<-sample(0:1,100,replace=TRUE)

# Generate learningsets (5-fold CV)
my.learningsets<-generate.learningsets(n=100,method="CV",fold=5)

# Evaluate accuracy of the PLS-PV-RF method
my.eval<-testclass(x=x,y=y,z=z,learningsets=my.learningsets,classifier=plsrf_xz_pv,ncomp=5,
varsel=NULL,nbgene=NULL,fold=10)

# With variable selection
my.varsel<-matrix(0,5,200)
for (i in 1:5)
 {
 my.varsel[i,]<-order(abs(studentt.stat(X=x[my.learningsets[i,],],
 L=y[my.learningsets[i,]]+1)),decreasing=TRUE)
 }

my.eval<-testclass(x=x,y=y,z=z,learningsets=my.learningsets,classifier=plsrf_xz_pv,ncomp=5,
varsel=my.varsel,nbgene=15,fold=10)