Estimates the optimal number of biomarkers at a given error tolerance level for various classification rules

Description

Using an interactive control panel (see rpanel) and a 3D real-time rendering system (rgl), this package provides a user-friendly GUI for estimating the minimum number of biomarkers (variables) needed to achieve a given level of accuracy in two-group classification problems based on microarray data.

Usage

optimiseBiomarker(error,
                  errorTol = 0.05,
                  method = "RF",
                  nTrain = 100,
                  sdB = 1.5,
                  sdW = 1,
                  foldAvg = 2.88,
                  nRep = 3)

Arguments

error

The database of classification errors. See errorDbase for details.

errorTol

Error tolerance limit.

method

Classification method. One of "RF", "SVM", or "KNN", for Random Forest, Support Vector Machines, and k-Nearest Neighbour respectively.

nTrain

Training set size, i.e., the total number of biological samples in group 1 and group 2.

sdB

Biological variation (σ_b) of data in log (base 2) scale.

sdW

Experimental (technical) variation (σ_e) of data in log (base 2) scale.

foldAvg

Average fold change of the biomarkers.

nRep

Number of technical replications.

Details

The function optimiseBiomarker is a user-friendly GUI for interrogating the database of leave-one-out cross-validation errors, errorDbase, to estimate the optimal number of biomarkers for microarray-based classification. The database is built from simulated data using the classificationError function. The function simData simulates microarray data for various combinations of factors: the number of biomarkers, training set size, biological variation, experimental variation, fold change, replication, and correlation.
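
To illustrate the kind of query described above, the following is a minimal sketch of looking up the smallest number of biomarkers that meets an error tolerance. It is not the package's implementation: the table toy_errors, its column names (method, nBiom, error), its values, and the helper min_biomarkers are all hypothetical, since the layout of errorDbase is not described on this page.

```r
# Toy stand-in for a database of cross-validation errors.
# Hypothetical values and column names; not the real errorDbase.
toy_errors <- data.frame(
  method = rep(c("RF", "SVM", "KNN"), each = 5),
  nBiom  = rep(c(1, 5, 10, 25, 50), times = 3),
  error  = c(0.30, 0.12, 0.06, 0.04, 0.03,
             0.28, 0.10, 0.05, 0.03, 0.02,
             0.35, 0.15, 0.08, 0.06, 0.045)
)

# Smallest tabulated number of biomarkers whose error does not exceed
# the tolerance for the chosen method; NA if none qualifies.
min_biomarkers <- function(db, method, errorTol = 0.05) {
  ok <- db[db$method == method & db$error <= errorTol, ]
  if (nrow(ok) == 0) return(NA_real_)
  min(ok$nBiom)
}

min_biomarkers(toy_errors, "RF")                    # tolerance 0.05
min_biomarkers(toy_errors, "KNN", errorTol = 0.01)  # no size qualifies
```

In the real function the other factors (nTrain, sdB, sdW, foldAvg, nRep) would also condition the lookup; the sketch fixes them implicitly to keep the idea visible.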

Author(s)

Mizanur Khondoker, Till Bachmann, Peter Ghazal

Maintainer: Mizanur Khondoker <mizanur.khondoker@gmail.com>

References

Khondoker, M. R., Bachmann, T. T., Mewissen, M., Dickinson, P. et al. (2010). Multi-factorial analysis of class prediction error: estimating optimal number of biomarkers for various classification rules. Journal of Bioinformatics and Computational Biology, 8, 945-965.

Breiman, L. (2001). Random Forests. Machine Learning 45(1), 5–32.

Chang, C.-C. and Lin, C.-J. LIBSVM: a library for Support Vector Machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.

Efron, B. and Tibshirani, R. (1997). Improvements on Cross-Validation: The .632+ Bootstrap Estimator. Journal of the American Statistical Association 92(438), 548–560.

Bowman, A., Crawford, E., Alexander, G. and Bowman, R. W. (2007). rpanel: Simple interactive controls for R functions using the tcltk package. Journal of Statistical Software 17(9).

See Also

simData, classificationError

Examples

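
The example code did not survive on this page. As a hedged sketch only, a call along the following lines might launch the GUI. It assumes the optBiomarker package is installed and that errorDbase is shipped as a dataset loadable with data(); the argument values simply repeat the defaults listed under Usage, and gui_args is an illustrative name.

```r
# Argument values mirror the defaults shown under Usage.
gui_args <- list(errorTol = 0.05, method = "RF", nTrain = 100,
                 sdB = 1.5, sdW = 1, foldAvg = 2.88, nRep = 3)

# The GUI needs an interactive session; the guard also skips the call
# when optBiomarker is not installed.
if (interactive() && requireNamespace("optBiomarker", quietly = TRUE)) {
  library(optBiomarker)
  data(errorDbase)  # assumption: the error database ships as a dataset
  do.call(optimiseBiomarker, c(list(error = errorDbase), gui_args))
}
```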