directClass: A function to build and evaluate 10 different classification...

Description

This function uses either simulated (train and test) or real-life gene expression data to build and evaluate binary direct classifiers with 10 different classification functions [LDA, KNN, NNET, PAM, QDA, RF, Ridge (PLR2), Lasso (PLR1), Elastic Net (PLR12) and SVM] by minimizing the misclassification error rates.

Usage

directClass(data, dataY = NULL, simulated = TRUE, fold = 5)

Arguments

data

an object returned by generateGED or a matrix of expression values containing genes in the rows and samples in the columns.

dataY

an optional vector of class labels that can be coerced to a factor. Must be supplied when the data argument is not simulated data.

simulated

logical, indicating whether the data supplied is simulated data (Default is TRUE).

fold

the number of times the real-life data are randomly divided, with stratification, into 2/3 train and 1/3 test sets (Default is 5). For meaningful comparison we recommend fold=100.
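To illustrate what a single stratified 2/3 train and 1/3 test division looks like, here is a minimal sketch in base R. The helper stratifiedSplit is hypothetical and is not part of SPreFuGED; the package's internal splitting code may differ.

```r
# Hypothetical helper (not part of SPreFuGED): one stratified
# 2/3 train, 1/3 test division of the sample indices.
stratifiedSplit <- function(y, trainFrac = 2/3) {
  y <- as.factor(y)
  # Within each class, sample trainFrac of the indices for training
  trainIdx <- unlist(lapply(split(seq_along(y), y), function(idx) {
    idx[sample.int(length(idx), size = round(trainFrac * length(idx)))]
  }))
  trainIdx <- sort(unname(trainIdx))
  list(train = trainIdx, test = setdiff(seq_along(y), trainIdx))
}

set.seed(1)
labels <- rep(c("A", "B"), times = c(12, 18))
parts <- stratifiedSplit(labels)
table(labels[parts$train])  # class counts: A = 8, B = 12
table(labels[parts$test])   # class counts: A = 4, B = 6
```

Sampling within each class keeps the class proportions of the full data in both the train and the test set. Repeating such a division fold times would give one error estimate per division.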

Details

See the reference for details on which classification functions and/or parameters are optimized in this function and how the classifiers are built and evaluated. Please note that for large datasets and large values of "niter" from generateGED or of fold, this function might take quite some time, since all 10 classification functions are trained in every iteration.

Value

An L x 10 matrix of misclassification error rates, where L is the number of iterations (niter) when the data are simulated data from generateGED, or the number of folds (fold=L) when the data are real-life data, and 10 is the number of classification functions.
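A matrix of this shape can be summarized per classification function with base R. The sketch below uses a stand-in matrix of random values rather than real directClass output, and the column names are an assumption; the returned matrix may be labelled differently.

```r
# Stand-in for the matrix returned by directClass: 5 iterations x 10 classifiers.
# Values are random placeholders; the column names are assumed.
set.seed(42)
classifiers <- c("LDA", "KNN", "NNET", "PAM", "QDA",
                 "RF", "PLR2", "PLR1", "PLR12", "SVM")
errMat <- matrix(runif(5 * 10, min = 0, max = 0.5), nrow = 5, ncol = 10,
                 dimnames = list(NULL, classifiers))

meanErr <- colMeans(errMat)        # average error rate per classifier
sdErr <- apply(errMat, 2, sd)      # spread across iterations
best <- names(which.min(meanErr))  # classifier with the lowest mean error
round(sort(meanErr), 3)
```

Averaging over rows gives one error estimate per classification function, which is the quantity being compared when selecting among the 10 functions.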

Author(s)

Victor Lih Jong

References

Jong VL, Novianti PW, Roes KCB & Eijkemans MJC. Selecting a classification function for class prediction with gene expression data. Bioinformatics (2016) 32(12): 1814-1822

See Also

covMat, generateGED and plotDirectClass

Examples

library(SPreFuGED)
# Use simulated data to build the 10 classification functions
myCov <- covMat(pAll=100, lambda=2, corrDE=0.75, sigma=0.25)
myData <- generateGED(covAll=myCov, nTrain=30, nTest=10)
myClassResultsSimulatedData <- directClass(data=myData)  # Takes roughly 60 sec
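For real-life data, the Arguments section asks for a matrix with genes in the rows and samples in the columns, plus a vector of class labels. The sketch below prepares placeholder inputs of that shape and checks them for consistency. The object names and the random values are assumptions for illustration only, and the directClass call is shown as a comment because it has not been run here.

```r
# Placeholder expression matrix: 50 genes (rows) x 24 samples (columns).
# Random values stand in for real measurements.
set.seed(7)
exprMat <- matrix(rnorm(50 * 24), nrow = 50, ncol = 24)
classLabels <- factor(rep(c("control", "case"), each = 12))

# Basic consistency checks before calling directClass
stopifnot(is.matrix(exprMat),
          ncol(exprMat) == length(classLabels),  # one label per sample (column)
          nlevels(classLabels) == 2)             # binary classification only

# The call itself, following the Usage signature (requires SPreFuGED; not run here):
# realResults <- directClass(data = exprMat, dataY = classLabels,
#                            simulated = FALSE, fold = 5)
```

With simulated = FALSE, the result would be expected to have one row per fold, as described under Value.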

SPreFuGED documentation built on May 2, 2019, 9:40 a.m.