Training of a ROC based classifier

Description

The function establishes the ROC based classifier, returning the classifier specifications.

Usage

tr.rocc(g, out, xgenes = 200)

Arguments

g

the input data as a matrix with genes as rows and samples as columns. rownames(g) and colnames(g) must be specified.

out

the phenotype of the samples: a factor vector with levels 0 and 1 (in this order) and one value per sample.

xgenes

numeric (vector of length 1); the number of features to be selected in feature selection.
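
The argument requirements above can be checked before training. The following is a minimal sketch in base R; check_rocc_input is a hypothetical helper written for illustration and is not part of the package.

```r
## Hypothetical helper (not part of the package): verifies the
## documented requirements on g, out and xgenes.
check_rocc_input <- function(g, out, xgenes = 200) {
  stopifnot(
    is.matrix(g),
    !is.null(rownames(g)), !is.null(colnames(g)),  # names must be set
    is.factor(out),
    identical(levels(out), c("0", "1")),           # levels 0 and 1, in this order
    length(out) == ncol(g),                        # one phenotype value per sample
    is.numeric(xgenes), length(xgenes) == 1
  )
  invisible(TRUE)
}

## Building a phenotype factor with the required level order from raw labels
labels <- c(1, 0, 0, 1, 1)
out <- factor(labels, levels = c(0, 1))

g <- matrix(rnorm(10 * 5), ncol = 5,
            dimnames = list(paste("Gene", 1:10, sep = "_"),
                            paste("Sample", 1:5, sep = "_")))
check_rocc_input(g, out, xgenes = 3)
```

Passing levels explicitly to factor() fixes the order even when the first observed label is 1.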

Details

For feature selection, the function picks the xgenes features with the highest AUC, where AUC values below 0.5 are mirrored (replaced by 1 - AUC) before ranking. Features that are negatively associated with the class (AUC below 0.5) are multiplied by -1. The selected features are then averaged to form a metagene, and samples are ranked by their metagene expression. The optimal split between positive (i.e., 1) and negative (i.e., 0) samples is the split yielding the highest accuracy, i.e. the largest share of class assignments that agree with the real class. This split is determined on the ROC curve using the package ROCR. The metagene threshold is computed as the mean metagene expression value of the two samples that form the border of the split. The final classifier specifications consist of a) the selected genes, b) the positive (AUC above 0.5) or negative (AUC below 0.5) association of these genes with the true class, and c) the metagene threshold. A new sample can be classified using the o.rocc() function.
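
The steps described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: it computes the AUC from the rank-sum identity instead of calling ROCR, it assumes that samples with metagene values above the threshold are assigned class 1, and the names auc_rank and sketch_train are hypothetical.

```r
## Sketch only; auc_rank and sketch_train are hypothetical names.
auc_rank <- function(x, y) {
  # AUC via the rank-sum (Mann-Whitney) identity; y is coded 0/1
  r <- rank(x)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

sketch_train <- function(g, out, xgenes = 200) {
  y <- as.integer(as.character(out))
  aucv <- apply(g, 1, auc_rank, y = y)      # one AUC per gene
  allpos <- pmax(aucv, 1 - aucv)            # mirror AUCs below 0.5
  sel <- order(allpos, decreasing = TRUE)[seq_len(min(xgenes, nrow(g)))]
  sgn <- ifelse(aucv[sel] < 0.5, -1, 1)     # flip negatively associated genes
  meta <- colMeans(g[sel, , drop = FALSE] * sgn)  # metagene per sample
  # Candidate thresholds: midpoints between neighbouring ranked samples
  s <- sort(meta)
  cand <- (head(s, -1) + tail(s, -1)) / 2
  # Assumption: metagene above threshold means class 1
  acc <- sapply(cand, function(t) mean((meta > t) == (y == 1)))
  list(genes = rownames(g)[sel], sign = sgn, metagene = meta,
       cutoff = unname(cand[which.max(acc)]), accuracy = max(acc))
}

set.seed(1)
g <- matrix(rnorm(200 * 20), ncol = 20,
            dimnames = list(paste("Gene", 1:200, sep = "_"),
                            paste("Sample", 1:20, sep = "_")))
out <- factor(rep(0:1, each = 10), levels = c(0, 1))
fit <- sketch_train(g, out, xgenes = 50)
fit$cutoff
```

Because the threshold is taken midway between two neighbouring samples, it always lies strictly inside the range of the training metagene values when those values are distinct.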

Value

a list as a trocc object with components

AUCs

a matrix containing the selected features with the corresponding AUC (aucv), positive or negative association (posneg), and mirrored AUC (allpos).

genes

character vector containing the genes selected in the feature selection.

positiv

character vector containing all positively associated genes (AUC above 0.5) selected in the feature selection.

negativ

character vector containing all negatively associated genes (AUC below 0.5) selected in the feature selection.

metagene.expression

numeric vector containing the metagene values of the training samples.

metagene.expression.ranked

numeric vector containing the samples ranked by metagene expression values.

cutoffvalue

the metagene threshold obtained from the best split of training samples.

method

the classification method used: ROC.based.predictor.
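
To illustrate how the positiv, negativ and cutoffvalue components fit together, the sketch below applies them to a single new sample. It is an assumption-based illustration, not the package's prediction routine: classify_sketch is a hypothetical helper, the spec list is a hand-made stand-in for a trained object, and the rule that a metagene value above the cutoff means class 1 is assumed.

```r
## Hypothetical helper (not part of the package).
classify_sketch <- function(spec, newsample) {
  # Metagene: mean of positively associated genes and
  # sign-flipped negatively associated genes
  vals <- c(newsample[spec$positiv], -newsample[spec$negativ])
  meta <- mean(vals)
  # Assumption: metagene above the cutoff is assigned class 1
  factor(as.integer(meta > spec$cutoffvalue), levels = c(0, 1))
}

## Hand-made stand-in for the classifier specifications
spec <- list(positiv = c("Gene_1", "Gene_2"),
             negativ = "Gene_3",
             cutoffvalue = 0.5)

## A new sample as a named numeric vector; unused genes are ignored
x <- c(Gene_1 = 2, Gene_2 = 1, Gene_3 = -1.5, Gene_4 = 9)
classify_sketch(spec, x)
```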

Note

depends on the package ROCR

Author(s)

Martin Lauss

References

Lauss M, Frigyesi A, Ryden T, Hoglund M. Robust assignment of cancer subtypes from expression data using a uni-variate gene expression average as classifier. BMC Cancer 2010 (in print)

See Also

p.rocc, o.rocc

Examples

### Random Dataset and phenotype
set.seed(100)
## Dataset should be a matrix
g <- matrix(rnorm(1000*25),ncol=25)
rownames(g) <- paste("Gene",1:1000,sep="_")
colnames(g) <- paste("Sample",1:25,sep="_")
## Phenotype should be a factor with levels 0 and 1: 
out <- as.factor(sample(c(0:1),size=25,replace=TRUE))

predictor <- tr.rocc(g, out, xgenes = 50)

## find classifier specification:
predictor$positiv
predictor$negativ
predictor$cutoffvalue