optimalScore: Identifying the optimal prediction score

Description

Identifies the optimal prediction score (decision threshold) for a machine learning-based classification model using the F-score measure, which balances true positives (TPs), false positives (FPs), true negatives (TNs) and false negatives (FNs).

Usage

optimalScore( positiveSampleScores, negativeSampleScores, beta = 2, plot = TRUE )

Arguments

positiveSampleScores

a numeric vector, the prediction scores of positive samples.

negativeSampleScores

a numeric vector, the prediction scores of negative samples.

beta

a positive numeric value. beta > 1 gives recall a higher weight than precision; beta = 1 weights recall and precision equally; beta < 1 gives precision a higher weight than recall.

plot

logical. If TRUE, the distribution of the F-score across the different prediction score thresholds is plotted; otherwise no plot is produced.
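
The effect of beta can be illustrated with the standard F-beta definition, F = (1 + beta^2) * Precision * Recall / (beta^2 * Precision + Recall). This page does not show the formula used inside optimalScore, so the helper below (f_beta, a hypothetical name that is not part of mlDNA) is only a minimal sketch under the assumption that the package follows the standard definition.

```r
# Standard F-beta score. ASSUMPTION: this is the textbook definition and
# may differ in detail from what optimalScore() computes internally.
f_beta <- function(precision, recall, beta = 2) {
  denom <- beta^2 * precision + recall
  if (denom == 0) return(0)  # avoid 0/0 when both precision and recall are zero
  (1 + beta^2) * precision * recall / denom
}

# Same precision (0.5) and recall (1.0), scored under three values of beta
f_beta(0.5, 1.0, beta = 0.5)  # precision weighted more heavily
f_beta(0.5, 1.0, beta = 1)    # equal weighting
f_beta(0.5, 1.0, beta = 2)    # recall weighted more heavily
```

With high recall and low precision, the score rises as beta increases, which is why beta = 2 suits cases where missing a true positive is more costly than accepting a false positive.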

Value

A list containing two components:

statMat

a numeric matrix recording the statistics (i.e., prediction score, TP, FP, TN, FN, Recall, TNR [TN/(TN+FP)], Precision, F-score) at each threshold, where every possible prediction score is used as a threshold.

optimalScore

a numeric value, the identified optimal prediction score.
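
To make the two components concrete, the following is a rough sketch of how such a threshold scan could be written. It is not the mlDNA implementation: the function name scan_thresholds, the column names, the rule that a score greater than or equal to the threshold is called positive, and the use of the standard F-beta formula are all assumptions made for illustration.

```r
# Hypothetical re-implementation for illustration only (not mlDNA code).
# ASSUMPTIONS: a sample is predicted positive when score >= threshold;
# both input vectors are non-empty; F-score is the standard F-beta.
scan_thresholds <- function(pos, neg, beta = 2) {
  thresholds <- sort(unique(c(pos, neg)))
  stats <- t(sapply(thresholds, function(th) {
    TP <- sum(pos >= th); FN <- sum(pos < th)
    FP <- sum(neg >= th); TN <- sum(neg < th)
    recall    <- TP / (TP + FN)
    precision <- if (TP + FP > 0) TP / (TP + FP) else 0
    denom     <- beta^2 * precision + recall
    fscore    <- if (denom > 0) (1 + beta^2) * precision * recall / denom else 0
    c(threshold = th, TP = TP, FP = FP, TN = TN, FN = FN,
      Recall = recall, TNR = TN / (TN + FP),
      Precision = precision, Fscore = fscore)
  }))
  # Threshold with the highest F-score (first one in case of ties)
  best <- which.max(stats[, "Fscore"])
  list(statMat = stats, optimalScore = unname(stats[best, "threshold"]))
}

# Toy scores, made up for this example
pos <- c(0.9, 0.8, 0.6, 0.4)
neg <- c(0.7, 0.3, 0.2, 0.1)
res <- scan_thresholds(pos, neg, beta = 2)
res$optimalScore  # 0.4
res$statMat
```

On this toy input, the threshold 0.4 keeps all four positives while admitting one negative, which gives the highest F-score at beta = 2.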

Author(s)

Chuang Ma, Xiangfeng Wang

Examples

 ## Not run: 

   ##generate expression feature matrix
   sampleVec1 <- c(1, 2, 3, 4, 5, 6)
   sampleVec2 <- c(1, 2, 3, 4, 5, 6)
   featureMat <- expFeatureMatrix( expMat1 = ControlExpMat, sampleVec1 = sampleVec1, 
                                   expMat2 = SaltExpMat, sampleVec2 = sampleVec2, 
                                   logTransformed = TRUE, base = 2,
                                   features = c("zscore", "foldchange", "cv", "expression"))

   ##positive samples
   positiveSamples <- as.character(sampleData$KnownSaltGenes)
   ##unlabeled samples
   unlabelSamples <- setdiff( rownames(featureMat), positiveSamples )
   idx <- sample(length(unlabelSamples))
   ##randomly selecting a set of unlabeled samples as negative samples
   negativeSamples <- unlabelSamples[idx[1:length(positiveSamples)]]

   ##five-fold cross validation
   seed <- randomSeed() #generate a random seed
   cvRes <- cross_validation(seed = seed, method = "randomForest", 
                             featureMat = featureMat, 
                             positives = positiveSamples, negatives = negativeSamples, 
                             cross = 5, cpus = 1,
                             ntree = 100 ) ##parameters for random forest algorithm

   ##prediction scores of positive and negative samples from the 
   ##first round of cross validation
   positiveSampleScores <- cvRes[[1]]$positives.test.score
   negativeSampleScores <- cvRes[[1]]$negatives.test.score
   res <- optimalScore( positiveSampleScores, negativeSampleScores, 
                        beta = 2, plot = TRUE )

   ##the optimal threshold
   res$optimalScore

   ##statistics for the first ten prediction score thresholds
   res$statMat[1:10,]

## End(Not run)

mlDNA documentation built on May 2, 2019, 2:15 p.m.