calcPerformance: Add Performance Calculations to a ClassifyResult Object or Calculate for a Pair of Factor Vectors


Description

If calcExternalPerformance is used, such as when a vector of known classes and a vector of predicted classes were determined outside of the ClassifyR package, a single metric value is calculated. If calcCVperformance is used, the result of calling runTests is annotated with one of the user-specified performance measures.
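
For the external use case, a minimal sketch with hypothetical class vectors:

  library(ClassifyR)
  known <- factor(c("Yes", "Yes", "No", "No"))
  predicted <- factor(c("Yes", "No", "No", "No"))
  calcExternalPerformance(known, predicted, "accuracy")  # 3 correct out of 4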

Usage

  ## S4 method for signature 'factor,factor'
  calcExternalPerformance(actualClasses, predictedClasses,
                performanceType = c("error", "accuracy", "balanced error",
                                    "balanced accuracy", "micro precision",
                                    "micro recall", "micro F1", "macro precision",
                                    "macro recall", "macro F1", "matthews"))

  ## S4 method for signature 'ClassifyResult'
  calcCVperformance(result,
                performanceType = c("error", "accuracy", "balanced error",
                                    "balanced accuracy", "sample error",
                                    "sample accuracy", "micro precision",
                                    "micro recall", "micro F1", "macro precision",
                                    "macro recall", "macro F1", "matthews"))

Arguments

result

An object of class ClassifyResult.

performanceType

A character vector of length 1. Default: "balanced error".
Must be one of the following options:

  • "error": Ordinary error rate.

  • "accuracy": Ordinary accuracy.

  • "balanced error": Balanced error rate.

  • "balanced accuracy": Balanced accuracy.

  • "sample error": Error rate for each sample in the data set.

  • "sample accuracy": Accuracy for each sample in the data set.

  • "micro precision": Sum of the number of correct predictions in each class, divided by the sum of number of samples in each class.

  • "micro recall": Sum of the number of correct predictions in each class, divided by the sum of number of samples predicted as belonging to each class.

  • "micro F1": F1 score obtained by calculating the harmonic mean of micro precision and micro recall.

  • "macro precision": Sum of the ratios of the number of correct predictions in each class to the number of samples in each class, divided by the number of classes.

  • "macro recall": Sum of the ratios of the number of correct predictions in each class to the number of samples predicted to be in each class, divided by the number of classes.

  • "macro F1": F1 score obtained by calculating the harmonic mean of macro precision and macro recall.

  • "matthews": Matthews Correlation Coefficient (MCC). A score between -1 and 1 indicating how concordant the predicted classes are to the actual classes. Only defined if there are two classes.

actualClasses

A factor vector specifying each sample's correct class.

predictedClasses

A factor vector of the same length as actualClasses specifying each sample's predicted class.

Details

All metrics except Matthews Correlation Coefficient are suitable for evaluating classification scenarios with more than two classes and are reimplementations of those available from Intel DAAL.

If runTests was run in resampling mode, one performance measure is produced for every resampling. If the leave-k-out mode was used, then the predictions are concatenated, and one performance measure is calculated for all classifications.

"balanced error" calculates the balanced error rate and is better suited to class-imbalanced data sets than the ordinary error rate specified by "error". "sample error" calculates the error rate of each sample individually. This may help to identify which samples are contributing the most to the overall error rate and check them for confounding factors. Precision, recall and F1 score have micro and macro summary versions. The macro versions are preferable because the metric will not have a good score if there is substantial class imbalance and the classifier predicts all samples as belonging to the majority class.

Value

If calcCVperformance was run, an updated ClassifyResult object, with new metric values in the performance slot. If calcExternalPerformance was run, the performance metric value itself.

Author(s)

Dario Strbenac

Examples

  # Prediction table: each of the 10 sample IDs is paired with 5 predicted
  # classes, giving 50 rows in total.
  predictTable <- data.frame(sample = rep(paste("A", 1:10, sep = ''), 5),
                             class = factor(sample(LETTERS[1:2], 50, replace = TRUE)))
  actual <- factor(sample(LETTERS[1:2], 10, replace = TRUE))
  # Manually construct a ClassifyResult, like the object runTests returns.
  result <- ClassifyResult("Example", "Differential Expression", "A Selection",
                           paste("A", 1:10, sep = ''), paste("Gene", 1:50, sep = ''),
                           50, list(1:50, 1:50), list(1:5, 6:15), list(function(oracle){}),
                           list(predictTable), actual, list("leave", 2))
  result <- calcCVperformance(result, "balanced error")
  performance(result)  # Named list containing the balanced error rate.
