calcPerformance: Add Performance Calculations to a ClassifyResult Object or Calculate for a Pair of Factor Vectors

Description

If calcExternalPerformance is used, such as when a vector of known classes and a vector of predicted classes were determined outside of the ClassifyR package, a single metric value is calculated. If calcCVperformance is used, the results of calling crossValidate, runTests or runTest are annotated with one of the user-specified performance measures.

Usage

## S4 method for signature 'factor,factor'
calcExternalPerformance(
  actualOutcome,
  predictedOutcome,
  performanceTypes = "auto"
)

## S4 method for signature 'Surv,numeric'
calcExternalPerformance(
  actualOutcome,
  predictedOutcome,
  performanceTypes = "auto"
)

## S4 method for signature 'factor,tabular'
calcExternalPerformance(
  actualOutcome,
  predictedOutcome,
  performanceTypes = "auto"
)

## S4 method for signature 'ClassifyResult'
calcCVperformance(result, performanceTypes = "auto")

performanceTable(
  resultsList,
  performanceTypes = "auto",
  aggregate = c("median", "mean")
)

## S4 method for signature 'MultiAssayExperimentOrList'
easyHard(
  measurements,
  result,
  assay = "clinical",
  useFeatures = NULL,
  performanceType = "auto",
  fitMode = c("single", "full")
)

Arguments

actualOutcome

A factor vector or survival information specifying each sample's known outcome.

predictedOutcome

A factor vector or survival information of the same length as actualOutcome specifying each sample's predicted outcome.

performanceTypes

Default: "auto". A character vector of performance metrics to calculate. If "auto", Balanced Accuracy is used for a classification task and C-index for a time-to-event task. If using easyHard, the default is "Sample Accuracy" for a classification task and "Sample C-index" for a time-to-event task. Must be one of the following options:

  • "Error": Ordinary error rate.

  • "Accuracy": Ordinary accuracy.

  • "Balanced Error": Balanced error rate.

  • "Balanced Accuracy": Balanced accuracy.

  • "Sample Error": Error rate for each sample in the data set.

  • "Sample Accuracy": Accuracy for each sample in the data set.

  • "Micro Precision": Sum of the number of correct predictions in each class, divided by the sum of number of samples in each class.

  • "Micro Recall": Sum of the number of correct predictions in each class, divided by the sum of number of samples predicted as belonging to each class.

  • "Micro F1": F1 score obtained by calculating the harmonic mean of micro precision and micro recall.

  • "Macro Precision": Sum of the ratios of the number of correct predictions in each class to the number of samples in each class, divided by the number of classes.

  • "Macro Recall": Sum of the ratios of the number of correct predictions in each class to the number of samples predicted to be in each class, divided by the number of classes.

  • "Macro F1": F1 score obtained by calculating the harmonic mean of macro precision and macro recall.

  • "Matthews Correlation Coefficient": Matthews Correlation Coefficient (MCC). A score between -1 and 1 indicating how concordant the predicted classes are to the actual classes. Only defined if there are two classes.

  • "AUC": Area Under the Curve. An area ranging from 0 to 1, under the ROC.

  • "C-index": For survival data, the concordance index, for models which produce risk scores. Ranges from 0 to 1.

  • "Sample C-index": Per-individual C-index.
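As a hand-check of the definitions above, two of the class-based metrics can be computed directly from a confusion table in base R. This is an illustrative sketch only, not ClassifyR's implementation:

```r
# Small example: 6 samples, 2 classes. Rows of the table are the truth,
# columns are the predictions.
actual    <- factor(c("A", "A", "A", "A", "B", "B"))
predicted <- factor(c("A", "A", "A", "B", "B", "A"), levels = levels(actual))
confusion <- table(actual, predicted)

# Balanced accuracy: the mean of the per-class recalls.
recalls <- diag(confusion) / rowSums(confusion)
balancedAccuracy <- mean(recalls)        # (3/4 + 1/2) / 2 = 0.625

# Matthews Correlation Coefficient, defined only for two classes.
TP <- confusion[1, 1]; FN <- confusion[1, 2]
FP <- confusion[2, 1]; TN <- confusion[2, 2]
MCC <- (TP * TN - FP * FN) /
       sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))   # 0.25
```

Because balanced accuracy averages the per-class recalls, each class contributes equally regardless of its size.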

result

An object of class ClassifyResult.

resultsList

A list of modelling results. Each element must be of type ClassifyResult.

aggregate

Default: "median". Can also be "mean". If there are multiple values, such as for repeated cross-validation, then they are summarised to a single number using either mean or median.

assay

For easyHard only. The assay to use to look for associations to the per-sample metric.

useFeatures

For easyHard only. Default: NULL (i.e. use all provided features). A vector of features of the specified assay to consider. This allows the avoidance of variables such as spike-in RNAs, sample IDs and sample acquisition dates, which are not relevant for outcome prediction.

fitMode

For easyHard only. Default: "single". Either "single" or "full". If "single", an ordinary GLM is fitted for each covariate separately. If "full", elastic net regression is used to automatically tune which model coefficients are non-zero.
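Conceptually, "single" mode resembles fitting one model per covariate. A minimal base-R sketch of that idea follows; the per-sample metric and covariate names are invented for illustration, and this is not easyHard's internal code:

```r
# Hypothetical per-sample metric (e.g. sample accuracy) and two made-up
# clinical covariates. In "single" mode, each covariate gets its own
# ordinary GLM rather than all entering one model.
set.seed(100)
sampleMetric <- runif(40)                 # per-sample accuracy in [0, 1]
covariates <- data.frame(age = rnorm(40, mean = 60, sd = 10),
                         stage = factor(sample(c("I", "II"), 40, replace = TRUE)))

singleFits <- lapply(covariates, function(covariate)
    glm(sampleMetric ~ covariate))
slopeEstimates <- sapply(singleFits, function(fit) coef(fit)[2])
```

In contrast, "full" mode would place all covariates in one model and let elastic net regularisation decide which coefficients stay non-zero.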

Details

All metrics except Matthews Correlation Coefficient are suitable for evaluating classification scenarios with more than two classes and are reimplementations of those available from Intel DAAL.

If crossValidate, runTests or runTest was run in resampling mode, one performance measure is produced for every resampling. If the leave-k-out mode was used, the predictions are instead concatenated, and one performance measure is calculated across all classifications.

"Balanced Error" calculates the balanced error rate and is better suited to class-imbalanced data sets than the ordinary error rate specified by "Error". "Sample Error" calculates the error rate of each sample individually. This may help to identify which samples are contributing the most to the overall error rate and check them for confounding factors. Precision, recall and F1 score have micro and macro summary versions. The macro versions are preferable because the metric will not have a good score if there is substantial class imbalance and the classifier predicts all samples as belonging to the majority class.
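The imbalance point can be made concrete with a toy example, following the micro and macro precision definitions given under performanceTypes. This is a base-R sketch, independent of ClassifyR's own calculations:

```r
# 90 samples of class "A", 10 of class "B", and a degenerate classifier
# that always predicts the majority class "A".
actual    <- factor(rep(c("A", "B"), times = c(90, 10)))
predicted <- factor(rep("A", 100), levels = levels(actual))

correctPerClass <- sapply(levels(actual), function(class)
    sum(actual == class & predicted == class))
samplesPerClass <- table(actual)

# Micro version: pooled over classes, so the majority class dominates
# and the useless classifier still looks good.
microPrecision <- sum(correctPerClass) / sum(samplesPerClass)           # 0.9

# Macro version: per-class ratios averaged, so the ignored minority
# class drags the score down.
macroPrecision <- mean(as.numeric(correctPerClass / samplesPerClass))   # 0.5
```

The macro score of 0.5 exposes the majority-class classifier that the micro score of 0.9 flatters.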

Value

If calcCVperformance was run, an updated ClassifyResult object, with new metric values in the performance slot. If calcExternalPerformance was run, the performance metric value itself.

For easyHard, a DataFrame summarising the fitted regression model.

Author(s)

Dario Strbenac

Examples


  # 10 samples, each predicted five times, as in five-repeat cross-validation.
  # DataFrame recycles the length-10 sample column to the 50 predictions.
  predictTable <- DataFrame(sample = paste("A", 1:10, sep = ''),
                            class = factor(sample(LETTERS[1:2], 50, replace = TRUE)))
  actual <- factor(sample(LETTERS[1:2], 10, replace = TRUE))
  result <- ClassifyResult(DataFrame(characteristic = "Data Set", value = "Example"),
                           paste("A", 1:10, sep = ''), paste("Gene", 1:50),
                           list(paste("Gene", 1:50), paste("Gene", 1:50)),
                           list(paste("Gene", 1:5), paste("Gene", 1:10)),
                           list(function(oracle){}), NULL, predictTable, actual)
  result <- calcCVperformance(result)  # Balanced accuracy, by the "auto" default.
  performance(result)


DarioS/ClassifyR documentation built on Dec. 19, 2024, 8:22 p.m.