mlr_measures_classif.mcc: Matthews Correlation Coefficient

mlr_measures_classif.mcc    R Documentation

Matthews Correlation Coefficient

Description

Measure to compare true observed labels with predicted labels in multiclass classification tasks.

Details

In the binary case, the Matthews Correlation Coefficient is defined as

\frac{\mathrm{TP} \cdot \mathrm{TN} - \mathrm{FP} \cdot \mathrm{FN}}{\sqrt{(\mathrm{TP} + \mathrm{FP}) (\mathrm{TP} + \mathrm{FN}) (\mathrm{TN} + \mathrm{FP}) (\mathrm{TN} + \mathrm{FN})}},

where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively.
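As a quick illustration of the binary formula, the following base R sketch evaluates it for a set of made-up counts. The function name mcc_binary_counts and the counts themselves are assumptions chosen for this example; they are not part of mlr3 or mlr3measures.

```r
# Hypothetical helper: binary MCC computed directly from the four counts.
mcc_binary_counts = function(tp, tn, fp, fn) {
  numerator = tp * tn - fp * fn
  denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  numerator / denominator
}

# Made-up counts: 4 true positives, 4 true negatives, 1 false positive, 1 false negative.
binary_score = mcc_binary_counts(tp = 4, tn = 4, fp = 1, fn = 1)
print(binary_score)  # (16 - 1) / sqrt(5 * 5 * 5 * 5) = 15 / 25 = 0.6
```

The sketch does not guard against a zero denominator.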

In the multi-class case, the Matthews Correlation Coefficient is defined for a multi-class confusion matrix C with K classes:

\frac{c \cdot s - \sum_k^K p_k \cdot t_k}{\sqrt{(s^2 - \sum_k^K p_k^2) \cdot (s^2 - \sum_k^K t_k^2)}},

where

  • s = \sum_i^K \sum_j^K C_{ij}: total number of samples

  • c = \sum_k^K C_{kk}: total number of correctly predicted samples

  • t_k = \sum_i^K C_{ik}: number of predictions for each class k

  • p_k = \sum_j^K C_{kj}: number of true occurrences for each class k.
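The quantities above can be sketched in base R as follows. This is a minimal illustration, not the code used by mlr3measures; the function name mcc_from_confusion and the example matrices are assumptions made for this sketch. Because the formula is symmetric in p_k and t_k, the result does not depend on whether rows or columns of C hold the true labels.

```r
# Hypothetical helper: multi-class MCC from a K x K confusion matrix C.
mcc_from_confusion = function(C) {
  s = sum(C)              # total number of samples
  n_correct = sum(diag(C))  # c in the formula: correctly predicted samples
  t_k = colSums(C)        # t_k = sum_i C_ik
  p_k = rowSums(C)        # p_k = sum_j C_kj
  numerator = n_correct * s - sum(p_k * t_k)
  denominator = sqrt((s^2 - sum(p_k^2)) * (s^2 - sum(t_k^2)))
  numerator / denominator
}

# Made-up 3-class confusion matrix with 18 samples, 14 of them on the diagonal.
C3 = matrix(c(5, 1, 0,
              1, 4, 1,
              0, 1, 5), nrow = 3, byrow = TRUE)
multiclass_score = mcc_from_confusion(C3)
print(multiclass_score)  # (14 * 18 - 108) / 216 = 144 / 216 = 2/3

# For K = 2 the same function agrees with the binary formula:
# TP = 4, TN = 4, FP = 1, FN = 1 gives 0.6.
C2 = matrix(c(4, 1,
              1, 4), nrow = 2, byrow = TRUE)
two_class_score = mcc_from_confusion(C2)
print(two_class_score)
```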

The above formula is undefined if any of the four sums in the denominator is 0 in the binary case and, more generally, if either s^2 - \sum_k^K p_k^2 or s^2 - \sum_k^K t_k^2 is equal to 0. In that case, the denominator is set to 1.
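A degenerate case arises, for example, when every sample is assigned to the same class. The base R sketch below follows the convention described above of replacing a zero denominator with 1; the function name mcc_guarded and the example matrix are assumptions for illustration and may not match the internals of mlr3measures::mcc().

```r
# Hypothetical helper: multi-class MCC with a zero denominator replaced by 1.
mcc_guarded = function(C) {
  s = sum(C)
  n_correct = sum(diag(C))
  t_k = colSums(C)
  p_k = rowSums(C)
  numerator = n_correct * s - sum(p_k * t_k)
  denominator = sqrt((s^2 - sum(p_k^2)) * (s^2 - sum(t_k^2)))
  if (denominator == 0) {
    denominator = 1  # convention: avoid 0 / 0
  }
  numerator / denominator
}

# Made-up case: all 5 samples fall into the first column, so
# s^2 - sum(t_k^2) = 25 - 25 = 0 and the unguarded formula would give 0 / 0.
C_degenerate = matrix(c(3, 0,
                        2, 0), nrow = 2, byrow = TRUE)
degenerate_score = mcc_guarded(C_degenerate)
print(degenerate_score)  # numerator is 3 * 5 - 15 = 0, so the result is 0
```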

When there are more than two classes, the MCC no longer spans the full range from -1 to +1. Instead, the minimum value lies between -1 and 0, depending on the distribution of the true labels. The maximum value is always +1.
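To make the difference in range concrete, the following base R sketch compares a two-class and a three-class confusion matrix in which no sample is classified correctly. The helper mcc_from_counts and both matrices are assumptions made for this illustration.

```r
# Hypothetical helper: multi-class MCC from a K x K confusion matrix C.
mcc_from_counts = function(C) {
  s = sum(C)
  n_correct = sum(diag(C))
  t_k = colSums(C)
  p_k = rowSums(C)
  (n_correct * s - sum(p_k * t_k)) /
    sqrt((s^2 - sum(p_k^2)) * (s^2 - sum(t_k^2)))
}

# Two balanced classes, every sample misclassified.
C_two = matrix(c(0, 5,
                 5, 0), nrow = 2, byrow = TRUE)
worst_two = mcc_from_counts(C_two)
print(worst_two)  # (0 - 50) / 50 = -1

# Three balanced classes, every sample misclassified, errors spread evenly.
C_three = matrix(c(0, 1, 1,
                   1, 0, 1,
                   1, 1, 0), nrow = 3, byrow = TRUE)
worst_three = mcc_from_counts(C_three)
print(worst_three)  # (0 - 12) / 24 = -0.5
```

In this balanced three-class example the score stops at -0.5 even though no prediction is correct, whereas the two-class counterpart reaches -1.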

Dictionary

This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():

mlr_measures$get("classif.mcc")
msr("classif.mcc")
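A typical use is to pass the measure to the score method of a prediction object. The sketch below assumes that the mlr3 package is installed and uses the bundled "iris" task and the "classif.rpart" learner purely as examples; scoring on the training data is done here only for brevity.

```r
library(mlr3)

# Example task and learner shipped with mlr3 (chosen for illustration only).
task = tsk("iris")
learner = lrn("classif.rpart")

# Train and predict on the same data; a real evaluation would use resampling.
learner$train(task)
prediction = learner$predict(task)

# Score the prediction with the Matthews Correlation Coefficient.
measure = msr("classif.mcc")
mcc_score = prediction$score(measure)
print(mcc_score)
```

The same measure object can also be passed to the score() or aggregate() methods of resampling and benchmark results.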

Parameters

Empty ParamSet

Meta Information

  • Type: "classif"

  • Range: [-1, 1]

  • Minimize: FALSE

  • Required prediction: response

Note

The score function calls mlr3measures::mcc() from package mlr3measures.

If the measure is undefined for the input, NaN is returned. This can be customized by setting the field na_value.

See Also

Dictionary of Measures: mlr_measures

as.data.table(mlr_measures) for a complete table of all (also dynamically created) Measure implementations.

Other classification measures: mlr_measures_classif.acc, mlr_measures_classif.auc, mlr_measures_classif.bacc, mlr_measures_classif.bbrier, mlr_measures_classif.ce, mlr_measures_classif.costs, mlr_measures_classif.dor, mlr_measures_classif.fbeta, mlr_measures_classif.fdr, mlr_measures_classif.fn, mlr_measures_classif.fnr, mlr_measures_classif.fomr, mlr_measures_classif.fp, mlr_measures_classif.fpr, mlr_measures_classif.logloss, mlr_measures_classif.mauc_au1p, mlr_measures_classif.mauc_au1u, mlr_measures_classif.mauc_aunp, mlr_measures_classif.mauc_aunu, mlr_measures_classif.mauc_mu, mlr_measures_classif.mbrier, mlr_measures_classif.npv, mlr_measures_classif.ppv, mlr_measures_classif.prauc, mlr_measures_classif.precision, mlr_measures_classif.recall, mlr_measures_classif.sensitivity, mlr_measures_classif.specificity, mlr_measures_classif.tn, mlr_measures_classif.tnr, mlr_measures_classif.tp, mlr_measures_classif.tpr

Other multiclass classification measures: mlr_measures_classif.acc, mlr_measures_classif.bacc, mlr_measures_classif.ce, mlr_measures_classif.costs, mlr_measures_classif.logloss, mlr_measures_classif.mauc_au1p, mlr_measures_classif.mauc_au1u, mlr_measures_classif.mauc_aunp, mlr_measures_classif.mauc_aunu, mlr_measures_classif.mauc_mu, mlr_measures_classif.mbrier


mlr3 documentation built on Oct. 18, 2024, 5:11 p.m.