metric_auc: Compute the area under the curves (ROC, PRC)


View source: R/helpers.R

Description

Function metric_auc computes the AUROC (Area Under the Receiver Operating Characteristic Curve) and the AUPRC (Area Under the Precision-Recall Curve), measures of the quality of a ranking in a binary classification problem. Partial areas are also supported. Important: higher ranks are assumed to ideally target the positives (label = 1), whereas lower ranks correspond to the negatives (label = 0).

Function metric_fun is a wrapper around metric_auc that returns a function for performance evaluation. The returned function takes actual and predicted values as input and outputs a performance metric. This is needed for functions such as perf and perf_eval, which iterate over a list of such metric functions and report the performance measured by each of them.
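
Conceptually, metric_fun behaves like the following sketch; this is a simplified illustration of the wrapper idea, not the package's actual implementation, and metric_fun_sketch is a hypothetical name:

metric_fun_sketch <- function(...) {
    # fix the metric options now...
    args <- list(...)
    # ...and return a two-argument performance function
    function(actual, predicted) {
        do.call(
            metric_auc,
            c(list(actual = actual, predicted = predicted), args))
    }
}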

Usage

metric_auc(
    actual,
    predicted,
    curve = "ROC",
    partial = c(0, 1),
    standardized = FALSE
)

metric_fun(...)

Arguments

actual

numeric, binary labels of the negatives (0) and positives (1)

predicted

numeric, prediction used to rank the entities; this will typically be the diffusion scores

curve

character, either "ROC" for computing the AUROC or "PRC" for the AUPRC

partial

vector with two numeric values for computing partial areas. The values are the limits on the x axis of the curve, as implemented in the "xlim" argument of part. Defaults to c(0, 1), i.e. the whole area

standardized

logical, should partial areas be standardized to range in [0, 1]? Defaults to FALSE and only affects partial areas.

...

parameters to pass to metric_auc

Details

The AUROC is a scalar value: the probability that a randomly chosen positive is ranked higher than a randomly chosen negative. The AUROC is cutoff-free and informative of the overall performance of a ranker. Likewise, the AUPRC is the area under the Precision-Recall curve and is also a standard metric for binary classification. Both measures are discussed in [Saito, 2017].
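
For illustration, this probabilistic interpretation can be checked by comparing all positive-negative pairs directly; the sketch below is not part of diffuStats and auroc_by_pairs is a hypothetical name:

auroc_by_pairs <- function(actual, predicted) {
    pos <- predicted[actual == 1]
    neg <- predicted[actual == 0]
    # proportion of pairs where the positive outranks the negative,
    # counting ties as one half (the Wilcoxon-Mann-Whitney statistic)
    mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}
# should agree with metric_auc(actual, predicted, curve = "ROC")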

AUROC and AUPRC have their partial counterparts, in which only the area enclosed up to a certain false positive rate (AUROC) or recall (AUPRC) is accounted for. This can be useful when assessing the quality of the ranking with a focus on the top-ranked entities.

The user can, however, define a custom performance metric. AUROC and AUPRC are common choices, but other problem-specific metrics might be of interest, for example the number of hits in the top k nodes, as sketched below. Machine learning metrics can be found in packages such as Metrics and MLmetrics in the CRAN repository (http://cran.r-project.org/).
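
As an illustration of such a custom metric, the function below counts the positives among the k top-ranked entities; it has the same (actual, predicted) signature as the functions returned by metric_fun, but metric_top_k_hits and its default k are illustrative choices, not part of diffuStats:

metric_top_k_hits <- function(actual, predicted, k = 10) {
    # indices of the k highest-scoring entities
    top_k <- order(predicted, decreasing = TRUE)[seq_len(k)]
    # number of true positives among them
    sum(actual[top_k] == 1)
}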

Value

metric_auc returns a numeric value, the area under the specified curve

metric_fun returns a function (performance metric)

References

Saito, T., & Rehmsmeier, M. (2017). Precrec: fast and accurate precision–recall and ROC curve calculations in R. Bioinformatics, 33(1), 145-147.

Examples

# generate class and numeric ranking
set.seed(1)
n <- 50
actual <- rep(0:1, each = n/2)
predicted <- ifelse(
    actual == 1, 
    runif(n, min = 0.2, max = 1), 
    runif(n, min = 0, max = 0.8))

# AUROC
metric_auc(actual, predicted, curve = "ROC")

# partial AUC (up to a false positive rate of 10%)
metric_auc(
    actual, predicted, curve = "ROC", 
    partial = c(0, 0.1))

# The same area, but standardized to [0, 1]
metric_auc(
    actual, predicted, curve = "ROC", 
    partial = c(0, 0.1), standardized = TRUE)

# AUPRC
metric_auc(actual, predicted, curve = "PRC")

# Generate performance functions for perf and perf_eval
f_roc <- metric_fun(
    curve = "ROC", partial = c(0, 0.5), 
    standardized = TRUE)
f_roc
f_roc(actual = actual, predicted = predicted)
