
View source: R/roc.R

roc    R Documentation

Compute Receiver Operating Characteristic statistics from a dataframe or CST object

Description

This function computes the statistics needed for ROC error analysis of two-class classifiers that produce a score. The input data frame should contain at least two columns, named "target" and "score". The classifier should produce scores that are higher for target==TRUE and lower for target==FALSE. roc() computes the false alarm probability and the miss probability as a function of an implicitly swept threshold. It further computes summary statistics, such as the equal error rate (EER), the log-likelihood-ratio cost Cllr (meaningful if the score is a log-likelihood ratio), and the convex hull of the ROC, which yields the optimal score-to-log-likelihood-ratio transform. The result can be used for subsequent analysis and plotting.
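
The threshold sweep can be illustrated with a minimal base-R sketch. It assumes that a trial is accepted when its score is at or above the threshold, and it sweeps the threshold over the observed scores only. The function name sweep_det is hypothetical, and this is not the ROC package's own code.

```r
# Hypothetical helper: pfa and pmiss with each observed score used as threshold.
# Assumed decision rule: accept a trial when score >= threshold.
sweep_det <- function(score, target) {
  thres <- sort(unique(score))
  tar <- score[target]    # target-trial scores
  non <- score[!target]   # non-target-trial scores
  # false alarm: a non-target trial is accepted
  pfa <- vapply(thres, function(t) mean(non >= t), numeric(1))
  # miss: a target trial is rejected
  pmiss <- vapply(thres, function(t) mean(tar < t), numeric(1))
  data.frame(thres = thres, pfa = pfa, pmiss = pmiss)
}

# Made-up scores for three target and three non-target trials
x <- data.frame(score  = c(2.1, 1.4, 0.3, -0.2, -1.5, 0.8),
                target = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))
d <- sweep_det(x$score, x$target)
print(d)
```

As the threshold increases, pfa can only decrease and pmiss can only increase, which is why the two error types trade off along a single curve.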

Usage

roc(x, laplace=TRUE)
as.roc(y)

Arguments

x

A data frame, optionally of class cst

laplace

A logical: should Laplace's rule of succession be applied? This option effectively adds scores of plus and minus infinity to both the target and the non-target scores, which smooths the false alarm and miss probabilities to account for unseen observations.

y

An object of class roc, or a data frame that can be coerced to one using roc(y)
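
The effect of the laplace option can be seen in a small base-R sketch that appends the extra scores explicitly. The helper laplace_augment is hypothetical and only follows the description above (one +Inf and one -Inf score added to each class); the package may implement the smoothing differently. With this augmentation an error fraction k/n becomes (k + 1)/(n + 2), which is Laplace's rule of succession.

```r
# Hypothetical helper (not part of the ROC package): append one +Inf and one
# -Inf score to both the target and the non-target scores.
laplace_augment <- function(score, target) {
  data.frame(score  = c(score, Inf, -Inf, Inf, -Inf),
             target = c(target, TRUE, TRUE, FALSE, FALSE))
}

x <- data.frame(score  = c(2.1, 1.4, 0.3, -0.2, -1.5, 0.8),
                target = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))
y <- laplace_augment(x$score, x$target)
tar <- y$score[y$target]
non <- y$score[!y$target]

# Threshold above every finite score: without smoothing pfa = 0 and pmiss = 1;
# with smoothing they become (0 + 1)/(3 + 2) and (3 + 1)/(3 + 2).
thr <- 3
pfa   <- mean(non >= thr)  # 0.2
pmiss <- mean(tar < thr)   # 0.8
```

The smoothed estimates never reach exactly 0 or 1, so a finite test set no longer claims a perfectly error-free operating point.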

Details

The data x is a data frame with at least two columns: a logical target, indicating whether a trial is a target or a non-target trial, and a numeric score, whose value increases with the likelihood that target==TRUE. An object of class cst (collection of supervised trials) guarantees that these columns are present. Other columns in the data frame can specify factors for conditioning, or other metadata. Typically, a cst object is created by as.cst or cst.tnt.
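
For illustration, a plain data frame with the required columns can be built in base R from two vectors of scores; according to the Arguments section, x may be a plain data frame rather than a cst object. The scores are made up, and the column cond is a hypothetical conditioning factor.

```r
tar <- c(2.1, 1.4, 0.3)    # made-up scores of target trials
non <- c(-0.2, -1.5, 0.8)  # made-up scores of non-target trials
x <- data.frame(
  target = c(rep(TRUE, length(tar)), rep(FALSE, length(non))),
  score  = c(tar, non),
  cond   = factor(c("a", "b", "a", "b", "a", "b"))  # hypothetical metadata
)
str(x)
# Conditioning on a factor is ordinary subsetting, as in the Examples section
xa <- subset(x, cond == "a")
```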

as.roc is used internally wherever an object of class roc is expected, and it also acts as a convenience for functions like det.plot, so that the raw data can be passed to them directly: both det.plot(roc(x)) and det.plot(x) work. Because computing the ROC statistics is relatively expensive, it can be more efficient to call roc explicitly once and store the result.
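
The compute-once pattern can be sketched with hypothetical stand-ins: toy_roc plays the role of roc, as_toy_roc that of as.roc, and toy_plot that of a consumer such as det.plot. None of these are ROC package functions; the sketch only shows why storing the result of roc avoids repeated computation.

```r
n_computed <- 0
# Stand-in for the expensive roc(): counts how often it runs
toy_roc <- function(x) {
  n_computed <<- n_computed + 1
  structure(data.frame(n = nrow(x)), class = c("toy_roc", "data.frame"))
}
# Stand-in for as.roc(): pass through if already computed, else compute
as_toy_roc <- function(y) {
  if (inherits(y, "toy_roc")) y else toy_roc(y)
}
# Stand-in for a consumer like det.plot(): accepts raw data or a result
toy_plot <- function(x) {
  r <- as_toy_roc(x)
  r$n  # placeholder for the actual plotting
}

x <- data.frame(target = c(TRUE, FALSE, TRUE), score = c(1.2, -0.4, 0.7))
r <- toy_roc(x)  # compute once and store ...
toy_plot(r)      # ... then reuse without recomputation
toy_plot(r)
toy_plot(x)      # raw data also works, but is recomputed on every call
n_computed       # 2
```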

Value

roc returns a data frame with the following ROC statistics as columns:

pfa

The probability of false alarm (fraction of non-target trials with a score above the threshold)

pmiss

The probability of a miss (fraction of target trials with a score below the threshold)

thres

The threshold at which pfa and pmiss are determined

chull

A logical indicating whether this point lies on the convex hull of the ROC

opt.llr

The optimal log-likelihood ratio for scores between this threshold and the next. It equals the logarithm of the negative slope of the corresponding line segment on the convex hull.
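
How chull and opt.llr relate can be sketched in base R. The sketch computes the lower-left convex hull of the (pfa, pmiss) points with the lower half of Andrew's monotone-chain algorithm, takes the log of the negative slope of each hull segment, and interpolates the EER on the hull. The algorithm choice, the helper name roc_hull, and the hard-coded points (from made-up target scores 2.1, 1.4, 0.3 and non-target scores -0.2, -1.5, 0.8, accepting when score >= threshold) are assumptions, not the package's implementation.

```r
# ROC points ordered by decreasing threshold, running from (0, 1) to (1, 0)
pfa   <- c(0, 0,   0,   1/3, 1/3, 2/3, 1)
pmiss <- c(1, 2/3, 1/3, 1/3, 0,   0,   0)

# Lower convex hull (Andrew's monotone chain, lower half). Input must be sorted
# by x, with ties ordered by decreasing y, as threshold order guarantees.
roc_hull <- function(x, y) {
  cross <- function(o, a, b)
    (x[a] - x[o]) * (y[b] - y[o]) - (y[a] - y[o]) * (x[b] - x[o])
  h <- integer(0)
  for (i in seq_along(x)) {
    # drop the last hull point while it does not make a strict left turn
    while (length(h) >= 2 && cross(h[length(h) - 1], h[length(h)], i) <= 0)
      h <- h[-length(h)]
    h <- c(h, i)
  }
  h
}

h <- roc_hull(pfa, pmiss)
on_hull <- seq_along(pfa) %in% h  # corresponds to the chull column

# Optimal LLR per hull segment: log of the negative slope -d(pmiss)/d(pfa)
neg_slope <- (head(pmiss[h], -1) - tail(pmiss[h], -1)) / diff(pfa[h])
opt.llr <- log(neg_slope)  # Inf, 0, -Inf for these points

# EER on the hull: interpolate where pmiss - pfa changes sign
d <- pmiss[h] - pfa[h]
k <- which(d[-length(d)] >= 0 & d[-1] <= 0)[1]
s <- d[k] / (d[k] - d[k + 1])
eer <- pfa[h][k] + s * (pfa[h][k + 1] - pfa[h][k])
eer  # 1/6
```

A vertical first segment (the highest scores are all targets) gives an optimal LLR of +Inf, and a horizontal last segment (the lowest scores are all non-targets) gives -Inf.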

The data frame also carries two attributes: data, the original data augmented with a column opt.llr; and stats, which holds the following summary statistics of the ROC analysis:

Cllr

The log-likelihood-ratio cost (see [2,3])

Cllr.min

Minimum Cllr, computed using isotonic regression (see [2])

EER

The equal error rate, computed using the convex hull method

mt

The mean value of target scores

mn

The mean value of non-target scores

nt

The number of target trials

nn

The number of non-target trials

n

The number of trials

discrete

A heuristic indicating if the scores are discrete or continuous.
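
Cllr and Cllr.min can be sketched in base R. cllr follows the standard definition from [2], treating scores as natural-log likelihood ratios; cllr_min uses stats::isoreg (pool-adjacent-violators isotonic regression) to find the optimal monotone calibration, then converts the resulting posteriors to log-likelihood ratios by subtracting the empirical prior log odds log(nt/nn). The function names are hypothetical, the sketch assumes there are no tied scores, and it is not the ROC package's code.

```r
# Cllr of scores interpreted as natural-log likelihood ratios, in bits
cllr <- function(tar, non) {
  0.5 * (mean(log2(1 + exp(-tar))) + mean(log2(1 + exp(non))))
}

# Cllr.min: monotone calibration by isotonic regression (PAV), then Cllr
cllr_min <- function(tar, non) {
  score  <- c(tar, non)
  target <- c(rep(1, length(tar)), rep(0, length(non)))
  o <- order(score)                   # assumes no tied scores
  p <- isoreg(target[o])$yf           # monotone posterior P(target | score)
  llr <- qlogis(p) - log(length(tar) / length(non))  # remove prior log odds
  is_tar <- target[o] == 1
  cllr(llr[is_tar], llr[!is_tar])
}

tar <- c(2.1, 1.4, 0.3)    # made-up target scores
non <- c(-0.2, -1.5, 0.8)  # made-up non-target scores
cllr(tar, non)       # raw scores used directly as LLRs
cllr_min(tar, non)   # 1/3 for these scores
```

A system that always outputs an LLR of 0 has Cllr = 1, and Cllr.min should not exceed Cllr for the same scores, since the identity map is itself a monotone calibration.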

Author(s)

David A. van Leeuwen

References

  1. Alvin Martin et al., “The DET Curve in Assessment of Detection Task Performance,” Proc. Eurospeech, 1895–1898 (1997).

  2. Niko Brümmer and Johan du Preez, “Application-independent evaluation of speaker detection,” Computer Speech and Language 20, 230–275 (2006).

  3. David van Leeuwen and Niko Brümmer, “An Introduction to Application-Independent Evaluation of Speaker Recognition Systems,” LNCS 4343 (2007).

  4. Foster Provost and Tom Fawcett, “Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions,” Third International Conference on Knowledge Discovery and Data Mining (1997).

See Also

read.cst, read.tnt, plot.roc, det.plot

Examples

## Load example SRE data: 
## RU submission to EVALITA speaker recognition applications track
data(ru.2009)
## inspect details of data frame
head(ru.2009)
## look at TC6 train condition and TS2 test condition (easiest task:-)
x <- subset(ru.2009, mcond=="TC6" & tcond=="TS2")
## compute ROC statistics
r <- roc(x)
r
summary(r)
## and plot results
plot(r, main="RU TC6 TS2 primary submission EVALITA 2009")
det.plot(r, main="RU TC6 TS2 primary submission EVALITA 2009")

davidavdav/ROC documentation built on Sept. 8, 2023, 2:39 p.m.