roc | R Documentation |
This function computes the relevant statistics for carrying out ROC
error analysis for two-class classifiers that produce a score. The
input dataframe should contain at least two columns, named "target" and
"score", respectively. The classifier should produce scores that
are higher for target==TRUE and lower for target==FALSE.
roc() computes the false alarm probability and miss probability as
a function of an implicitly swept threshold. It further computes
summary statistics, such as the equal error rate (EER), the cost of
the log likelihood ratio (if the score is a log-likelihood-ratio),
and the convex hull of the ROC, which reveals the optimal
score-to-log-likelihood ratio transform. The structure can be used for
subsequent analysis and plotting.
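As a rough illustration of the threshold sweep described above, the following base-R sketch computes the two error probabilities on a handful of made-up scores. It does not call roc() and makes no claim about how the package implements the sweep; the toy data, the variable names, and the convention that a score equal to the threshold counts as an accept are all assumptions.

```r
# Toy data: hypothetical scores, not taken from any package data set
x <- data.frame(
  target = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  score  = c(2.1, 0.4, 1.5, -1.0, 0.9, -0.3)
)

tar <- x$score[x$target]   # target scores
non <- x$score[!x$target]  # non-target scores

# Sweep the threshold over every observed score
thres <- sort(unique(x$score))

# False alarm: a non-target trial is accepted (score at or above threshold)
pfa <- sapply(thres, function(th) mean(non >= th))
# Miss: a target trial is rejected (score below threshold)
pmiss <- sapply(thres, function(th) mean(tar < th))

sweep_tab <- data.frame(thres, pfa, pmiss)
print(sweep_tab)
```

As the threshold rises, pfa can only fall and pmiss can only rise, which is the trade-off the ROC and DET plots visualise.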
roc(x, laplace=TRUE)
as.roc(y)
x |
A data frame, optionally of class cst |
laplace |
A logical: should Laplace's rule of succession be applied? This option effectively adds scores of plus and minus infinity to both target and non-target scores, smoothing the false alarm and miss probabilities to account for unseen observations. |
y |
An object of class |
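The effect of the laplace argument can be pictured with a small base-R sketch. It assumes the description above means one extra trial at minus infinity and one at plus infinity per class; the scores and variable names are invented for illustration, and the package's internal handling may differ.

```r
# Hypothetical scores for illustration
tar <- c(2.1, 0.4, 1.5)    # target scores
non <- c(-1.0, 0.9, -0.3)  # non-target scores

# Assumed reading of Laplace's rule: add -Inf and +Inf to each class
tar_l <- c(-Inf, tar, Inf)
non_l <- c(-Inf, non, Inf)

th_high <- 3   # threshold above every observed score
th_low  <- -2  # threshold below every observed score

pfa_raw       <- mean(non >= th_high)    # no false alarm was observed
pfa_laplace   <- mean(non_l >= th_high)  # the +Inf pseudo-trial keeps this above zero
pmiss_raw     <- mean(tar < th_low)      # no miss was observed
pmiss_laplace <- mean(tar_l < th_low)    # the -Inf pseudo-trial keeps this above zero

print(c(pfa_raw = pfa_raw, pfa_laplace = pfa_laplace,
        pmiss_raw = pmiss_raw, pmiss_laplace = pmiss_laplace))
```

With the smoothing, neither error probability reaches exactly zero at an extreme threshold, which reflects that errors not yet seen in a finite sample may still occur.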
The data x is a data frame with at least two fields: a logical target,
indicating whether the trial is a target or a non-target trial, and a
numeric score, a value that increases with increasing likelihood that
target==TRUE. An object of class cst (collection of supervised trials)
ensures that these fields are present. Other columns in the data frame
can be used to specify factors for conditioning, or other metadata.
Typically, a cst object is created by as.cst or cst.tnt.
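A minimal sketch of a data frame with the required layout is shown here. The helper has_trial_fields() and the cond column are hypothetical and exist only for this illustration; the sketch deliberately avoids as.cst and cst.tnt, whose exact signatures are not given on this page.

```r
# A data frame with the two required fields plus one extra column
x <- data.frame(
  target = c(TRUE, FALSE, TRUE, FALSE),     # logical: target or non-target trial
  score  = c(1.8, -0.7, 0.2, 0.5),          # numeric: higher means more target-like
  cond   = factor(c("a", "a", "b", "b"))    # hypothetical conditioning factor
)

# Hypothetical helper: check that the required fields exist with the right types
has_trial_fields <- function(d) {
  is.data.frame(d) &&
    all(c("target", "score") %in% names(d)) &&
    is.logical(d$target) && is.numeric(d$score)
}

has_trial_fields(x)

# Extra columns can be used to condition the analysis on a subset of trials
x_a <- subset(x, cond == "a")
```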
as.roc is used internally in places where an object of type roc is
expected, and serves as a convenience for functions like det.plot, so
that the data can be passed directly as an argument to that function.
In other words, det.plot(roc(x)) and det.plot(x) are equivalent.
Computing the ROC statistics is relatively computationally intensive,
so it can be more efficient to call roc explicitly and store the result.
The function returns a data frame containing the important ROC statistics:
pfa |
The probability of false alarm (fraction of non-target trials with a score above the threshold) |
pmiss |
The probability of a miss (fraction of target trials with a score below the threshold) |
thres |
The threshold at which |
chull |
A logical indicating whether this point is on the Convex Hull of the ROC or not |
opt.llr |
The optimum log-likelihood-ratio corresponding to scores between this threshold and the next. It is equal to the log of the negative slope of line segments on the convex hull. |
The data frame further has two additional attributes: data, the original
data augmented with a column opt.llr, and stats, which contains some
basic summary statistics of the ROC analysis:
Cllr |
Cost of LLR (see [2,3]) |
Cllr.min |
Minimum Cllr, computed using isotonic regression (see [2]) |
EER |
The Equal Error Rate, computed using the Convex Hull method |
mt |
The mean value of target scores |
mn |
The mean value of non-target scores |
nt |
The number of target trials |
nn |
The number of non-target trials |
n |
The number of trials |
discrete |
A heuristic indicating if the scores are discrete or continuous. |
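Several of these summary statistics are simple to reproduce by hand. The sketch below uses made-up scores and the usual definition of Cllr from [2], treating each score as a natural-log likelihood ratio; it is an illustration under those assumptions, not the package's own code, and it leaves out Cllr.min, EER, and discrete.

```r
# Hypothetical scores, interpreted as natural-log likelihood ratios
x <- data.frame(
  target = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  score  = c(2.1, 0.4, 1.5, -1.0, 0.9, -0.3)
)
tar <- x$score[x$target]
non <- x$score[!x$target]

# Cllr: average log2 cost of the LLRs, weighted equally over both classes
cllr <- function(tar, non) {
  0.5 * (mean(log2(1 + exp(-tar))) + mean(log2(1 + exp(non))))
}

stats <- list(
  Cllr = cllr(tar, non),
  mt = mean(tar),    # mean target score
  mn = mean(non),    # mean non-target score
  nt = length(tar),  # number of target trials
  nn = length(non),  # number of non-target trials
  n  = nrow(x)       # total number of trials
)
str(stats)
```

Under this definition, scores that carry no information (all LLRs equal to 0) give Cllr = 1, so values below 1 indicate that the scores are useful as likelihood ratios.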
David A. van Leeuwen
[1] Alvin Martin et al., “The DET Curve in Assessment of Detection Task Performance,” Proc. Interspeech, 1895–1898 (1997).
[2] Niko Brümmer and Johan du Preez, “Application-independent evaluation of speaker detection,” Computer Speech and Language 20, 230–275 (2006).
[3] David van Leeuwen and Niko Brümmer, “An Introduction to Application-Independent Evaluation of Speaker Recognition Systems,” LNCS 4343 (2007).
[4] Foster Provost and Tom Fawcett, “Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions,” Third International Conference on Knowledge Discovery and Data Mining (1997).
read.cst, read.tnt, plot.roc, det.plot
## Load example SRE data:
## RU submission to EVALITA speaker recognition applications track
data(ru.2009)
## inspect details of data frame
head(ru.2009)
## look at TC6 train condition and TS2 test condition (easiest task:-)
x <- subset(ru.2009, mcond=="TC6" & tcond=="TS2")
## compute det statistics
r <- roc(x)
r
summary(r)
## and plot results
plot(r, main="RU TC6 TS2 primary submission EVALITA 2009")
det.plot(r, main="RU TC6 TS2 primary submission EVALITA 2009")