roc {pROC}: Build a ROC curve
This is the main function of the pROC package. It builds a ROC curve and returns a “roc” object, a list of class “roc”. This object can be printed, plotted, or passed to the functions auc, ci, smooth.roc and coords. Additionally, two roc objects can be compared with roc.test.
roc(...)
## S3 method for class 'formula'
roc(formula, data, ...)
## S3 method for class 'data.frame'
roc(data, response, predictor,
ret = c("roc", "coords", "all_coords"), ...)
## Default S3 method:
roc(response, predictor, controls, cases,
density.controls, density.cases,
levels=base::levels(as.factor(response)), percent=FALSE, na.rm=TRUE,
direction=c("auto", "<", ">"), algorithm = 6, quiet = FALSE,
smooth=FALSE, auc=TRUE, ci=FALSE, plot=FALSE, smooth.method="binormal",
smooth.n=512, ci.method=NULL, density=NULL, ...)
roc_(data, response, predictor, ret = c("roc", "coords", "all_coords"), ...)
response |
a factor, numeric or character vector of responses (true class), typically encoded with 0 (controls) and 1 (cases). Only two classes can be used in a ROC curve. If the vector contains more than two unique values, or if their order could be ambiguous, use levels to specify which values must be used as the control and case values. |
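predictor |
a numeric or ordered vector of the same length as response, containing the predicted value of each observation. |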
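controls, cases |
instead of response, predictor, the data can be supplied as two numeric or ordered vectors containing the predictor values for the control and case observations, respectively. |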
density.controls, density.cases |
a smoothed ROC curve can be built directly from two densities evaluated on identical x points, as in smooth. |
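formula, data |
a formula of the type response~predictor, and the data frame containing its variables. A subset argument can also be passed (see the examples). |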
levels |
the values of the response for controls and cases, respectively. By default, the first two values of levels(as.factor(response)) are taken, and the remaining levels are ignored. |
percent |
if the sensitivities, specificities and AUC must be given in percent (TRUE) or in fraction (FALSE, default). |
na.rm |
if TRUE (default), the NA values will be removed. |
direction |
in which direction to make the comparison? “auto” (default): automatically determine in which group the median is higher and take the direction accordingly. “>”: the predictor values of the control group are higher than the values of the case group (controls > t >= cases). “<”: the predictor values of the control group are lower than or equal to the values of the case group (controls < t <= cases). You should set this explicitly to “>” or “<” whenever you are resampling or randomizing the data, otherwise the curves will be biased towards higher AUC values. |
algorithm |
the method used to compute sensitivity and specificity, given as an integer of length 1 (6 by default). |
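ret |
for the data.frame method and roc_ only: whether to return the “roc” object ("roc", the default), the thresholds, sensitivities and specificities at all thresholds ("coords"), or all the coordinates at all thresholds ("all_coords"). |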
quiet |
set to TRUE to turn off the messages printed when direction and levels are auto-detected. |
smooth |
if TRUE, the ROC curve is passed to smooth to be smoothed. |
auc |
compute the area under the curve (AUC)? If TRUE (default), additional arguments can be passed to auc. |
ci |
compute the confidence interval (CI)? If set to TRUE, additional arguments can be passed to ci. |
plot |
plot the ROC curve? If TRUE, additional arguments can be passed to plot.roc. |
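smooth.method, smooth.n, ci.method |
in the formula and default methods, the method and n arguments of smooth (if smooth=TRUE) and the method argument of ci (if ci=TRUE) must be passed as smooth.method, smooth.n and ci.method to avoid confusion. |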
density |
density argument passed to smooth. |
... |
further arguments passed to or from other methods, especially arguments for auc (such as partial.auc), ci, and plot.roc (such as auc.polygon and grid). |
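To make the “auto” setting of the direction argument more concrete, here is a minimal base-R sketch that picks the direction by comparing the group medians. The helper name guess_direction is hypothetical, resolving ties toward "<" is an assumption, and this is not pROC's internal code.

```r
# Hypothetical helper (not pROC's internal code): resolve direction = "auto"
# by comparing the medians of the two groups. Ties go to "<" (an assumption).
guess_direction <- function(controls, cases) {
  if (median(controls) <= median(cases)) {
    "<"  # controls tend to be lower: controls < t <= cases
  } else {
    ">"  # controls tend to be higher: controls > t >= cases
  }
}

controls <- c(0.10, 0.15, 0.20, 0.30)
cases    <- c(0.25, 0.40, 0.55, 0.70)
guess_direction(controls, cases)  # "<"
guess_direction(cases, controls)  # ">"
```

Because the direction is chosen from the data itself, re-applying such a rule to every resampled or permuted dataset tends to pick the orientation with the higher AUC, which is why the direction should be fixed explicitly in those settings.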
This function's main job is to build a ROC object. See the “Value” section of this page for more details. Before returning, it will call (in this order) the smooth, auc, ci and plot.roc functions if the smooth, auc, ci and plot arguments (respectively) are set to TRUE. By default, only auc is called.
Data can be provided as response, predictor, where the predictor is the numeric (or ordered) level of the evaluated signal, and the response encodes the observation class (control or case). The levels argument specifies which response level must be taken as controls (first value of levels) or cases (second). It can safely be ignored when the response is encoded as 0 and 1, but the default choice will frequently be wrong otherwise. By default, the first two values of levels(as.factor(response)) are taken, and the remaining levels are ignored. Because character responses are converted to factors with alphabetically sorted levels, this means that if your response is coded “control” and “case”, the levels will be inverted: “case” comes first and is taken as the control level.
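A quick base-R check of the default level selection described above (no pROC needed). It shows why a response coded “control”/“case” needs levels = c("control", "case"), while 0/1 coding sorts as intended.

```r
# Character responses become factors with alphabetically sorted levels.
response <- c("control", "case", "case", "control", "case")
lv <- levels(as.factor(response))
print(lv)  # "case" "control"

# Default rule: first level = controls, second level = cases,
# so "case" would wrongly be treated as the control group here.
default_controls <- lv[1]

# 0/1 coding sorts as intended: "0" (controls) before "1" (cases).
levels(as.factor(c(1, 0, 1, 0)))  # "0" "1"
```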
In some cases, it is more convenient to pass the data as controls, cases, but both arguments are ignored if response, predictor are specified as non-NULL values.
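A minimal sketch of how a response, predictor pair maps onto controls, cases, assuming levels[1] is the control level and levels[2] the case level. split_by_levels is a hypothetical helper for illustration, not part of pROC; the commented calls show the two ways of passing the same data to roc.

```r
# Hypothetical helper for illustration only (not part of pROC).
# levels[1] is taken as the control level, levels[2] as the case level;
# which() drops observations whose response is NA.
split_by_levels <- function(response, predictor, levels) {
  list(
    controls = predictor[which(response == levels[1])],
    cases    = predictor[which(response == levels[2])]
  )
}

response  <- c("Good", "Poor", "Good", "Poor", "Good")
predictor <- c(0.1, 0.6, 0.2, 0.4, 0.3)
s <- split_by_levels(response, predictor, levels = c("Good", "Poor"))
s$controls  # 0.1 0.2 0.3
s$cases     # 0.6 0.4

# With pROC loaded, these two calls should describe the same curve:
# roc(response, predictor, levels = c("Good", "Poor"))
# roc(controls = s$controls, cases = s$cases)
```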
It is also possible to pass density data with density.controls, density.cases, which will result in a smoothed ROC curve even if smooth=FALSE, but these arguments are ignored if response, predictor or controls, cases are provided.
Specifications for auc, ci and plot.roc are not kept if auc, ci or plot are set to FALSE. In particular, in the following case:
myRoc <- roc(..., auc.polygon=TRUE, grid=TRUE, plot=FALSE)
plot(myRoc)
the plot will have neither the AUC polygon nor the grid. Similarly, when comparing “roc” objects, the following does not work as intended:
roc1 <- roc(..., partial.auc=c(1, 0.8), auc=FALSE)
roc2 <- roc(..., partial.auc=c(1, 0.8), auc=FALSE)
roc.test(roc1, roc2)
This will produce a test on the full AUC, not the partial AUC. To make a comparison on the partial AUC, you must repeat the specifications when calling roc.test:
roc.test(roc1, roc2, partial.auc=c(1, 0.8))
Note that if roc was called with auc=TRUE, the latter syntax will not allow redefining the AUC specifications. You must use reuse.auc=FALSE for that.
If the data contained any NA value and na.rm=FALSE, NA is returned. Otherwise, if smooth=FALSE, a list of class “roc” with the following fields:
auc |
if called with auc=TRUE, the AUC as returned by auc. |
ci |
if called with ci=TRUE, the confidence interval as returned by ci. |
response |
the response vector. Patients whose response is not in levels are discarded. |
predictor |
the predictor vector converted to numeric as used to build the ROC curve. Patients whose response is not in levels are discarded. |
original.predictor, original.response |
the response and predictor vectors as passed in argument. |
levels |
the levels of the response as defined in argument. |
controls |
the predictor values for the control observations. |
cases |
the predictor values for the cases. |
percent |
if the sensitivities, specificities and AUC are reported in percent, as defined in argument. |
direction |
the direction of the comparison, as defined in argument. |
fun.sesp |
the function used to compute sensitivities and specificities. Will be re-used in bootstrap operations. |
sensitivities |
the sensitivities defining the ROC curve. |
specificities |
the specificities defining the ROC curve. |
thresholds |
the thresholds at which the sensitivities and specificities were computed. See below for details. |
call |
how the function was called. See match.call for more details. |
If smooth=TRUE, a list of class “smooth.roc” as returned by smooth, with or without the additional elements auc and ci (according to the call).
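For an empirical (non-smoothed) curve, the full AUC stored in the auc field equals the Mann-Whitney statistic divided by the number of control-case pairs: the probability that a random case scores higher than a random control, with ties counted as one half. The base-R sketch below computes it directly, assuming direction "<" and fractions rather than percent; auc_pairs is a hypothetical name, and this is not necessarily how pROC computes the AUC internally.

```r
# Sketch: full AUC of an empirical ROC curve (direction "<") computed from
# all control-case pairs, with ties counted as one half.
auc_pairs <- function(controls, cases) {
  diffs <- outer(cases, controls, "-")  # every case minus every control
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

controls <- c(0.1, 0.2, 0.3)
cases    <- c(0.3, 0.4, 0.6)
auc_pairs(controls, cases)  # 8.5 / 9, about 0.944
```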
Thresholds are selected as the means between any two consecutive values observed in the data. This choice is meant to facilitate their interpretation, as any data point will be unambiguously positive or negative regardless of whether the comparison operator includes equality or not.
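A short base-R sketch of the midpoint rule, for illustration only. Adding -Inf and Inf at both ends (so the curve reaches both corners) is an assumption of this sketch, and midpoint_thresholds is a hypothetical name.

```r
# Sketch of the midpoint rule (not pROC's implementation). The -Inf and Inf
# end points are an assumption so that the curve reaches both corners.
midpoint_thresholds <- function(predictor) {
  v <- sort(unique(predictor))
  mids <- (head(v, -1) + tail(v, -1)) / 2
  c(-Inf, mids, Inf)
}

midpoint_thresholds(c(3, 1, 2, 2, 5))  # -Inf 1.5 2.5 4.0 Inf
```

No midpoint coincides with an observed value, so whether the comparison uses >= or > does not change which observations are called positive.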
In rare cases it might not be possible to represent the mean between two consecutive values, or one might want to use a custom threshold. In those cases, the semantics of the comparison are as follows: with direction = '>', observations are positive when they are smaller than or equal (<=) to the threshold. With direction = '<', observations are positive when they are greater than or equal (>=) to the threshold.
As a corollary of the midpoint rule, thresholds do not correspond to actual values in the data.
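The comparison semantics above can be sketched in base R as follows. sens_spec is a hypothetical helper for illustration, not a pROC function. It shows how a custom threshold equal to an observed value makes the tied observation count as positive, while a midpoint avoids the ambiguity.

```r
# Illustrative only: classify observations against a threshold using the
# comparison semantics above, then compute sensitivity and specificity.
sens_spec <- function(controls, cases, threshold, direction = c("<", ">")) {
  direction <- match.arg(direction)
  is_positive <- function(x) {
    if (direction == "<") x >= threshold else x <= threshold
  }
  c(sensitivity = mean(is_positive(cases)),      # cases called positive
    specificity = mean(!is_positive(controls)))  # controls called negative
}

controls <- c(0.1, 0.2, 0.3)
cases    <- c(0.3, 0.4, 0.6)
# Custom threshold equal to an observed value: the tied control (0.3)
# counts as positive, so specificity drops to 2/3.
sens_spec(controls, cases, threshold = 0.3, direction = "<")
# A midpoint threshold avoids the ambiguity:
sens_spec(controls, cases, threshold = 0.35, direction = "<")
```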
Since version 1.15.0, the roc function can be used in pipelines, for instance with dplyr or magrittr. This is still a highly experimental feature and will change significantly in future versions (see issue 54).
The roc.data.frame method supports both standard and non-standard evaluation (NSE):
library(dplyr)
# Standard evaluation:
aSAH %>%
  filter(gender == "Female") %>%
  roc("outcome", "s100b")
# Non-Standard Evaluation:
aSAH %>%
  filter(gender == "Female") %>%
  roc(outcome, s100b)
For tasks involving programming and variable column names, the roc_ function provides standard evaluation:
# Standard evaluation:
aSAH %>%
  filter(gender == "Female") %>%
  roc_("outcome", "s100b")
By default it returns the roc object, which can then be piped to the coords function to extract coordinates that can be used in further pipelines.
# Returns thresholds, sensitivities and specificities:
aSAH %>%
  roc(outcome, s100b) %>%
  coords(transpose = FALSE) %>%
  filter(sensitivity > 0.6, specificity > 0.6)
# Returns all existing coordinates, then select precision and recall:
aSAH %>%
  roc(outcome, s100b) %>%
  coords(ret = "all", transpose = FALSE) %>%
  select(precision, recall)
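For readers without dplyr, the idea behind the first pipeline can be sketched in base R on toy data: build a table of thresholds, specificities and sensitivities, then keep the rows where both exceed 0.6. coords_table is a hypothetical stand-in, not pROC's coords, and it assumes direction "<" (positive when the value is >= the threshold) with midpoint thresholds.

```r
# Hypothetical base-R stand-in (not pROC's coords()): build a table of
# thresholds, specificities and sensitivities, assuming direction "<"
# (positive when value >= threshold), then filter it like the pipeline above.
coords_table <- function(controls, cases) {
  v <- sort(unique(c(controls, cases)))
  thr <- c(-Inf, (head(v, -1) + tail(v, -1)) / 2, Inf)
  data.frame(
    threshold   = thr,
    specificity = vapply(thr, function(t) mean(controls < t), numeric(1)),
    sensitivity = vapply(thr, function(t) mean(cases >= t), numeric(1))
  )
}

controls <- c(0.1, 0.2, 0.3)
cases    <- c(0.3, 0.4, 0.6)
tab <- coords_table(controls, cases)
subset(tab, sensitivity > 0.6 & specificity > 0.6)  # thresholds 0.25 and 0.35
```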
If no control or case observation exists for the given levels of response, no ROC curve can be built and an error is triggered with the message “No control observation” or “No case observation”.
If the predictor is not numeric or ordered, as defined by as.numeric or as.ordered, an error is triggered with the message “Predictor must be numeric or ordered”.
The message “No valid data provided” is issued when the data wasn't properly passed. Remember you need both response and predictor of the same (not null) length, or both controls and cases. Combinations such as predictor and cases are not valid and will trigger this error.
Infinite values of the predictor cannot always be thresholded by infinity and can prevent ROC curves from reaching 0 or 100% specificity or sensitivity. Since version 1.13.0, pROC returns NaN with the warning message “Infinite value(s) in predictor” if predictor contains any infinite values.
Tom Fawcett (2006) “An introduction to ROC analysis”. Pattern Recognition Letters 27, 861–874. DOI: 10.1016/j.patrec.2005.10.010.
Xavier Robin, Natacha Turck, Alexandre Hainard, et al. (2011) “pROC: an open-source package for R and S+ to analyze and compare ROC curves”. BMC Bioinformatics 12, 77. DOI: 10.1186/1471-2105-12-77.
See also: auc, ci, plot.roc, print.roc, roc.test
library(pROC)
data(aSAH)
# Basic example
roc(aSAH$outcome, aSAH$s100b,
levels=c("Good", "Poor"))
# As levels(aSAH$outcome) == c("Good", "Poor"),
# this is equivalent to:
roc(aSAH$outcome, aSAH$s100b)
# In some cases, ignoring levels could lead to unexpected results
# Equivalent syntaxes:
roc(outcome ~ s100b, aSAH)
roc(aSAH$outcome ~ aSAH$s100b)
with(aSAH, roc(outcome, s100b))
with(aSAH, roc(outcome ~ s100b))
# With a formula:
roc(outcome ~ s100b, data=aSAH)
## Not run:
library(dplyr)
aSAH %>%
filter(gender == "Female") %>%
roc(outcome, s100b)
## End(Not run)
# Using subset (only with formula)
roc(outcome ~ s100b, data=aSAH, subset=(gender == "Male"))
roc(outcome ~ s100b, data=aSAH, subset=(gender == "Female"))
# With numeric controls/cases
roc(controls=aSAH$s100b[aSAH$outcome=="Good"], cases=aSAH$s100b[aSAH$outcome=="Poor"])
# With ordered controls/cases
roc(controls=aSAH$wfns[aSAH$outcome=="Good"], cases=aSAH$wfns[aSAH$outcome=="Poor"])
# Invert the levels: "Poor" are now controls and "Good" cases:
roc(aSAH$outcome, aSAH$s100b,
levels=c("Poor", "Good"))
# The result is exactly the same because of direction="auto".
# The following will give an AUC < 0.5:
roc(aSAH$outcome, aSAH$s100b,
levels=c("Poor", "Good"), direction="<")
# If we are sure about levels and direction auto-detection,
# we can turn off the messages:
roc(aSAH$outcome, aSAH$s100b, quiet = TRUE)
# If we prefer counting in percent:
roc(aSAH$outcome, aSAH$s100b, percent=TRUE)
# Plot and CI (see plot.roc and ci for more options):
roc(aSAH$outcome, aSAH$s100b,
percent=TRUE, plot=TRUE, ci=TRUE)
# Smoothed ROC curve
roc(aSAH$outcome, aSAH$s100b, smooth=TRUE)
# this is not identical to
smooth(roc(aSAH$outcome, aSAH$s100b))
# because in the latter case, the returned object contains no AUC