roc_curve | R Documentation |
roc_curve() constructs the full ROC curve and returns a tibble. See roc_auc() for the area under the ROC curve.
roc_curve(data, ...)

## S3 method for class 'data.frame'
roc_curve(
  data,
  truth,
  ...,
  na_rm = TRUE,
  event_level = yardstick_event_level(),
  case_weights = NULL,
  options = list()
)
data |
A |
... |
A set of unquoted column names or one or more
|
truth |
The column identifier for the true class results
(that is a |
na_rm |
A |
event_level |
A single string. Either |
case_weights |
The optional column identifier for case weights.
This should be an unquoted column name that evaluates to a numeric column
in |
options |
No longer supported as of yardstick 1.0.0. If you pass something here it will be ignored with a warning. Previously, these were options passed on to |
roc_curve() computes the sensitivity and specificity at every unique value of the probability column (in addition to infinity and minus infinity).
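To make the thresholding idea concrete, here is a minimal base R sketch. It is not yardstick's implementation: `manual_roc()` is a hypothetical helper, the toy data is made up for illustration, and the `>=` comparison used to call an observation an event is an assumption.

```r
# Hypothetical helper (not part of yardstick): sensitivity and specificity at
# each candidate threshold, treating `estimate >= threshold` as a predicted event.
manual_roc <- function(truth, estimate, event = levels(truth)[1]) {
  is_event <- truth == event
  # Every unique probability value, plus the two infinite end points
  thresholds <- c(-Inf, sort(unique(estimate)), Inf)
  # Sensitivity: share of true events predicted as events
  sens <- vapply(thresholds, function(t) mean(estimate[is_event] >= t), numeric(1))
  # Specificity: share of true non-events predicted as non-events
  spec <- vapply(thresholds, function(t) mean(estimate[!is_event] < t), numeric(1))
  data.frame(.threshold = thresholds, specificity = spec, sensitivity = sens)
}

# Toy data (illustrative only): "yes" is the first level, so it is the event
truth <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
prob_yes <- c(0.9, 0.7, 0.6, 0.4, 0.2)

curve <- manual_roc(truth, prob_yes)
curve
```

The two infinite thresholds pin down the ends of the curve: at minus infinity every observation is predicted as an event, and at infinity none is.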
There is a ggplot2::autoplot() method for quickly visualizing the curve.
This works for binary and multiclass output, and also works with grouped
data (i.e. from resamples). See the examples.
A tibble with class roc_df or roc_grouped_df having columns .threshold, specificity, and sensitivity.
If a multiclass truth column is provided, a one-vs-all approach will be taken to calculate multiple curves, one per level. In this case, there will be an additional column, .level, identifying the "one" column in the one-vs-all calculation.
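As a short sketch of the multiclass output, the snippet below uses the `hpc_cv` data set that ships with yardstick and restricts it to a single resample. It assumes yardstick is installed; the object names `fold1` and `curves` are illustrative only.

```r
library(yardstick)

data(hpc_cv)

# Keep one resample so the result is a plain (ungrouped) set of curves
fold1 <- hpc_cv[hpc_cv$Resample == "Fold01", ]

# `obs` has four levels, and VF:L selects the four class probability columns
curves <- roc_curve(fold1, obs, VF:L)

# One curve per level of `obs`, labelled by the extra `.level` column
unique(curves$.level)
names(curves)
```

Each value of `.level` marks the rows of the curve in which that class was treated as the "one" against all the others.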
There is no common convention on which factor level should automatically be considered the "event" or "positive" result when computing binary classification metrics. In yardstick, the default is to use the first level. To alter this, change the argument event_level to "second" to treat the last level of the factor as the level of interest. For multiclass extensions involving one-vs-all comparisons (such as macro averaging), this option is ignored and the "one" level is always the relevant result.
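The following sketch shows one way to switch the event of interest, using the `two_class_example` data from yardstick. It assumes yardstick is installed. Note that the probability column passed in should match the chosen event, so `Class2` is supplied together with `event_level = "second"`.

```r
library(yardstick)

data(two_class_example)

# Default: the first level of `truth` ("Class1") is the event
curve_first <- roc_curve(two_class_example, truth, Class1)

# Treat the second level ("Class2") as the event and pass its probabilities
curve_second <- roc_curve(
  two_class_example,
  truth,
  Class2,
  event_level = "second"
)

head(curve_second)
```

Passing `Class1` together with `event_level = "second"` would score the wrong class's probabilities against the event, so the two choices need to be changed together.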
Max Kuhn
Compute the area under the ROC curve with roc_auc().
Other curve metrics: gain_curve(), lift_curve(), pr_curve()
# ---------------------------------------------------------------------------
# Two class example
# `truth` is a 2 level factor. The first level is `"Class1"`, which is the
# "event of interest" by default in yardstick. See the Relevant Level
# section above.
data(two_class_example)
# Binary metrics using class probabilities take a factor `truth` column,
# and a single class probability column containing the probabilities of
# the event of interest. Here, since `"Class1"` is the first level of
# `"truth"`, it is the event of interest and we pass in probabilities for it.
roc_curve(two_class_example, truth, Class1)
# ---------------------------------------------------------------------------
# `autoplot()`
# Visualize the curve using ggplot2 manually
library(ggplot2)
library(dplyr)
roc_curve(two_class_example, truth, Class1) %>%
  ggplot(aes(x = 1 - specificity, y = sensitivity)) +
  geom_path() +
  geom_abline(lty = 3) +
  coord_equal() +
  theme_bw()
# Or use autoplot
autoplot(roc_curve(two_class_example, truth, Class1))
## Not run:
# Multiclass one-vs-all approach
# One curve per level
hpc_cv %>%
  filter(Resample == "Fold01") %>%
  roc_curve(obs, VF:L) %>%
  autoplot()
# Same as above, but with all of the resamples
hpc_cv %>%
  group_by(Resample) %>%
  roc_curve(obs, VF:L) %>%
  autoplot()
## End(Not run)