calibration_curve() computes the true and predicted probabilities for a calibration curve.
calibration_curve(data, ...)

## S3 method for class 'data.frame'
calibration_curve(
  data,
  truth,
  ...,
  n_bins = 10L,
  scale_estimate = FALSE,
  discretise_strategy = c("uniform", "quantile"),
  na_rm = TRUE
)

calibration_curve_vec(
  truth,
  estimate,
  n_bins = 10L,
  scale_estimate = FALSE,
  discretise_strategy = c("uniform", "quantile"),
  na_rm = TRUE,
  ...
)

## S3 method for class 'clbr_df'
autoplot(object, ...)
data | A …
... | Not currently used
truth | The column identifier for the true class results (that is a …
n_bins | Number of bins to discretize the …
scale_estimate | A …
discretise_strategy | Strategy used to define the widths of the bins, either 'uniform' (default) or 'quantile'. If 'uniform', the bins have identical widths. If 'quantile', the bins have the same number of samples.
na_rm | A …
estimate | The column identifier for the predicted results (that is also …
object | The …
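To make the two discretise_strategy options concrete, here is a minimal base R sketch of how the bin edges could be derived. The helper bin_breaks() is hypothetical and is not part of the package; the package's own binning may differ in detail (for example in how it handles ties or bin boundaries).

```r
# Hypothetical helper (not part of the package): compute bin edges for
# predicted probabilities under the two strategies described above.
bin_breaks <- function(estimate, n_bins = 10L,
                       strategy = c("uniform", "quantile")) {
  strategy <- match.arg(strategy)
  if (strategy == "uniform") {
    # Equal-width bins spanning [0, 1]
    seq(0, 1, length.out = n_bins + 1L)
  } else {
    # Equal-count bins: edges at evenly spaced quantiles of the estimates
    probs <- seq(0, 1, length.out = n_bins + 1L)
    unique(unname(quantile(estimate, probs = probs)))
  }
}

set.seed(1)
p <- rbeta(1000, 2, 5)  # skewed toy probabilities

uniform_bins  <- cut(p, bin_breaks(p, 5L, "uniform"),  include.lowest = TRUE)
quantile_bins <- cut(p, bin_breaks(p, 5L, "quantile"), include.lowest = TRUE)

table(uniform_bins)   # bin counts differ because the data are skewed
table(quantile_bins)  # bin counts are equal
```

With skewed predictions, uniform bins at the high end can hold very few cases, which makes their fraction of positives noisy; quantile bins trade equal widths for equal sample sizes.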
The function takes inputs from a binary classifier.
A calibration curve is also known as a reliability diagram. The function is named to match scikit-learn's calibration_curve function.
Quoting Niculescu-Mizil & Caruana (2005) with minor modifications: First, the predicted values (probabilities) are discretized into ten bins (the default, which can be changed). Cases with predicted values between 0 and 0.1 fall in the first bin, between 0.1 and 0.2 in the second bin, etc. For each bin, the mean predicted value is plotted against the true fraction of positive cases.
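The binning procedure quoted above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: manual_calibration() is a hypothetical helper, truth is assumed to be a logical vector (TRUE for the positive class), and only the uniform strategy is shown.

```r
# Hypothetical helper (not the package's implementation): bin the predicted
# probabilities, then summarise each bin.
manual_calibration <- function(truth, estimate, n_bins = 10L) {
  # Equal-width bins on [0, 1]
  breaks <- seq(0, 1, length.out = n_bins + 1L)
  bin <- cut(estimate, breaks = breaks, include.lowest = TRUE)
  data.frame(
    bin = levels(bin),
    # Mean predicted probability within each bin
    mean_predicted = as.vector(tapply(estimate, bin, mean)),
    # Observed fraction of positive cases within each bin
    frac_positive = as.vector(tapply(truth, bin, mean))
  )
}

set.seed(42)
estimate <- runif(2000)
# Toy outcome that is well calibrated by construction:
# each case is positive with probability equal to its estimate.
truth <- runif(2000) < estimate

curve <- manual_calibration(truth, estimate)
curve
```

For a well-calibrated model the two summary columns track each other closely, so the plotted points lie near the diagonal.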
There is a ggplot2::autoplot() method for quickly visualising the curve.
This works for binary and multiclass output, and also works with grouped data (i.e. from resamples).
A tibble with class clbr_df or clbr_grouped_df, having columns .frac_positive and .mean_predicted.
If a multiclass truth column is provided, a one-vs-all approach will be taken to calculate multiple curves, one per level. In this case, there will be an additional column, .level, identifying the "one" column in the one-vs-all calculation.
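As a rough illustration of the one-vs-all idea, the base R sketch below computes one curve per factor level and stacks the results with a .level column. The helper one_vs_all_curves() and its inputs (a factor truth and a matrix probs with one column per level) are assumptions made for this example; the package's actual interface takes column identifiers instead.

```r
# Hypothetical sketch (not the package's implementation): treat each level in
# turn as the "one" class and everything else as the "rest".
one_vs_all_curves <- function(truth, probs, n_bins = 5L) {
  breaks <- seq(0, 1, length.out = n_bins + 1L)
  per_level <- lapply(levels(truth), function(lvl) {
    bin <- cut(probs[, lvl], breaks = breaks, include.lowest = TRUE)
    out <- data.frame(
      .level = lvl,
      .mean_predicted = as.vector(tapply(probs[, lvl], bin, mean)),
      .frac_positive = as.vector(tapply(truth == lvl, bin, mean))
    )
    # Drop bins that received no cases
    out[!is.na(out$.mean_predicted), ]
  })
  do.call(rbind, per_level)
}

set.seed(7)
raw <- matrix(rexp(900), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
probs <- raw / rowSums(raw)  # each row sums to 1
# Draw each true class according to its predicted probabilities
truth <- factor(
  apply(probs, 1, function(p) sample(colnames(probs), 1, prob = p)),
  levels = colnames(probs)
)

curves <- one_vs_all_curves(truth, probs)
head(curves)
```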
There is no common convention on which factor level should automatically be considered the "event" or "positive" result. In yardstick, the default is to use the first level. To change this, a global option called yardstick.event_first is set to TRUE when the package is loaded. This can be changed to FALSE if the last level of the factor is considered the level of interest by running: options(yardstick.event_first = FALSE).
For multiclass extensions involving one-vs-all comparisons (such as macro averaging), this option is ignored and the "one" level is always the relevant result.
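The effect of the yardstick.event_first option on a two-level factor can be sketched in base R. The helper event_level() below is hypothetical and only mirrors the rule described above; it is not a yardstick function, and it assumes the option counts as TRUE when it has not been set.

```r
# Hypothetical helper (not a yardstick function): pick the "event" level of a
# factor according to the global option described above.
event_level <- function(truth) {
  lvls <- levels(truth)
  # Assumption for this sketch: an unset option is treated as TRUE.
  if (isTRUE(getOption("yardstick.event_first", default = TRUE))) {
    lvls[1]
  } else {
    lvls[length(lvls)]
  }
}

truth <- factor(c("yes", "no", "yes"), levels = c("yes", "no"))

old <- options(yardstick.event_first = TRUE)
first_choice <- event_level(truth)  # "yes", the first level

options(yardstick.event_first = FALSE)
last_choice <- event_level(truth)   # "no", the last level

options(old)  # restore the previous setting
```

Reordering the factor levels, for example with factor(truth, levels = c("no", "yes")), is an alternative to changing a global option.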
An Chu
## Not run:
library(dplyr)
library(ggplot2)

data("two_class_example", package = "yardstick")

two_class_example %>%
  calibration_curve(truth, Class1)

two_class_example %>%
  calibration_curve(truth, Class1) %>%
  autoplot()

## End(Not run)