classification_cost (R Documentation)

View source: R/prob-classification_cost.R
Description

classification_cost() calculates the cost of a poor prediction based on
user-defined costs. The costs are multiplied by the estimated class
probabilities and the mean cost is returned.
Usage

classification_cost(data, ...)

## S3 method for class 'data.frame'
classification_cost(
  data,
  truth,
  ...,
  costs = NULL,
  na_rm = TRUE,
  event_level = yardstick_event_level(),
  case_weights = NULL
)

classification_cost_vec(
  truth,
  estimate,
  costs = NULL,
  na_rm = TRUE,
  event_level = yardstick_event_level(),
  case_weights = NULL,
  ...
)
Arguments

data
  A data.frame containing the columns specified by truth and ....

...
  A set of unquoted column names or one or more dplyr selector functions
  to choose which variables contain the class probabilities. If truth is
  binary, only 1 column should be selected, and it should correspond to
  the value of event_level. Otherwise, there should be as many columns as
  factor levels of truth.

truth
  The column identifier for the true class results (that is a factor).
  This should be an unquoted column name. For the _vec() version, a
  factor vector.

costs
  A data frame with columns "truth", "estimate", and "cost". "truth" and
  "estimate" should be character columns containing unique combinations
  of the levels of the truth factor. "cost" should be a numeric column
  giving the cost that should be applied when "estimate" is predicted but
  the true result is "truth". It is often the case that when
  "truth" == "estimate", the cost is zero (no penalty for correct
  predictions). If any combinations of the levels of truth are missing
  from the costs data frame, their costs are assumed to be zero. If NULL,
  equal costs are used, applying a cost of 0 to correct predictions and a
  cost of 1 to incorrect predictions. A minimal sketch of this structure
  is shown after this argument list.

na_rm
  A logical value indicating whether NA values should be stripped before
  the computation proceeds.

event_level
  A single string. Either "first" or "second" to specify which level of
  truth to consider as the "event". This argument is only applicable for
  binary truth data. The default uses an internal helper that generally
  defaults to "first".

case_weights
  The optional column identifier for case weights. This should be an
  unquoted column name that evaluates to a numeric column in data. For
  the _vec() version, a numeric vector.

estimate
  If truth is binary, a numeric vector of class probabilities
  corresponding to the "relevant" class. Otherwise, a matrix with as many
  columns as factor levels of truth, in the same order as the levels of
  truth. Used by classification_cost_vec(); see the sketch after this
  list.
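For illustration, here is a minimal sketch of a costs data frame and a
matching classification_cost_vec() call for a two-level factor, assuming
yardstick and tibble are attached; the level names, probabilities, and costs
below are hypothetical.

library(yardstick)
library(tibble)

# Hypothetical binary truth; the first level ("Class1") is the event by default.
truth <- factor(c("Class1", "Class2", "Class1"), levels = c("Class1", "Class2"))
prob_class1 <- c(0.9, 0.2, 0.6)  # estimated probability of the event class

# Cost of predicting `estimate` when the true class is `truth`.
# Correct combinations are omitted, so their cost is assumed to be zero.
example_costs <- tribble(
  ~truth,   ~estimate, ~cost,
  "Class1", "Class2",  2,
  "Class2", "Class1",  1
)

classification_cost_vec(truth, prob_class1, costs = example_costs)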
Details

As an example, suppose that there are three classes: "A", "B", and "C".
Suppose there is a truly "A" observation with class probabilities
A = 0.3 / B = 0.3 / C = 0.4. Suppose that, when the true result is class
"A", the costs for each class were A = 0 / B = 5 / C = 10, penalizing the
probability of incorrectly predicting "C" more than predicting "B". The
cost for this prediction would be 0.3 * 0 + 0.3 * 5 + 0.4 * 10. This
calculation is done for each sample and the individual costs are averaged.
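As a quick arithmetic check of the example above (the object names here are
only illustrative), the per-observation cost works out to 5.5:

probs <- c(A = 0.3, B = 0.3, C = 0.4)            # predicted class probabilities
cost_when_truth_is_A <- c(A = 0, B = 5, C = 10)  # costs for a truly "A" observation
sum(probs * cost_when_truth_is_A)
#> [1] 5.5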
Value

A tibble with columns .metric, .estimator, and .estimate and 1 row of values.

For grouped data frames, the number of rows returned will be the same as the
number of groups.

For classification_cost_vec(), a single numeric value (or NA).
Author(s)

Max Kuhn
See Also

Other class probability metrics: average_precision(), brier_class(),
gain_capture(), mn_log_loss(), pr_auc(), roc_auc(), roc_aunp(), roc_aunu()
Examples

library(yardstick)
library(dplyr)
# ---------------------------------------------------------------------------
# Two class example
data(two_class_example)
# Assuming `Class1` is our "event", this penalizes false positives heavily
costs1 <- tribble(
~truth, ~estimate, ~cost,
"Class1", "Class2", 1,
"Class2", "Class1", 2
)
# Assuming `Class1` is our "event", this penalizes false negatives heavily
costs2 <- tribble(
~truth, ~estimate, ~cost,
"Class1", "Class2", 2,
"Class2", "Class1", 1
)
classification_cost(two_class_example, truth, Class1, costs = costs1)
classification_cost(two_class_example, truth, Class1, costs = costs2)
# ---------------------------------------------------------------------------
# Multiclass
data(hpc_cv)
# Define cost matrix from Kuhn and Johnson (2013)
hpc_costs <- tribble(
~estimate, ~truth, ~cost,
"VF", "VF", 0,
"VF", "F", 1,
"VF", "M", 5,
"VF", "L", 10,
"F", "VF", 1,
"F", "F", 0,
"F", "M", 5,
"F", "L", 5,
"M", "VF", 1,
"M", "F", 1,
"M", "M", 0,
"M", "L", 1,
"L", "VF", 1,
"L", "F", 1,
"L", "M", 1,
"L", "L", 0
)
# You can use the col1:colN tidyselect syntax
hpc_cv %>%
filter(Resample == "Fold01") %>%
classification_cost(obs, VF:L, costs = hpc_costs)
# Groups are respected
hpc_cv %>%
group_by(Resample) %>%
classification_cost(obs, VF:L, costs = hpc_costs)
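# ---------------------------------------------------------------------------
# Case weights (illustrative sketch, not part of the original examples)

# `case_weights` takes an unquoted numeric column from the data. The `wts`
# column below is hypothetical and only created for demonstration.
two_class_example %>%
  mutate(wts = if_else(truth == "Class1", 2, 1)) %>%
  classification_cost(truth, Class1, costs = costs1, case_weights = wts)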