View source: R/prob-classification_cost.R
classification_cost (R Documentation)
classification_cost() calculates the cost of a poor prediction based on
user-defined costs. The costs are multiplied by the estimated class
probabilities and the mean cost is returned.
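Written out, the description above amounts to the following for n observations and K classes, ignoring case weights. Here \hat{p}_{ik} is the estimated probability that observation i belongs to class k, y_i is its true class, and c(y_i, k) is the user-defined cost of predicting k when the truth is y_i. The notation is ours, not the package's.

```latex
\text{cost} = \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} \hat{p}_{ik} \, c(y_i, k)
```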
classification_cost(data, ...)
## S3 method for class 'data.frame'
classification_cost(
data,
truth,
...,
costs = NULL,
na_rm = TRUE,
event_level = yardstick_event_level(),
case_weights = NULL
)
classification_cost_vec(
truth,
estimate,
costs = NULL,
na_rm = TRUE,
event_level = yardstick_event_level(),
case_weights = NULL,
...
)
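data: A data.frame containing the columns specified by truth and ....
...: A set of unquoted column names or one or more dplyr selector functions to choose which variables contain the class probabilities. If truth is binary, only 1 column should be selected, and it should correspond to the value of event_level. Otherwise, there should be as many columns as factor levels of truth, and the ordering of the columns should be the same as the factor levels of truth.
truth: The column identifier for the true class results (that is a factor). This should be an unquoted column name, although this argument is passed by expression and supports quasiquotation (you can unquote column names). For _vec() functions, a factor vector.
costs: A data frame with columns "truth", "estimate", and "cost". "truth" and "estimate" should be character columns containing unique combinations of the levels of the truth factor. "cost" should be a numeric column representing the cost that should be applied when "estimate" is predicted but the true result is "truth". It is often the case that when "truth" == "estimate", the cost is zero (no penalty for correct predictions). If any combinations of the levels of truth are missing, their costs are assumed to be zero. If NULL, equal costs are used, applying a cost of 0 to correct predictions and a cost of 1 to incorrect predictions.
na_rm: A logical value indicating whether NA values should be stripped before the computation proceeds.
event_level: A single string. Either "first" or "second" to specify which level of truth to consider as the "event". It only matters when truth is binary. The default, yardstick_event_level(), returns "first".
case_weights: The optional column identifier for case weights. This should be an unquoted column name that evaluates to a numeric column in data. For _vec() functions, a numeric vector, or NULL.
estimate: If truth is binary, a numeric vector of class probabilities corresponding to the "relevant" class. Otherwise, a matrix with as many columns as factor levels of truth. It is assumed that these are in the same order as the levels of truth.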
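As a sketch of how a costs data frame is interpreted, the base-R snippet below converts one into a truth-by-estimate matrix. costs_to_matrix() and default_costs() are hypothetical helpers written for illustration, not yardstick functions. They assume the rules given for the costs argument: combinations that are not listed cost zero, and costs = NULL means 0 for correct predictions and 1 for incorrect ones.

```r
# Hypothetical helper (not part of yardstick): turn a costs data frame with
# columns `truth`, `estimate` and `cost` into a K x K matrix whose rows are
# the true class and whose columns are the predicted class.
# Combinations absent from `costs` stay at zero.
costs_to_matrix <- function(costs, levels) {
  m <- matrix(0, nrow = length(levels), ncol = length(levels),
              dimnames = list(truth = levels, estimate = levels))
  for (i in seq_len(nrow(costs))) {
    # as.character() guards against factor columns being used as integer codes
    m[as.character(costs$truth[i]), as.character(costs$estimate[i])] <- costs$cost[i]
  }
  m
}

# Hypothetical helper: the equal-cost scheme assumed for costs = NULL
# (0 on the diagonal for correct predictions, 1 everywhere else)
default_costs <- function(levels) {
  k <- length(levels)
  m <- matrix(1, k, k, dimnames = list(truth = levels, estimate = levels))
  diag(m) <- 0
  m
}

# The `costs1` table from the examples, built with base R
costs1 <- data.frame(
  truth    = c("Class1", "Class2"),
  estimate = c("Class2", "Class1"),
  cost     = c(1, 2),
  stringsAsFactors = FALSE
)
m1 <- costs_to_matrix(costs1, c("Class1", "Class2"))
m1
```

The correct-prediction cells (the diagonal) are not listed in costs1, so they stay at zero, which matches the "missing combinations cost zero" rule.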
As an example, suppose that there are three classes: "A", "B", and "C".
Suppose there is an observation whose true class is "A", with class probabilities A = 0.3 / B = 0.3 / C = 0.4. Suppose that, when the true result is class "A", the
costs for each class are A = 0 / B = 5 / C = 10, penalizing the
probability of incorrectly predicting "C" more than that of predicting "B". The
cost for this prediction would be 0.3 * 0 + 0.3 * 5 + 0.4 * 10 = 5.5. This
calculation is done for each sample, and the individual costs are averaged.
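The arithmetic for the single "A" observation in the worked example can be checked in base R. This only reproduces the hand calculation; it does not call yardstick.

```r
# Estimated class probabilities for one observation whose true class is "A"
probs <- c(A = 0.3, B = 0.3, C = 0.4)
# Costs applied when the truth is "A" and each class is predicted
costs_given_A <- c(A = 0, B = 5, C = 10)
# Probability-weighted cost: 0.3 * 0 + 0.3 * 5 + 0.4 * 10
obs_cost <- sum(probs * costs_given_A)
obs_cost
```

Repeating this for every observation and taking the mean gives the value the metric reports.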
A tibble with columns .metric, .estimator,
and .estimate and 1 row of values.
For grouped data frames, the number of rows returned will be the same as the number of groups.
For classification_cost_vec(), a single numeric value (or NA).
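The one-row-per-group result can be pictured in base R: compute a cost per observation, then average within each group. The per-observation costs and fold labels below are made up for illustration, and tapply() stands in for the grouped summary; this is not how yardstick is implemented internally.

```r
# Made-up per-observation costs and their resample labels
obs_cost <- c(0.2, 0.5, 0.1, 0.9)
resample <- c("Fold01", "Fold01", "Fold02", "Fold02")
# One mean cost per group, mirroring one output row per group
per_group <- tapply(obs_cost, resample, mean)
per_group
```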
Max Kuhn
Other class probability metrics:
average_precision(),
brier_class(),
gain_capture(),
mn_log_loss(),
pr_auc(),
roc_auc(),
roc_aunp(),
roc_aunu()
library(yardstick)
library(dplyr)
# ---------------------------------------------------------------------------
# Two class example
data(two_class_example)
# Assuming `Class1` is our "event", this penalizes false positives heavily
costs1 <- tribble(
  ~truth,   ~estimate, ~cost,
  "Class1", "Class2",  1,
  "Class2", "Class1",  2
)
# Assuming `Class1` is our "event", this penalizes false negatives heavily
costs2 <- tribble(
  ~truth,   ~estimate, ~cost,
  "Class1", "Class2",  2,
  "Class2", "Class1",  1
)
classification_cost(two_class_example, truth, Class1, costs = costs1)
classification_cost(two_class_example, truth, Class1, costs = costs2)
# ---------------------------------------------------------------------------
# Multiclass
data(hpc_cv)
# Define cost matrix from Kuhn and Johnson (2013)
hpc_costs <- tribble(
  ~estimate, ~truth, ~cost,
  "VF",      "VF",   0,
  "VF",      "F",    1,
  "VF",      "M",    5,
  "VF",      "L",    10,
  "F",       "VF",   1,
  "F",       "F",    0,
  "F",       "M",    5,
  "F",       "L",    5,
  "M",       "VF",   1,
  "M",       "F",    1,
  "M",       "M",    0,
  "M",       "L",    1,
  "L",       "VF",   1,
  "L",       "F",    1,
  "L",       "M",    1,
  "L",       "L",    0
)
# You can use the col1:colN tidyselect syntax
hpc_cv %>%
  filter(Resample == "Fold01") %>%
  classification_cost(obs, VF:L, costs = hpc_costs)
# Groups are respected
hpc_cv %>%
  group_by(Resample) %>%
  classification_cost(obs, VF:L, costs = hpc_costs)
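To see why costs1 penalizes false positives in the two-class case, here is a base-R sketch of the calculation described in Details, applied to four made-up observations. It assumes Class1 (the first level) is the event, so the single probability column is expanded into P(Class1) and 1 - P(Class1). It is an illustration of the formula, not yardstick's internal code.

```r
# Made-up toy inputs: true classes and the estimated P(Class1)
truth <- factor(c("Class1", "Class1", "Class2", "Class2"),
                levels = c("Class1", "Class2"))
p_class1 <- c(0.9, 0.4, 0.2, 0.7)

# Expand the single event column into a full probability matrix,
# assuming Class1 is the event (event_level = "first")
probs <- cbind(Class1 = p_class1, Class2 = 1 - p_class1)

# costs1 as a truth (rows) x estimate (columns) matrix:
# truth Class1 / estimate Class2 costs 1, truth Class2 / estimate Class1 costs 2
cost_mat <- matrix(c(0, 2, 1, 0), nrow = 2,
                   dimnames = list(c("Class1", "Class2"),
                                   c("Class1", "Class2")))

# Row i of the looked-up cost matrix holds the costs for observation i's truth
obs_costs <- rowSums(probs * cost_mat[as.character(truth), , drop = FALSE])
mean_cost <- mean(obs_costs)
mean_cost
```

The fourth observation (truly Class2 but given P(Class1) = 0.7) contributes the largest cost, 0.7 * 2 = 1.4, because predicting the event when it is absent is the expensive mistake in costs1. With yardstick loaded, classification_cost_vec(truth, p_class1, costs = costs1) on the same inputs should return the same mean under the default event level, if the package follows the computation described in Details.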