View source: R/model_utility.R
model_utility | R Documentation |
Compute the utility of a model score on a classification data set. For each threshold of interest, we compute the utility of the classification rule that selects all items with model score greater than or equal to the threshold. The user specifies the outcome (a binary classification target), a model score (numeric), and the utility values (positive, negative, or zero) of each case: true positives, false positives, true negatives, and false negatives. The result is a table of model thresholds and the total value of using this model score with each threshold as a classification rule. NA is used to mark a threshold where no rows are selected.
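The threshold rule described above can be sketched in base R for a single threshold. This is only an illustration of the idea, not the package's implementation: the helper name utility_at_threshold is hypothetical, and it assumes the four value columns use the default names from the usage below.

```r
# Hypothetical helper (not part of sigr): total utility of the rule
# "select every row whose score is >= threshold".
utility_at_threshold <- function(d, score, outcome, threshold) {
  selected <- d[[score]] >= threshold   # rows the rule acts on
  positive <- d[[outcome]] == TRUE      # rows that truly are positive
  sum(
    d$true_positive_value[selected & positive],
    d$false_positive_value[selected & !positive],
    d$true_negative_value[!selected & !positive],
    d$false_negative_value[!selected & positive]
  )
}

d <- data.frame(
  predicted_probability = c(0, 0.5, 0.5, 0.5),
  made_purchase = c(FALSE, TRUE, FALSE, FALSE),
  false_positive_value = -5,
  true_positive_value = 95,
  true_negative_value = 0.001,
  false_negative_value = -0.01
)

# At threshold 0.5: one true positive (95), two false positives (-5 each),
# and one true negative (0.001).
total_at_half <- utility_at_threshold(
  d, "predicted_probability", "made_purchase", 0.5
)
print(total_at_half)
```

Repeating this calculation for every distinct score value would give a table with one total per threshold, which is the kind of result the description says model_utility returns.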
model_utility(
  d,
  model_name,
  outcome_name,
  ...,
  outcome_target = TRUE,
  true_positive_value_column_name = "true_positive_value",
  false_positive_value_column_name = "false_positive_value",
  true_negative_value_column_name = "true_negative_value",
  false_negative_value_column_name = "false_negative_value"
)
d | A data.frame containing all data and outcome values. |
model_name | Name of the column containing model predictions. |
outcome_name | Name of the column containing the truth values. |
... | Not used; forces later arguments to be specified by name. |
outcome_target | Value of the outcome column considered to be the positive (TRUE) class. |
true_positive_value_column_name | Column name of per-row values of true positive cases. Only used on positive instances. |
false_positive_value_column_name | Column name of per-row values of false positive cases. Only used on negative instances. |
true_negative_value_column_name | Column name of per-row values of true negative cases. Only used on negative instances. |
false_negative_value_column_name | Column name of per-row values of false negative cases. Only used on positive instances. |
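As a hedged illustration of the named arguments, the sketch below passes a non-default outcome_target and non-default value column names. The data, the column names (score, label, tp_val, fp_val, tn_val, fn_val), and the "yes"/"no" coding are all made up for this example; only the argument names come from the usage above. The call is guarded so the snippet still runs where the sigr package is not installed.

```r
# Made-up data: the outcome is coded "yes"/"no" and the value columns
# do not use the default names.
d <- data.frame(
  score = c(0.9, 0.7, 0.4, 0.1),
  label = c("yes", "no", "yes", "no"),
  tp_val = 95,   # per-row value when a positive row is selected
  fp_val = -5,   # per-row value when a negative row is selected
  tn_val = 0,    # per-row value when a negative row is not selected
  fn_val = 0     # per-row value when a positive row is not selected
)

if (requireNamespace("sigr", quietly = TRUE)) {
  # Arguments after ... must be given by name.
  values <- sigr::model_utility(
    d, "score", "label",
    outcome_target = "yes",
    true_positive_value_column_name = "tp_val",
    false_positive_value_column_name = "fp_val",
    true_negative_value_column_name = "tn_val",
    false_negative_value_column_name = "fn_val"
  )
  print(values)
}
```

Because the value columns are per-row, each row could also carry its own value (for example, a different revenue per customer) instead of the constants used here.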
A worked example can be found here: https://github.com/WinVector/sigr/blob/main/extras/UtilityExample.md.
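Returns a data.frame with one row per threshold value, including the threshold (column threshold) and the total value of using it as the classification rule (column total_value).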
d <- data.frame(
  predicted_probability = c(0, 0.5, 0.5, 0.5),
  made_purchase = c(FALSE, TRUE, FALSE, FALSE),
  false_positive_value = -5,    # acting on any predicted positive costs $5
  true_positive_value = 95,     # revenue on a true positive is $100 minus action cost
  true_negative_value = 0.001,  # true negatives have no value in our application
                                # but just give ourselves a small reward for being right
  false_negative_value = -0.01  # adding a small notional tax for false negatives,
                                # don't want our competitor getting these accounts.
)
values <- model_utility(d, 'predicted_probability', 'made_purchase')
best_strategy <- values[values$total_value >= max(values$total_value), ][1, ]
t(best_strategy)
# a bigger example
d <- data.frame(
  predicted_probability = stats::runif(100),
  made_purchase = sample(c(FALSE, TRUE), replace = TRUE, size = 100),
  false_positive_value = -5,    # acting on any predicted positive costs $5
  true_positive_value = 95,     # revenue on a true positive is $100 minus action cost
  true_negative_value = 0.001,  # true negatives have no value in our application
                                # but just give ourselves a small reward for being right
  false_negative_value = -0.01  # adding a small notional tax for false negatives,
                                # don't want our competitor getting these accounts.
)
values <- model_utility(d, 'predicted_probability', 'made_purchase')
# plot the estimated total utility as a function of threshold
plot(values$threshold, values$total_value)
best_strategy <- values[values$total_value >= max(values$total_value), ][1, ]
t(best_strategy)
# without utilities example
d <- data.frame(
  predicted_probability = c(0, 0.5, 0.5, 0.5),
  made_purchase = c(FALSE, TRUE, FALSE, FALSE)
)
model_utility(d, 'predicted_probability', 'made_purchase')