select_threshold: Select the best classification threshold

View source: R/select_threshold.R

select_threshold    R Documentation

Select the best classification threshold

Description

This function selects the best threshold for each metric in a set of metrics. The metric names must be present in the 'Metric' column of the data frame, which can be generated with the package function [get_threshold_data()]. For each metric, the function returns the metric value and the selected threshold that maximizes or minimizes it.

Usage

select_threshold(
  df,
  metrics = c("mcc_tr", "Gmean", "F1", "Balanced Accuracy", "Precision", "Recall"),
  optimize = c("max", "max", "max", "max", "max", "max")
)

Arguments

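df

Data frame with the columns 'Metric', 'Value' and 'threshold', for example as generated by [get_threshold_data()]

metrics

Character vector of metric names to optimize; each name must be present in the 'Metric' column of 'df'

optimize

Character vector with one entry per metric in 'metrics', stating whether that metric is to be maximized ("max", the default for every metric) or minimized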

Details

The data frame generated by [get_threshold_data()] contains the following values that depend on the threshold: 'TP, FN, FP, TN' - the confusion matrix values; 'P_pred, N_pred' - the number of positive and negative predictions; 'Sensitivity, Specificity, Pos Pred Value, Neg Pred Value'; 'Precision, Recall, F1'; 'Prevalence, Detection Rate, Detection Prevalence, Balanced Accuracy'; 'fpr, tpr, tnr, fnr'.
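To make the threshold-dependent values concrete, the following sketch computes a few of them for a single threshold on toy data. It is not the package's implementation: the data are made up, and the rule that scores at or above the threshold count as positive is an assumption.

```r
# Hypothetical toy data, not taken from the package
truth      <- c(1, 1, 1, 0, 0, 0, 1, 0)
prediction <- c(0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.7, 0.4)
threshold  <- 0.5

# Assumption: scores at or above the threshold count as positive predictions
pred_class <- as.integer(prediction >= threshold)

# Confusion matrix values
TP <- sum(pred_class == 1 & truth == 1)
FN <- sum(pred_class == 0 & truth == 1)
FP <- sum(pred_class == 1 & truth == 0)
TN <- sum(pred_class == 0 & truth == 0)

# A few derived metrics using their standard definitions
precision         <- TP / (TP + FP)
recall            <- TP / (TP + FN)   # also called sensitivity or tpr
specificity       <- TN / (TN + FP)   # also called tnr
f1                <- 2 * precision * recall / (precision + recall)
balanced_accuracy <- (recall + specificity) / 2
```

Repeating this computation over a grid of thresholds gives one value per metric and threshold, which is the kind of data the function selects from.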

And the values that are not dependent on the threshold:

'roc_auc' - the area under the receiver operating characteristic (ROC) curve; 'P, N, N_samples' - the numbers of positive, negative and total samples (extracted from the truth vector); 'pr_baseline' - the baseline for a precision-recall curve, 'pr_baseline = P / N_samples'.
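As a small worked example of the threshold-independent counts and the precision-recall baseline, the sketch below applies the formula 'pr_baseline = P / N_samples' to a made-up truth vector. The vector is hypothetical and the code only mirrors the formula stated above.

```r
# Hypothetical truth vector: 3 positives, 7 negatives
truth <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)

P         <- sum(truth == 1)   # number of positive samples
N         <- sum(truth == 0)   # number of negative samples
N_samples <- length(truth)     # total number of samples

# Baseline of a precision-recall curve: the share of positive samples
pr_baseline <- P / N_samples
```

With 3 positives among 10 samples, the baseline is 0.3, and it does not change with the threshold.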

All values are returned in a [tibble::tibble()] with the columns 'Metric' - containing the name of the metric; 'Value' - containing the value of the metric; 'threshold' - containing the threshold. For threshold-independent metrics, the metric value is the same for all thresholds.

You may plot these metrics along the different thresholds with the package function [model_metrics_curves()].

Value

For each requested metric, the optimal metric value and the threshold at which it is reached
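The selection step can be illustrated with a minimal base R sketch. It is not the package's code: 'pick_best_threshold' is a hypothetical helper, the input data are made up, and accepting "min" as the counterpart of "max" is an assumption. Ties are resolved by taking the first matching threshold.

```r
# Hypothetical long-format data with the columns 'Metric', 'Value', 'threshold'
df <- data.frame(
  Metric    = rep(c("F1", "fpr"), each = 4),
  Value     = c(0.40, 0.62, 0.71, 0.55,   # F1 at each threshold
                0.90, 0.45, 0.20, 0.05),  # fpr at each threshold
  threshold = rep(c(0.2, 0.4, 0.6, 0.8), times = 2)
)

# Hypothetical helper, not part of prettyPROC
pick_best_threshold <- function(df, metrics, optimize) {
  stopifnot(length(metrics) == length(optimize),
            all(metrics %in% df$Metric))
  rows <- Map(function(metric, how) {
    sub <- df[df$Metric == metric, ]
    # Assumption: any value other than "max" means "minimize"
    idx <- if (how == "max") which.max(sub$Value) else which.min(sub$Value)
    sub[idx, ]
  }, metrics, optimize)
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

best <- pick_best_threshold(df,
                            metrics  = c("F1", "fpr"),
                            optimize = c("max", "min"))
```

Here 'best' holds one row per metric: the highest F1 together with its threshold, and the lowest fpr together with its threshold.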

Examples

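library(prettyPROC)
library(magrittr)  # provides the %>% pipe used below

# Simulate binary labels and random prediction scores
y_true <- sample(c(0, 1), replace = TRUE, size = 1000)
y_predicted <- runif(1000)

# Compute all metrics across the thresholds
data <- get_threshold_data(truth = y_true, prediction = y_predicted)
data %>% head()
data %>% colnames()

# Select the threshold that maximizes the metric 'mcc_tr'
data_thresholds <- select_threshold(df = data, metrics = c("mcc_tr"), optimize = c("max"))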


mai00fti/prettyPROC documentation built on Aug. 16, 2024, 4:48 p.m.