cal_binary_tables: Probability Calibration table

.cal_table_breaks    R Documentation

Probability Calibration table

Description

Calibration table functions. They require a data frame that contains the prediction and probability columns. The output is a tibble with segmented data that compares the predicted probabilities with the observed outcomes.

Usage

.cal_table_breaks(
  .data,
  truth = NULL,
  estimate = NULL,
  .by = NULL,
  num_breaks = 10,
  conf_level = 0.9,
  event_level = c("auto", "first", "second"),
  ...
)

.cal_table_logistic(
  .data,
  truth = NULL,
  estimate = NULL,
  .by = NULL,
  conf_level = 0.9,
  smooth = TRUE,
  event_level = c("auto", "first", "second"),
  ...
)

.cal_table_windowed(
  .data,
  truth = NULL,
  estimate = NULL,
  .by = NULL,
  window_size = 0.1,
  step_size = window_size/2,
  conf_level = 0.9,
  event_level = c("auto", "first", "second"),
  ...
)

Arguments

.data

An ungrouped data frame object containing predictions and probability columns.

truth

The column identifier for the true class results (that is a factor). This should be an unquoted column name.

estimate

A vector of column identifiers, or one of the dplyr selector functions, to choose which variables contain the class probabilities. It defaults to the prefix used by tidymodels (.pred_). The order of the identifiers is assumed to match the order of the levels of the truth variable.

.by

The column identifier for the grouping variable. This should be a single unquoted column name that selects a qualitative variable for grouping. Defaults to NULL. When .by = NULL, no grouping takes place.

num_breaks

The number of segments into which the probabilities are grouped. It defaults to 10.

conf_level

Confidence level to use in the visualization. It defaults to 0.9.

event_level

A single string. Either "first" or "second" to specify which level of truth to consider as the "event". Defaults to "auto", which lets the function decide which one to use based on the type of model (binary, multi-class or linear).

...

Additional arguments passed to the tune_results object.

Details

  • .cal_table_breaks() - Splits the data into bins based on the number of breaks provided (num_breaks). The bins are equal-width ranges, starting at 0 and ending at 1.

  • .cal_table_logistic() - Fits a logistic spline regression (GAM) to the data. It then creates a table of the model's predictions at 100 probabilities, starting at 0 and ending at 1.

  • .cal_table_windowed() - Computes a running event rate over windows that slide across the probability range. window_size sets the width of each window and step_size the distance between consecutive windows.
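
The three approaches above can be illustrated with a minimal base-R sketch. It is not the package's implementation. The data are simulated, and every object name (probs, event, binned, windowed, logistic) is hypothetical. The logistic part uses a plain glm() rather than a spline GAM, and the placement of the window centers is an assumption.

```r
# Illustrative sketch only: simulated data, hypothetical object names,
# and base R in place of the package internals.
set.seed(1)
n <- 2000
probs <- runif(n)              # hypothetical predicted event probabilities
event <- rbinom(n, 1, probs)   # 0/1 outcomes drawn so that probs is well calibrated

# 1. Equal-width bins on [0, 1], analogous to num_breaks = 10
num_breaks <- 10
edges <- seq(0, 1, length.out = num_breaks + 1)
bin <- cut(probs, breaks = edges, include.lowest = TRUE)
binned <- data.frame(
  predicted_midpoint = (head(edges, -1) + tail(edges, -1)) / 2,
  event_rate = as.numeric(tapply(event, bin, mean)),  # observed rate per bin
  total = as.integer(table(bin))                      # rows per bin
)

# 2. Simple logistic regression of the outcome on the predicted probability,
#    evaluated on a grid of 100 probabilities from 0 to 1 (no spline here)
fit <- glm(event ~ probs, family = binomial)
grid <- data.frame(probs = seq(0, 1, length.out = 100))
logistic <- data.frame(
  estimate = grid$probs,
  prob = as.numeric(predict(fit, newdata = grid, type = "response"))
)

# 3. Overlapping windows: width window_size, centers step_size apart
#    (centering windows on a 0-to-1 grid is an assumption of this sketch)
window_size <- 0.1
step_size <- window_size / 2
centers <- seq(0, 1, by = step_size)
windowed <- do.call(rbind, lapply(centers, function(mid) {
  in_window <- probs >= mid - window_size / 2 & probs <= mid + window_size / 2
  data.frame(
    predicted_midpoint = mid,
    event_rate = mean(event[in_window]),
    total = sum(in_window)
  )
}))

print(binned)
```

Because the simulated outcomes are drawn from the predicted probabilities, event_rate should track predicted_midpoint closely in all three tables. Large gaps between the two columns would indicate poor calibration.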

Examples

.cal_table_breaks(
  segment_logistic,
  Class,
  .pred_good
)

.cal_table_logistic(
  segment_logistic,
  Class,
  .pred_good
)

.cal_table_windowed(
  segment_logistic,
  Class,
  .pred_good
)

topepo/probably documentation built on Oct. 21, 2024, 3:28 a.m.