cal_estimate_linear: Uses a linear regression model to calibrate numeric...
In topepo/probably: Tools for Post-Processing Predicted Values

cal_estimate_linear

R Documentation

Uses a linear regression model to calibrate numeric predictions

Description

Uses a linear regression model to calibrate numeric predictions

Usage

cal_estimate_linear(
  .data,
  truth = NULL,
  estimate = dplyr::matches("^.pred$"),
  smooth = TRUE,
  parameters = NULL,
  ...,
  .by = NULL
)

## S3 method for class 'data.frame'
cal_estimate_linear(
  .data,
  truth = NULL,
  estimate = dplyr::matches("^.pred$"),
  smooth = TRUE,
  parameters = NULL,
  ...,
  .by = NULL
)

## S3 method for class 'tune_results'
cal_estimate_linear(
  .data,
  truth = NULL,
  estimate = dplyr::matches("^.pred$"),
  smooth = TRUE,
  parameters = NULL,
  ...
)

## S3 method for class 'grouped_df'
cal_estimate_linear(
  .data,
  truth = NULL,
  estimate = NULL,
  smooth = TRUE,
  parameters = NULL,
  ...
)

Arguments

`.data`	An ungrouped `data.frame` object, or `tune_results` object, that contains a prediction column.
`truth`	The column identifier for the observed outcome data (that is numeric). This should be an unquoted column name.
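`estimate`	The column identifier for the predicted values. Unless overridden, the `data.frame` and `tune_results` methods select it with `dplyr::matches("^.pred$")`, which picks up the standard tidymodels `.pred` column.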
`smooth`	Applies to the linear models. It switches between a generalized additive model using spline terms when `TRUE`, and simple linear regression when `FALSE`.
`parameters`	(Optional) A tibble of tuning parameter values that can be used to filter the predicted values before processing. Applies only to `tune_results` objects.
`...`	Additional arguments passed to the models or routines used to calculate the new predictions.
`.by`	The column identifier for the grouping variable. This should be a single unquoted column name that selects a qualitative variable for grouping. Defaults to `NULL`. When `.by = NULL`, no grouping takes place.
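
As a rough illustration of how these arguments fit together, the sketch below calls `cal_estimate_linear()` on a small simulated data frame. The data, the grouping column `site`, and the object names are hypothetical and exist only for this example. It assumes an installed version of probably that supports `.by`, as in the usage shown above.

```r
library(probably)

set.seed(123)
n <- 300

# Hypothetical data: `site` is a made-up qualitative grouping column
sim_preds <- data.frame(
  outcome = runif(n, min = 0, max = 10),
  site = rep(c("a", "b", "c"), length.out = n)
)
# Predictions that are deliberately biased relative to the outcome
sim_preds$.pred <- 1 + 0.7 * sim_preds$outcome + rnorm(n, sd = 0.5)

# Simple linear regression (smooth = FALSE), fit separately within each site.
# `estimate` is left at its default, which matches the `.pred` column.
# `.by` comes after `...` in the signature, so it has to be named.
site_cal <- cal_estimate_linear(
  sim_preds,
  truth = outcome,
  smooth = FALSE,
  .by = site
)

# Apply the estimated calibration to a data frame with the same columns
calibrated <- cal_apply(sim_preds, site_cal)
head(calibrated)
```

In practice the calibration would be estimated on one set of predictions and applied to another, as the Examples section does with the out-of-bag and test predictions.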

Details

This function uses existing modeling functions from other packages to create the calibration:

`stats::glm()` is used when `smooth` is set to `FALSE`
`mgcv::gam()` is used when `smooth` is set to `TRUE`

These methods estimate the trend relating the unmodified predicted values to the observed outcome and then remove that trend when `cal_apply()` is invoked.
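
The idea behind the two model types can be sketched directly with `stats::glm()` and `mgcv::gam()`. This is not the package's internal code. It is a minimal illustration on simulated data, assuming the calibration model regresses the observed outcome on the raw prediction. All object names (`holdout`, `lin_fit`, `gam_fit`, `rmse`) are hypothetical.

```r
library(mgcv)  # recommended package bundled with most R installations

set.seed(42)
n <- 200
outcome <- runif(n, min = 0, max = 10)
# Hypothetical predictions with a systematic linear bias
pred <- 2 + 0.6 * outcome + rnorm(n, sd = 0.5)
holdout <- data.frame(outcome = outcome, pred = pred)

# Analogue of smooth = FALSE: regress the outcome on the raw prediction
lin_fit <- stats::glm(outcome ~ pred, data = holdout)

# Analogue of smooth = TRUE: the same idea with a spline term
gam_fit <- mgcv::gam(outcome ~ s(pred), data = holdout)

# "Applying" the calibration: replace each raw prediction with the
# fitted value that the model assigns to it
pred_lin <- as.numeric(predict(lin_fit, newdata = holdout))
pred_gam <- as.numeric(predict(gam_fit, newdata = holdout))

# Root mean squared error before and after the adjustment
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))
c(
  raw = rmse(holdout$outcome, holdout$pred),
  linear = rmse(holdout$outcome, pred_lin),
  smooth = rmse(holdout$outcome, pred_gam)
)
```

The fit and the evaluation use the same rows here purely for brevity. A real workflow would estimate the calibration on held-out predictions and assess it on separate data.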

Examples

library(probably)
library(dplyr)
library(ggplot2)

head(boosting_predictions_test)

# ------------------------------------------------------------------------------
# Before calibration

y_rng <- extendrange(boosting_predictions_test$outcome)

boosting_predictions_test |>
  ggplot(aes(outcome, .pred)) +
  geom_abline(lty = 2) +
  geom_point(alpha = 1 / 2) +
  geom_smooth(se = FALSE, col = "blue", linewidth = 1.2, alpha = 3 / 4) +
  coord_equal(xlim = y_rng, ylim = y_rng) +
  ggtitle("Before calibration")

# ------------------------------------------------------------------------------
# Smoothed trend removal

smoothed_cal <-
  boosting_predictions_oob |>
  # It will automatically identify the predicted value columns when the
  # standard tidymodels naming conventions are used.
  cal_estimate_linear(outcome)
smoothed_cal

boosting_predictions_test |>
  cal_apply(smoothed_cal) |>
  ggplot(aes(outcome, .pred)) +
  geom_abline(lty = 2) +
  geom_point(alpha = 1 / 2) +
  geom_smooth(se = FALSE, col = "blue", linewidth = 1.2, alpha = 3 / 4) +
  coord_equal(xlim = y_rng, ylim = y_rng) +
  ggtitle("After calibration")

topepo/probably documentation built on June 8, 2025, 4:23 a.m.