cal_estimate_linear: Uses a linear regression model to calibrate numeric...

View source: R/cal-estimate-linear.R

cal_estimate_linear    R Documentation

Uses a linear regression model to calibrate numeric predictions

Description

Uses a linear regression model to calibrate numeric predictions

Usage

cal_estimate_linear(
  .data,
  truth = NULL,
  estimate = dplyr::matches("^.pred$"),
  smooth = TRUE,
  parameters = NULL,
  ...,
  .by = NULL
)

## S3 method for class 'data.frame'
cal_estimate_linear(
  .data,
  truth = NULL,
  estimate = dplyr::matches("^.pred$"),
  smooth = TRUE,
  parameters = NULL,
  ...,
  .by = NULL
)

## S3 method for class 'tune_results'
cal_estimate_linear(
  .data,
  truth = NULL,
  estimate = dplyr::matches("^.pred$"),
  smooth = TRUE,
  parameters = NULL,
  ...
)

## S3 method for class 'grouped_df'
cal_estimate_linear(
  .data,
  truth = NULL,
  estimate = NULL,
  smooth = TRUE,
  parameters = NULL,
  ...
)

Arguments

.data

An ungrouped data.frame object, or tune_results object, that contains a prediction column.

truth

The column identifier for the observed outcome data, which must be numeric. This should be an unquoted column name.

estimate

The column identifier for the predicted values. By default, the column matching "^.pred$" is selected.

smooth

Selects the type of calibration model: a generalized additive model using spline terms when TRUE, or simple linear regression when FALSE.

parameters

(Optional) A tibble of tuning parameter values that can be used to filter the predicted values before processing. Applies only to tune_results objects.

...

Additional arguments passed to the models or routines used to calculate the new predictions.

.by

The column identifier for the grouping variable. This should be a single unquoted column name that selects a qualitative variable for grouping. Defaults to NULL. When .by = NULL, no grouping takes place.

Details

This function uses existing modeling functions from other packages to create the calibration:

  • stats::glm() is used when smooth is set to FALSE

  • mgcv::gam() is used when smooth is set to TRUE

These methods estimate the relationship between the observed outcome and the unmodified predicted values, and then remove that trend from the predictions when cal_apply() is invoked.
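
The idea behind trend removal can be sketched outside the package. The following is a minimal illustration on simulated data, not the package's internal code: it assumes the calibration model regresses the observed outcome on the prediction, and the names cal_dat, lin_fit, gam_fit, and rmse are hypothetical.

```r
# Illustrative sketch on simulated data; this is NOT the internals of probably.
library(mgcv)  # provides gam() and s()

set.seed(1)
n <- 200
truth <- runif(n, 0, 10)
# Hypothetical miscalibrated predictions: compressed toward 5, plus noise
pred <- 5 + 0.6 * (truth - 5) + rnorm(n, sd = 0.5)
cal_dat <- data.frame(truth = truth, pred = pred)

# Analogue of smooth = FALSE: straight-line fit of the outcome on the prediction
lin_fit <- stats::glm(truth ~ pred, data = cal_dat)

# Analogue of smooth = TRUE: spline fit of the outcome on the prediction
gam_fit <- mgcv::gam(truth ~ s(pred), data = cal_dat)

# "Applying" the calibration: replace each prediction with the model's fitted value
cal_dat$cal_lin <- as.numeric(predict(lin_fit, newdata = cal_dat))
cal_dat$cal_gam <- as.numeric(predict(gam_fit, newdata = cal_dat))

# Slope of the outcome on the prediction, before and after linear calibration
slope_before <- coef(lm(truth ~ pred, data = cal_dat))[["pred"]]
slope_after <- coef(lm(truth ~ cal_lin, data = cal_dat))[["cal_lin"]]

# Root mean squared error helper
rmse <- function(obs, est) sqrt(mean((obs - est)^2))

c(before = slope_before, after = slope_after)
c(
  raw = rmse(cal_dat$truth, cal_dat$pred),
  linear = rmse(cal_dat$truth, cal_dat$cal_lin),
  smooth = rmse(cal_dat$truth, cal_dat$cal_gam)
)
```

On the data used for fitting, the linear version brings the slope of the outcome on the calibrated prediction to 1, which is what "removing the trend" means here. On new data the improvement depends on how well the estimated trend generalizes.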

See Also

https://www.tidymodels.org/learn/models/calibration/, cal_validate_linear()

Examples

library(probably)
library(dplyr)
library(ggplot2)

head(boosting_predictions_test)

# ------------------------------------------------------------------------------
# Before calibration

y_rng <- extendrange(boosting_predictions_test$outcome)

boosting_predictions_test %>%
  ggplot(aes(outcome, .pred)) +
  geom_abline(lty = 2) +
  geom_point(alpha = 1 / 2) +
  geom_smooth(se = FALSE, col = "blue", linewidth = 1.2, alpha = 3 / 4) +
  coord_equal(xlim = y_rng, ylim = y_rng) +
  ggtitle("Before calibration")

# ------------------------------------------------------------------------------
# Smoothed trend removal

smoothed_cal <-
  boosting_predictions_oob %>%
  # It will automatically identify the predicted value columns when the
  # standard tidymodels naming conventions are used.
  cal_estimate_linear(outcome)
smoothed_cal

boosting_predictions_test %>%
  cal_apply(smoothed_cal) %>%
  ggplot(aes(outcome, .pred)) +
  geom_abline(lty = 2) +
  geom_point(alpha = 1 / 2) +
  geom_smooth(se = FALSE, col = "blue", linewidth = 1.2, alpha = 3 / 4) +
  coord_equal(xlim = y_rng, ylim = y_rng) +
  ggtitle("After calibration")
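
The example above uses the default smooth = TRUE. A minimal sketch of the simple linear regression variant follows, assuming the same boosting_predictions_oob and boosting_predictions_test data sets shipped with the package; the object names linear_cal and linear_calibrated are illustrative only.

```r
library(probably)

# ------------------------------------------------------------------------------
# Linear (non-smoothed) trend removal

# smooth = FALSE requests the simple linear regression calibrator
linear_cal <- cal_estimate_linear(
  boosting_predictions_oob,
  truth = outcome,
  smooth = FALSE
)
linear_cal

# Apply the estimated calibration to the held-out predictions
linear_calibrated <- cal_apply(boosting_predictions_test, linear_cal)
head(linear_calibrated)
```

The linear variant may be preferable when the miscalibration looks roughly like a shift and rescaling, while the smoothed variant can follow curved trends at the cost of a more flexible model.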


topepo/probably documentation built on April 6, 2024, 7:32 p.m.