View source: R/pinterval_bootstrap.R
| pinterval_bootstrap | R Documentation |
This function computes bootstrapped prediction intervals with a confidence level of 1-alpha for a vector of (continuous) predicted values using bootstrapped prediction errors. The prediction errors to bootstrap from are computed using either a calibration set with predicted and true values or a set of pre-computed prediction errors from a calibration dataset or other data which the model was not trained on (e.g. OOB errors from a model using bagging). The function returns a tibble containing the predicted values along with the lower and upper bounds of the prediction intervals.
pinterval_bootstrap(
  pred,
  calib,
  calib_truth = NULL,
  error_type = c("raw", "absolute"),
  alpha = 0.1,
  n_bootstraps = 1000,
  distance_weighted_bootstrap = FALSE,
  distance_features_calib = NULL,
  distance_features_pred = NULL,
  distance_type = c("mahalanobis", "euclidean"),
  normalize_distance = "none",
  weight_function = c("gaussian_kernel", "caucy_kernel", "logistic", "reciprocal_linear")
)
pred |
Vector of predicted values |
calib |
A numeric vector of predicted values in the calibration partition, or a two-column tibble or matrix whose first column holds the predicted values and whose second column holds the true values. If calib is a numeric vector, calib_truth must be provided. |
calib_truth |
A numeric vector of true values in the calibration partition. Only required if calib is a numeric vector |
error_type |
The type of error to use for the prediction intervals. Can be 'raw' or 'absolute'. If 'raw', bootstrapping will be done on the raw prediction errors. If 'absolute', bootstrapping will be done on the absolute prediction errors with random signs. Default is 'raw' |
alpha |
The significance level (miscoverage rate) for the prediction intervals; the resulting intervals have a confidence level of 1-alpha. Must be a single numeric value between 0 and 1. Default is 0.1 |
n_bootstraps |
The number of bootstraps to perform. Default is 1000 |
distance_weighted_bootstrap |
Logical. If TRUE, the function uses distance-weighted bootstrapping: the probability of selecting a prediction error is weighted by the distance between its calibration point and the prediction point, using the specified distance type and weight function. If FALSE (the default), standard bootstrapping is performed. |
distance_features_calib |
A matrix, data frame, or numeric vector of features from which to compute distances when |
distance_features_pred |
A matrix, data frame, or numeric vector of feature values for the prediction set. Must be the same features as specified in |
distance_type |
The type of distance metric to use when computing distances between calibration and prediction points. Options are 'mahalanobis' (default) and 'euclidean'. |
normalize_distance |
Either 'minmax', 'sd', or 'none'. Indicates if and how to normalize the distances when distance_weighted_bootstrap is TRUE. Normalization helps ensure that distances are on a comparable scale across features. Default is 'none'. |
weight_function |
A character string specifying the weighting kernel to use for distance-weighted bootstrapping. Options, as listed in the usage above, are 'gaussian_kernel', 'caucy_kernel', 'logistic', and 'reciprocal_linear'.
The default is |
This function estimates prediction intervals using bootstrapped prediction errors derived from a calibration set. It supports both standard and distance-weighted bootstrapping. The calibration set must consist of predicted values and corresponding true values, either provided as separate vectors or as a two-column tibble or matrix. Alternatively, users may provide a vector of precomputed prediction errors if model predictions and truths are already processed.
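As a rough illustration of the two calibration input forms described above, the following base R sketch derives the same error vector from either separate vectors or a two-column matrix. The helper `calib_errors()` and the toy numbers are hypothetical and are not part of the package; the sign convention (truth - prediction) follows the description of raw errors in this page.

```r
# Illustrative sketch only; calib_errors() is a hypothetical helper,
# not a function exported by the package.
calib_pred  <- c(2.1, 3.4, 5.0, 4.2)
calib_truth <- c(2.5, 3.0, 5.6, 4.1)

calib_errors <- function(calib, calib_truth = NULL) {
  if (is.null(dim(calib))) {
    # Vector input: the true values must be supplied separately
    stopifnot(!is.null(calib_truth), length(calib) == length(calib_truth))
    return(calib_truth - calib)
  }
  # Two-column input: column 1 = predictions, column 2 = true values
  calib <- as.matrix(calib)
  stopifnot(ncol(calib) == 2)
  unname(calib[, 2] - calib[, 1])
}

errors_from_vectors <- calib_errors(calib_pred, calib_truth)
errors_from_matrix  <- calib_errors(cbind(calib_pred, calib_truth))
```

Both calls yield the same signed errors, which is the quantity the bootstrap then resamples.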
Two types of error can be used for bootstrapping:
- '"raw"': bootstrapping is performed on the raw signed prediction errors (truth - prediction), allowing for asymmetric prediction intervals.
- '"absolute"': bootstrapping is done on the absolute errors, and random signs are applied when constructing intervals. This results in (approximately) symmetric intervals around the prediction.
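The difference between the two error types can be sketched in base R. This is only an approximation of the idea under stated assumptions: the helper `boot_interval()`, the simulated skewed errors, and the use of the alpha/2 and 1 - alpha/2 quantiles of the resampled errors are illustrative choices, not the package's internal code.

```r
# Illustrative sketch only; not the package's internal implementation.
set.seed(1)

# Hypothetical calibration data with right-skewed errors
calib_pred  <- runif(500, 1, 5)
calib_truth <- calib_pred + rexp(500, rate = 1) - 1

raw_errors <- calib_truth - calib_pred   # signed errors (truth - prediction)
abs_errors <- abs(raw_errors)            # error magnitudes only

# Hypothetical helper: resample errors, optionally attach random signs,
# then add the lower and upper error quantiles to the prediction
boot_interval <- function(pred, errors, alpha = 0.1, n_bootstraps = 1000,
                          random_sign = FALSE) {
  draws <- sample(errors, n_bootstraps, replace = TRUE)
  if (random_sign) {
    draws <- draws * sample(c(-1, 1), n_bootstraps, replace = TRUE)
  }
  bounds <- quantile(draws, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(lower = pred + bounds[1], upper = pred + bounds[2])
}

int_raw <- boot_interval(3, raw_errors)                      # follows the skew
int_abs <- boot_interval(3, abs_errors, random_sign = TRUE)  # roughly symmetric
```

With skewed errors like these, the raw interval extends further above the prediction than below it, while the signed absolute errors give an interval that is roughly centered on the prediction.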
Distance-weighted bootstrapping ('distance_weighted_bootstrap = TRUE') can be used to give more weight to calibration errors closer to each test prediction. Distances are computed between the feature matrices or vectors supplied via 'distance_features_calib' and 'distance_features_pred'. These distances are then transformed into weights using the kernel selected in 'weight_function': rapidly decaying kernels (e.g., Gaussian) emphasize strong locality, while slower decays (e.g., reciprocal or Cauchy) provide smoother influence. The distance features can be geographic coordinates, predicted values, or any other relevant features that capture similarity in the context of the prediction task.
The distance metric is specified via 'distance_type', with options for Mahalanobis or Euclidean distance. The default is Mahalanobis distance, which accounts for correlations between features. Distances can be normalized using the 'normalize_distance' parameter. Normalization is primarily useful for Euclidean distances, to ensure that features on different scales do not disproportionately influence the distance calculations.
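The weighting idea can be sketched in base R as follows. The simulated data, the Gaussian kernel form, and the bandwidth of 0.2 are assumptions made for illustration; the package's kernels and any internal scaling may differ.

```r
# Illustrative sketch only; kernel form and bandwidth are assumptions.
set.seed(1)

# Hypothetical calibration features and errors whose spread grows with x1
feat_calib <- cbind(x1 = runif(300), x2 = runif(300))
errors     <- rnorm(300, sd = 0.2 + 2 * feat_calib[, "x1"])

# One hypothetical test point in the low-variance region
feat_pred <- c(x1 = 0.05, x2 = 0.5)

# Euclidean distance from the test point to every calibration point
d_euc <- sqrt(rowSums(sweep(feat_calib, 2, feat_pred)^2))

# Mahalanobis distance; stats::mahalanobis() returns the squared distance
d_mah <- sqrt(mahalanobis(feat_calib, center = feat_pred, cov = cov(feat_calib)))

# Gaussian kernel (assumed bandwidth 0.2): weights decay quickly with distance
w <- exp(-(d_euc / 0.2)^2 / 2)
w <- w / sum(w)

# Weighted vs. unweighted resampling of the calibration errors
local_draws  <- sample(errors, 1000, replace = TRUE, prob = w)
global_draws <- sample(errors, 1000, replace = TRUE)

local_width  <- diff(quantile(local_draws,  c(0.05, 0.95), names = FALSE))
global_width <- diff(quantile(global_draws, c(0.05, 0.95), names = FALSE))
```

Because the test point sits where the errors are small, the weighted resample yields a narrower 90% error range than the unweighted one, which is the kind of local adaptivity distance weighting is meant to provide.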
The number of bootstrap samples is controlled via the 'n_bootstraps' parameter. For computational efficiency, this can be reduced at the cost of interval precision.
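The precision trade-off can be illustrated with a small base R experiment. It assumes, for illustration only, that 'n_bootstraps' corresponds to the number of resampled errors behind each interval bound; the helper `upper_bound()` and the simulated errors are hypothetical.

```r
# Illustrative sketch only; assumes each bound is a quantile of n resampled errors.
set.seed(1)
errors <- rnorm(250)   # hypothetical calibration errors

# Hypothetical helper: 95% quantile of one bootstrap sample of size n
upper_bound <- function(n) {
  quantile(sample(errors, n, replace = TRUE), 0.95, names = FALSE)
}

# Repeat each setting 200 times and compare the run-to-run spread of the bound
sd_small <- sd(replicate(200, upper_bound(100)))
sd_large <- sd(replicate(200, upper_bound(1000)))
```

The bound computed from 1000 draws varies noticeably less between runs than the one computed from 100 draws, so fewer bootstraps save time but make the interval endpoints noisier.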
A tibble with the predicted values, lower bounds, and upper bounds of the prediction intervals
library(dplyr)
library(tibble)
# Simulate some data
set.seed(42)
x1 <- runif(1000)
x2 <- runif(1000)
y <- rlnorm(1000, meanlog = x1 + x2, sdlog = 0.4)
df <- tibble(x1, x2, y)
# Split into train/calibration/test
df_train <- df[1:500, ]
df_cal <- df[501:750, ]
df_test <- df[751:1000, ]
# Fit a log-linear model
model <- lm(log(y) ~ x1 + x2, data = df_train)
# Generate predictions
pred_cal <- exp(predict(model, newdata = df_cal))
pred_test <- exp(predict(model, newdata = df_test))
# Compute bootstrap prediction intervals
intervals <- pinterval_bootstrap(
  pred = pred_test,
  calib = pred_cal,
  calib_truth = df_cal$y,
  error_type = "raw",
  alpha = 0.1,
  n_bootstraps = 1000
)