View source: R/pinterval_mondrian.R
| pinterval_mondrian | R Documentation |
This function calculates Mondrian conformal prediction intervals with a confidence level of 1 - alpha for a vector of (continuous) predicted values, using inductive conformal prediction on a class-by-class (Mondrian) basis. The intervals are computed from a calibration set of predicted values, true values, and their associated classes. The function returns a tibble containing the predicted values along with the lower and upper bounds of the prediction intervals. Mondrian conformal prediction intervals are useful when the prediction error is not constant across groups or classes: they provide class-conditional coverage, meaning the coverage level 1 - alpha holds within each class, assuming exchangeability of non-conformity scores within classes.
pinterval_mondrian(
  pred,
  pred_class = NULL,
  calib = NULL,
  calib_truth = NULL,
  calib_class = NULL,
  alpha = 0.1,
  ncs_type = c("absolute_error", "relative_error", "za_relative_error",
    "heterogeneous_error", "raw_error"),
  lower_bound = NULL,
  upper_bound = NULL,
  grid_size = 10000,
  resolution = NULL,
  distance_weighted_cp = FALSE,
  distance_features_calib = NULL,
  distance_features_pred = NULL,
  distance_type = c("mahalanobis", "euclidean"),
  normalize_distance = "none",
  weight_function = c("gaussian_kernel", "caucy_kernel", "logistic", "reciprocal_linear")
)
pred |
Vector of predicted values |
pred_class |
A vector of class identifiers for the predicted values. This is used to group the predictions by class for Mondrian conformal prediction. |
calib |
A numeric vector of predicted values in the calibration partition, or a 2 column tibble or matrix with the first column being the predicted values and the second column being the truth values. If calib is a numeric vector, calib_truth must be provided. |
calib_truth |
A numeric vector of true values in the calibration partition. Only required if calib is a numeric vector |
calib_class |
A vector of class identifiers for the calibration set. |
alpha |
The miscoverage (significance) level for the prediction intervals; the intervals target a confidence level of 1 - alpha. Must be a single numeric value between 0 and 1 |
ncs_type |
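A string specifying the type of nonconformity score to use. Available options are 'absolute_error', 'relative_error', 'za_relative_error', 'heterogeneous_error', and 'raw_error'. The default is 'absolute_error'. |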
lower_bound |
Optional minimum value for the prediction intervals. If not provided, the minimum (true) value of the calibration partition will be used. Primarily useful when the possible outcome values extend outside the range of values observed in the calibration set. |
upper_bound |
Optional maximum value for the prediction intervals. If not provided, the maximum (true) value of the calibration partition will be used. Primarily useful when the possible outcome values extend outside the range of values observed in the calibration set. |
grid_size |
The number of points to use in the grid search between the lower and upper bound. Default is 10,000. A larger grid size increases the resolution of the prediction intervals but also increases computation time. |
resolution |
Alternative to grid_size: the minimum step size between grid points. Useful if a specific resolution is desired. Default is NULL. |
distance_weighted_cp |
Logical. If TRUE, distance-weighted conformal prediction is used. Default is FALSE. |
distance_features_calib |
A matrix, data frame, or numeric vector of features from which to compute distances when distance_weighted_cp = TRUE. |
distance_features_pred |
A matrix, data frame, or numeric vector of feature values for the prediction set. Must be the same features as specified in distance_features_calib. |
distance_type |
The type of distance metric to use when computing distances between calibration and prediction points. Options are 'mahalanobis' (default) and 'euclidean'. |
normalize_distance |
Either 'minmax', 'sd', or 'none'. Indicates if and how to normalize the distances when distance_weighted_cp is TRUE. Normalization helps ensure that distances are on a comparable scale across features. Default is 'none'. |
weight_function |
A character string specifying the weighting kernel to use for distance-weighted conformal prediction. Options are 'gaussian_kernel', 'caucy_kernel', 'logistic', and 'reciprocal_linear'. The default is 'gaussian_kernel'. |
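The grid_size and resolution arguments describe the same search grid in two different ways. The following is a minimal sketch of that relationship, assuming an evenly spaced grid between the lower and upper bounds; the bounds used here are hypothetical and the package's internal grid construction may differ.

```r
# Hypothetical bounds, chosen only for illustration
lower <- 0
upper <- 100

# Option 1: fix the number of grid points (analogous to grid_size)
grid_a <- seq(lower, upper, length.out = 10000)
step_a <- (upper - lower) / (length(grid_a) - 1)  # implied step size

# Option 2: fix the step size (analogous to resolution)
grid_b <- seq(lower, upper, by = 0.01)
n_b <- length(grid_b)                             # implied number of points
```

With a wide outcome range, a fixed grid size gives a coarser step, so specifying the step size directly can be more convenient when interval bounds are needed to a known precision.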
'pinterval_mondrian()' extends [pinterval_conformal()] to the Mondrian setting, where prediction intervals are calibrated separately within user-defined groups (often called "Mondrian categories"). Instead of pooling all calibration residuals into a single reference distribution, the method constructs a separate non-conformity distribution for each subgroup defined by a grouping variable (e.g., region, regime type, or income category). This allows the intervals to adapt to systematic differences in error magnitude or variance across groups and targets coverage conditional on group membership: under the assumption of exchangeability within classes, the prediction intervals attain the desired coverage level 1 - alpha within each class.
Conceptually, the underlying inductive conformal machinery is the same as in [pinterval_conformal()], but applied within groups rather than globally. For a detailed description of non-conformity scores, distance-weighting, and the general conformal prediction framework, see [pinterval_conformal()].
For 'pinterval_mondrian()', the calibration set must include predicted values, true values, and corresponding class labels. These can be supplied as separate vectors ('calib', 'calib_truth', and 'calib_class') or as a single three-column matrix or tibble.
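The per-class calibration described above can be illustrated in base R. This is a simplified sketch, not the package's implementation: it assumes an absolute-error non-conformity score and uses the closed-form conformal quantile rather than a grid search, and the helper name mondrian_abs_intervals and its output column names are hypothetical.

```r
# Sketch of Mondrian split conformal intervals with absolute-error scores.
# Hypothetical helper, not part of the package.
mondrian_abs_intervals <- function(pred, pred_class, calib, calib_truth,
                                   calib_class, alpha = 0.1) {
  # Non-conformity score: absolute calibration error
  scores <- abs(calib_truth - calib)

  # Finite-sample conformal quantile: the ceiling((n + 1) * (1 - alpha))-th
  # smallest score, or Inf if the class has too few calibration points
  conformal_quantile <- function(s) {
    n <- length(s)
    k <- ceiling((n + 1) * (1 - alpha))
    if (k > n) Inf else sort(s)[k]
  }

  # One quantile per class, computed from that class's scores only
  q_by_class <- vapply(split(scores, calib_class), conformal_quantile,
                       numeric(1))

  # Look up each prediction's class-specific quantile
  q <- unname(q_by_class[as.character(pred_class)])
  data.frame(pred = pred, lower = pred - q, upper = pred + q,
             class = pred_class)
}

# Toy calibration data: class A has errors ten times larger than class B
toy_class <- rep(c("A", "B"), each = 10)
toy_calib <- rep(0, 20)
toy_truth <- c(1:10, (1:10) / 10)

res <- mondrian_abs_intervals(pred = c(5, 5), pred_class = c("A", "B"),
                              calib = toy_calib, calib_truth = toy_truth,
                              calib_class = toy_class, alpha = 0.1)
```

In this toy case the two predictions are identical, but the class A interval is much wider than the class B interval because each class is calibrated only against its own errors.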
A tibble with predicted values, lower and upper prediction interval bounds, and class labels.
pinterval_conformal
# Generate synthetic data
library(dplyr)
library(tibble)
set.seed(123)
x1 <- runif(1000)
x2 <- runif(1000)
group <- sample(c("A", "B", "C"), size = 1000, replace = TRUE)
mu <- ifelse(group == "A", 1 + x1 + x2,
             ifelse(group == "B", 2 + x1 + x2,
                    3 + x1 + x2))
y <- rlnorm(1000, meanlog = mu, sdlog = 0.4)
df <- tibble(x1, x2, group, y)
df_train <- df %>% slice(1:500)
df_cal <- df %>% slice(501:750)
df_test <- df %>% slice(751:1000)
# Fit a model to the training data
mod <- lm(log(y) ~ x1 + x2, data = df_train)
# Generate predictions
calib <- exp(predict(mod, newdata = df_cal))
calib_truth <- df_cal$y
calib_class <- df_cal$group
pred_test <- exp(predict(mod, newdata = df_test))
pred_test_class <- df_test$group
# Apply Mondrian conformal prediction
pinterval_mondrian(pred = pred_test,
                   pred_class = pred_test_class,
                   calib = calib,
                   calib_truth = calib_truth,
                   calib_class = calib_class,
                   alpha = 0.1)
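Because Mondrian intervals target coverage within each class, it can be informative to check empirical coverage per class on held-out data. The helper below is a hypothetical sketch (coverage_by_class is not a package function) and is demonstrated on toy vectors.

```r
# Share of true values falling inside their interval, computed per class.
# Hypothetical helper, not part of the package.
coverage_by_class <- function(truth, lower, upper, class) {
  covered <- truth >= lower & truth <= upper
  tapply(covered, class, mean)
}

# Toy inputs: one of two observations is covered in each class
toy_cov <- coverage_by_class(truth = c(1, 2, 3, 4),
                             lower = c(0, 0, 0, 5),
                             upper = c(2, 1, 4, 6),
                             class = c("A", "A", "B", "B"))
```

Applied to the example above, truth would be df_test$y and class would be pred_test_class, with the lower and upper bounds taken from the returned tibble; the names of those columns are not listed on this page, so check the returned object before extracting them.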