CalibrationCurves: R Documentation
The CalibrationCurves package provides tools to assess and visualize the calibration performance of prediction models. Calibration refers to the agreement between predicted probabilities or values and what is actually observed.
The package covers a broad range of outcome types and modelling settings:
Binary outcomes — val.prob.ci.2 (base R graphics) and
valProbggplot (ggplot2) compute flexible calibration curves (loess or
restricted cubic splines) with pointwise 95% confidence intervals, logistic calibration
slope and intercept, c-statistic, Brier score, and other statistics.
Clustered binary outcomes — valProbCluster assesses
calibration while accounting for clustering via three approaches: Clustered Grouped
Calibration (CGC), Meta-Analytical Calibration Curve (MAC2),
and Mixed-Effects Model Calibration (MIXC). See Barreñada et al. (2025).
Generalized outcomes (exponential family) — genCalCurve
extends the calibration framework to outcomes whose distribution belongs to the
exponential family (e.g., Poisson, Gamma). It estimates the generalized calibration
slope and intercept and plots the generalized calibration curve. See De Cock Campo (2023).
Survival outcomes — valProbSurvival evaluates calibration
for a fitted Cox proportional hazards model at a given time horizon, producing
calibration curves and summary statistics for time-to-event predictions.
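As a quick orientation, the binary-outcome workflow might look like the sketch below. The simulated data are illustrative only; `val.prob.ci.2` and `valProbggplot` are the package functions named above, called with their basic two-argument form (predicted probabilities, observed outcomes).

```r
# Illustrative sketch: assessing calibration for a binary outcome
# using simulated data from a deliberately miscalibrated model.
library(CalibrationCurves)

set.seed(42)
n  <- 500
lp <- rnorm(n)                          # linear predictor of some model
p  <- plogis(lp)                        # predicted probabilities
y  <- rbinom(n, 1, plogis(0.8 * lp))    # outcomes generated with a shrunken slope

# Base R graphics: flexible calibration curve with pointwise 95% CIs,
# plus calibration intercept/slope, c-statistic, Brier score, etc.
val.prob.ci.2(p, y)

# ggplot2 equivalent offering the same functionality
valProbggplot(p, y)
```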
A vignette is available that provides a comprehensive overview of the theory and illustrates the functions with worked examples. Further background is available in the linked papers below.
History
Some years ago, Yvonne Vergouwe and Ewout Steyerberg adapted the function val.prob from the rms package (https://cran.r-project.org/package=rms) into val.prob.ci, adding the following features:
Scaled Brier score, obtained by relating the Brier score to its maximum for an average-calibrated null model
Risk distribution shown separately by outcome
Outcome labels 0 and 1, customizable via d1lab="..", d0lab=".."
Labels: y axis: "Observed Frequency"; triangles: "Grouped observations"
Confidence intervals around the triangles
Optional cut-off line plotted at a user-specified x coordinate
In December 2015, Bavo De Cock, Daan Nieboer, and Ben Van Calster adapted
this to val.prob.ci.2:
Flexible calibration curves using loess (default) or restricted cubic splines, with pointwise 95% confidence intervals.
Loess: confidence intervals can be obtained in closed form or using bootstrapping
(CL.BT=TRUE uses 2000 bootstrap samples).
RCS: 3 to 5 knots; knot locations estimated via default quantiles of the
predictor (by rcspline.eval).
Plot customization through standard plot arguments (cex.axis, etc.);
legend size controlled via cex.leg.
y-axis label changed to "Observed proportion".
Added the Estimated Calibration Index (ECI) to quantify lack of calibration (Van Hoorde et al., 2015).
By default shows the "abc" of model performance: calibration intercept, calibration slope, and c-statistic (Steyerberg et al., 2011).
Vectors p, y and logit no longer have to be sorted.
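The smoothing options above can be combined as in this hedged sketch; the argument names smooth, nr.knots, and CL.BT are those documented for val.prob.ci.2, though defaults may differ across package versions.

```r
# Illustrative sketch of val.prob.ci.2 smoothing options (simulated data).
library(CalibrationCurves)

set.seed(1)
lp <- rnorm(1000)
p  <- plogis(lp)
y  <- rbinom(1000, 1, plogis(0.5 + 0.8 * lp))

# Restricted cubic splines instead of the default loess smoother;
# nr.knots selects 3 to 5 knots (locations chosen by rcspline.eval).
val.prob.ci.2(p, y, smooth = "rcs", nr.knots = 5)

# Loess with bootstrapped confidence limits (2000 bootstrap samples).
val.prob.ci.2(p, y, CL.BT = TRUE)
```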
A ggplot2-based equivalent, valProbggplot, was subsequently added, offering
the same functionality with ggplot2 graphics.
In 2023, Bavo De Cock (Campo) introduced the generalized calibration framework
(De Cock Campo, 2023), extending logistic
calibration to prediction models with outcomes from any distribution in the exponential
family, implemented in genCalCurve.
Support for survival models was added via valProbSurvival, enabling
calibration assessment of Cox proportional hazards model predictions at a specified time
horizon.
In 2025, methods for clustered data were introduced
(Barreñada et al., 2025), accessible through
valProbCluster, which supports CGC, MAC2, and MIXC approaches.
The most current version of this package can be found on https://github.com/BavoDC/CalibrationCurves.
Barreñada, L., De Cock Campo, B., Wynants, L., Van Calster, B. (2025). Clustered Flexible Calibration Plots for Binary Outcomes Using Random Effects Modeling. arXiv:2503.08389, available at https://arxiv.org/abs/2503.08389.
De Cock Campo, B. (2023). Towards reliable predictive analytics: a generalized calibration framework. arXiv:2309.08559, available at https://arxiv.org/abs/2309.08559.
Steyerberg, E.W., Van Calster, B., Pencina, M.J. (2011). Performance measures for prediction models and markers: evaluation of predictions and classifications. Revista Española de Cardiología, 64(9), pp. 788-794.
Van Calster, B., Nieboer, D., Vergouwe, Y., De Cock, B., Pencina, M., Steyerberg, E.W. (2016). A calibration hierarchy for risk models was defined: from utopia to empirical data. Journal of Clinical Epidemiology, 74, pp. 167-176.
Van Hoorde, K., Van Huffel, S., Timmerman, D., Bourne, T., Van Calster, B. (2015). A spline-based tool to assess and visualize the calibration of multiclass risk predictions. Journal of Biomedical Informatics, 54, pp. 283-293.
van Geloven, N., Giardiello, D., Bonneville, E.F., Teece, L., Ramspek, C.L., van Smeden, M. et al. (2022). Validation of prediction models in the presence of competing risks: a guide through modern methods. BMJ, 377:e069249, doi:10.1136/bmj-2021-069249