Vignette: Evaluating Model Calibration

Load data

First, let's reload our example data (the survival::colon dataset).

library(magrittr) # provides the %>% pipe used throughout

data <- tibble::as_tibble(survival::colon) %>%
  dplyr::filter(etype == 2) %>% # Outcome of interest is death
  dplyr::filter(rx != "Obs") %>%  # rx will be our binary treatment variable
  dplyr::select(-etype, -study, -status) %>% # Remove superfluous variables

  # Convert coded variables into labelled factors
  dplyr::mutate_at(dplyr::vars(obstruct, perfor, adhere, node4), function(x){factor(x, levels = c(0, 1), labels = c("No", "Yes"))}) %>%
  dplyr::mutate(rx = factor(rx),
                mort365 = cut(time, breaks = c(-Inf, 365, Inf), labels = c("Yes", "No")), # Death within 365 days
                mort365 = factor(mort365, levels = c("No", "Yes")),
                sex = factor(sex, levels = c(0, 1), labels = c("Female", "Male")),
                differ = factor(differ, levels = c(1, 2, 3), labels = c("Well", "Moderate", "Poor")),
                extent = factor(extent, levels = c(1, 2, 3, 4), labels = c("Submucosa", "Muscle", "Serosa", "Contiguous Structures")),
                surg = factor(surg, levels = c(0, 1), labels = c("Short", "Long")))

Now let's create our model fit (fit) using finalfit::glmmulti().

fit <- finalfit::glmmulti(data, dependent = "mort365",
                          explanatory = c("rx", "sex","obstruct", "differ", "adhere", "perfor", "extent", "nodes", "surg"))

Evaluating Calibration

Calibration across Risk Groups

Traditionally, calibration would be assessed with plots of the proportion of observed events (y-axis) within specified risk groups (x-axis). These groups might be quantiles (the sample divided into X equally sized groups) or risk classifications (groups split at exact, pre-specified thresholds).
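For intuition, the underlying calculation can be sketched directly. This is a minimal sketch assuming fit supports the standard predict() and model.frame() methods (finalfit::glmmulti builds on glm); the column names (pred, event, group) are illustrative only, and the cal_plot() and cal_table() functions below handle all of this internally.

library(dplyr)

# Predicted risk and observed outcome for the rows used in model fitting
cal_data <- data.frame(pred  = predict(fit, type = "response"),
                       event = model.frame(fit)$mort365 == "Yes")

cal_data %>%
  mutate(group = ntile(pred, 5)) %>%     # 5 equally sized risk groups (quintiles)
  group_by(group) %>%
  summarise(n        = n(),
            events   = sum(event),       # observed events per group
            prop     = mean(event),      # observed event rate
            pred_med = median(pred))     # median predicted risk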

The cal_plot() function supports both approaches for visualising the data.

cal_plot(fit = fit, risk_ntile = 5) +
  ggtitle("A. Risk Quantile") +
  cal_plot(predictr = data %>% predictr(fit = fit), risk_class = c(0.05, 0.06, 0.1, 0.2)) +
  ggtitle("B. Risk Classification")

The categorised data can also be displayed in tables using the cal_table() function:

table <- data %>%
  predictr(fit = fit) %>%
  cal_table(risk_ntile = 5, risk_class = c(0.05, 0.06, 0.1, 0.2))

table$ntile %>% knitr::kable()
| ntile |   n | event |     prop |  pred_med |  pred_min |  pred_max |
|------:|----:|------:|---------:|----------:|----------:|----------:|
|     1 | 117 |     2 | 0.017094 | 0.0215451 | 0.0000000 | 0.0415383 |
|     2 | 117 |     5 | 0.042735 | 0.0464945 | 0.0415383 | 0.0533199 |
|     3 | 117 |     4 | 0.034188 | 0.0618531 | 0.0533199 | 0.0760904 |
|     4 | 116 |    13 | 0.112069 | 0.0968864 | 0.0770344 | 0.1313318 |
|     5 | 116 |    29 | 0.250000 | 0.1871614 | 0.1340965 | 0.5854772 |
table$class %>% knitr::kable()
| class       |   n | event |      prop |  pred_med |  pred_min |  pred_max |
|:------------|----:|------:|----------:|----------:|----------:|----------:|
| (0,0.05]    | 193 |     7 | 0.0362694 | 0.0407751 | 0.0000000 | 0.0497845 |
| (0.05,0.06] |  90 |     2 | 0.0222222 | 0.0533320 | 0.0504214 | 0.0596213 |
| (0.06,0.1]  | 135 |     9 | 0.0666667 | 0.0760904 | 0.0603197 | 0.0995768 |
| (0.1,0.2]   | 114 |    20 | 0.1754386 | 0.1415624 | 0.1011512 | 0.1994178 |
| (0.2,1]     |  51 |    15 | 0.2941176 | 0.2730587 | 0.2012504 | 0.5854772 |

Traditionally, calibration has been assessed via the Hosmer–Lemeshow (HL) test, which compares the observed and expected numbers of events across groups of predicted risk. The sample is divided into quantiles of predicted risk (typically deciles), and a significant p-value indicates evidence of poor calibration.
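The statistic itself is straightforward to compute by hand. Below is a minimal sketch assuming numeric vectors y (a 0/1 outcome) and p (predicted risk) of equal length; hl_test is a hypothetical helper name, and cal_metric() below is the packaged equivalent.

# Hand-rolled Hosmer-Lemeshow test: chi-squared comparison of observed vs
# expected events across g groups of predicted risk, with g - 2 df
hl_test <- function(y, p, g = 10) {
  grp    <- dplyr::ntile(p, g)             # split into g equally sized risk groups
  obs    <- tapply(y, grp, sum)            # observed events per group
  exp_ev <- tapply(p, grp, sum)            # expected events per group
  n      <- tapply(y, grp, length)
  chisq  <- sum((obs - exp_ev)^2 / (exp_ev * (1 - exp_ev / n)))
  stats::pchisq(chisq, df = g - 2, lower.tail = FALSE)  # p-value
}

If you want an independent cross-check, the ResourceSelection package provides hoslem.test() as an off-the-shelf alternative.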

Using the cal_metric() function, we find the Hosmer–Lemeshow (HL) test p-value is:

cal_metric(fit = fit, risk_ntile = 10, hltest = TRUE) %>% dplyr::pull(hl)

## [1] "0.335"

This suggests there is no statistical evidence of poor calibration across the risk deciles.

Calibration across Continuous Risk

However, while easier to understand, this form of risk categorisation can be arbitrary and loses nuance in the relationship between predicted and observed risk.

Increasingly, it is recommended that calibration be assessed with the predicted event rate as a continuous variable, plotted against the occurrence of events (0 = “No”, 1 = “Yes”). A loess curve (a locally weighted smoother) is then fitted and compared against the ideal linear relationship, with the intercept and slope examined instead.
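A hand-rolled version of this plot might look like the following sketch, again assuming fit supports the standard predict() and model.frame() methods; cal_plot() below produces the polished equivalent.

library(ggplot2)

plot_df <- data.frame(pred  = predict(fit, type = "response"),
                      event = as.numeric(model.frame(fit)$mort365 == "Yes"))

ggplot(plot_df, aes(x = pred, y = event)) +
  geom_smooth(method = "loess", se = TRUE) +                    # loess calibration curve
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +  # perfect calibration
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +
  labs(x = "Predicted risk", y = "Observed event rate")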

cal_plot(predictr = data %>% predictr(fit = fit), se = TRUE)

We can assess the intercept and slope of the calibration curve using cal_metric(). A “perfect” calibration will have an intercept of 0 and a slope of 1.
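These quantities correspond to the standard logistic recalibration framework: the calibration slope is the coefficient from regressing the outcome on the model's linear predictor, and the calibration-in-the-large intercept is estimated with the linear predictor fixed as an offset. A minimal sketch, assuming fit behaves like a standard glm object:

lp <- predict(fit, type = "link")   # linear predictor (log-odds scale)
y  <- model.frame(fit)$mort365      # observed outcome

slope     <- coef(glm(y ~ lp, family = binomial))["lp"]
intercept <- coef(glm(y ~ 1 + offset(lp), family = binomial))["(Intercept)"]

Note that on the development data itself these are 1 and 0 by construction (as in the cal_metric() output below); the values only become informative when the model is applied to new data.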

For more information on how to interpret these calibration curves, please see the paper Calibration: the Achilles heel of predictive analytics (Van Calster et al., BMC Medicine 2019).

cal_metric(predictr = data %>% predictr(fit = fit)) %>% knitr::kable()
| intercept | slope |
|----------:|------:|
|     0.000 | 1.000 |
