Vignette: Evaluating Model Calibration

Load data

First, let's reload our example data (the survival::colon dataset).

library(magrittr) # provides the %>% pipe used throughout

data <- tibble::as_tibble(survival::colon) %>%
  dplyr::filter(etype == 2) %>% # Outcome of interest is death
  dplyr::filter(rx != "Obs") %>%  # rx will be our binary treatment variable
  dplyr::select(-etype, -study, -status) %>% # Remove superfluous variables

  # Convert coded variables into labelled factors
  dplyr::mutate_at(dplyr::vars(obstruct, perfor, adhere, node4), function(x){factor(x, levels = c(0, 1), labels = c("No", "Yes"))}) %>%
  dplyr::mutate(rx = factor(rx),
                mort365 = cut(time, breaks = c(-Inf, 365, Inf), labels = c("Yes", "No")), # Death within 365 days
                mort365 = factor(mort365, levels = c("No", "Yes")),
                sex = factor(sex, levels = c(0, 1), labels = c("Female", "Male")),
                differ = factor(differ, levels = c(1, 2, 3), labels = c("Well", "Moderate", "Poor")),
                extent = factor(extent, levels = c(1, 2, 3, 4), labels = c("Submucosa", "Muscle", "Serosa", "Contiguous Structures")),
                surg = factor(surg, levels = c(0, 1), labels = c("Short", "Long")))

Now let's create our model fit (fit) using finalfit::glmmulti().

fit <- finalfit::glmmulti(data, dependent = "mort365",
                          explanatory = c("rx", "sex","obstruct", "differ", "adhere", "perfor", "extent", "nodes", "surg"))

Evaluating Calibration

Calibration across Risk Groups

Traditionally, calibration would be assessed with plots of the proportion of observed events (y-axis) within specified risk groups (x-axis). These groups might be quantiles (the sample divided into X equally sized groups) or risk classifications (groups split at exact, pre-specified thresholds).
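For intuition, the underlying calculation can be sketched directly. This is a minimal sketch assuming fit supports the standard predict() and model.frame() methods (finalfit::glmmulti builds on glm); the column names (pred, event, group) are illustrative only, and the cal_plot() and cal_table() functions below handle all of this internally.

library(dplyr)

# Predicted risk and observed outcome for the rows used in model fitting
cal_data <- data.frame(pred  = predict(fit, type = "response"),
                       event = model.frame(fit)$mort365 == "Yes")

cal_data %>%
  mutate(group = ntile(pred, 5)) %>%     # 5 equally sized risk groups (quintiles)
  group_by(group) %>%
  summarise(n        = n(),
            events   = sum(event),       # observed events per group
            prop     = mean(event),      # observed event rate
            pred_med = median(pred))     # median predicted risk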

The cal_plot() function supports both approaches for visualising the data.

cal_plot(fit = fit, risk_ntile = 5) +
  ggtitle("A. Risk Quantile") +
  cal_plot(predictr = data %>% predictr(fit = fit), risk_class = c(0.05, 0.06, 0.1, 0.2)) +
  ggtitle("B. Risk Classification")

The categorised data can also be displayed in tables using the cal_table() function:

table <- data %>%
  predictr(fit = fit) %>%
  cal_table(risk_ntile = 5, risk_class = c(0.05, 0.06, 0.1, 0.2))

table$ntile %>% knitr::kable()
| ntile |   n | event |     prop |  pred_med |  pred_min |  pred_max |
|------:|----:|------:|---------:|----------:|----------:|----------:|
|     1 | 117 |     2 | 0.017094 | 0.0215451 | 0.0000000 | 0.0415383 |
|     2 | 117 |     5 | 0.042735 | 0.0464945 | 0.0415383 | 0.0533199 |
|     3 | 117 |     4 | 0.034188 | 0.0618531 | 0.0533199 | 0.0760904 |
|     4 | 116 |    13 | 0.112069 | 0.0968864 | 0.0770344 | 0.1313318 |
|     5 | 116 |    29 | 0.250000 | 0.1871614 | 0.1340965 | 0.5854772 |
table$class %>% knitr::kable()
| class       |   n | event |      prop |  pred_med |  pred_min |  pred_max |
|:------------|----:|------:|----------:|----------:|----------:|----------:|
| (0,0.05]    | 193 |     7 | 0.0362694 | 0.0407751 | 0.0000000 | 0.0497845 |
| (0.05,0.06] |  90 |     2 | 0.0222222 | 0.0533320 | 0.0504214 | 0.0596213 |
| (0.06,0.1]  | 135 |     9 | 0.0666667 | 0.0760904 | 0.0603197 | 0.0995768 |
| (0.1,0.2]   | 114 |    20 | 0.1754386 | 0.1415624 | 0.1011512 | 0.1994178 |
| (0.2,1]     |  51 |    15 | 0.2941176 | 0.2730587 | 0.2012504 | 0.5854772 |

Traditionally, calibration has been assessed via the Hosmer–Lemeshow (HL) test, which compares the observed and expected numbers of events across groups of predicted risk. The sample is divided into quantiles of predicted risk (typically deciles), and a significant p-value indicates evidence of poor calibration.
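The statistic itself is straightforward to compute by hand. Below is a minimal sketch assuming numeric vectors y (a 0/1 outcome) and p (predicted risk) of equal length; hl_test is a hypothetical helper name, and cal_metric() below is the packaged equivalent.

# Hand-rolled Hosmer-Lemeshow test: chi-squared comparison of observed vs
# expected events across g groups of predicted risk, with g - 2 df
hl_test <- function(y, p, g = 10) {
  grp    <- dplyr::ntile(p, g)             # split into g equally sized risk groups
  obs    <- tapply(y, grp, sum)            # observed events per group
  exp_ev <- tapply(p, grp, sum)            # expected events per group
  n      <- tapply(y, grp, length)
  chisq  <- sum((obs - exp_ev)^2 / (exp_ev * (1 - exp_ev / n)))
  stats::pchisq(chisq, df = g - 2, lower.tail = FALSE)  # p-value
}

If you want an independent cross-check, the ResourceSelection package provides hoslem.test() as an off-the-shelf alternative.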

Using the cal_metric() function, we find the Hosmer–Lemeshow (HL) test p-value is:

cal_metric(fit = fit, risk_ntile = 10, hltest = TRUE) %>% dplyr::pull(hl)

## [1] "0.335"

This suggests there is no statistical evidence of poor calibration across the risk deciles.

Calibration across Continuous Risk

However, while easier to understand, this form of risk categorisation can be arbitrary and loses nuance in the relationship between predicted and observed risk.

Increasingly, it is recommended that calibration be assessed with the predicted event rate as a continuous variable, plotted against the occurrence of events (0 = “No”, 1 = “Yes”). A loess curve (a locally weighted smoother) is then fitted and compared against the ideal linear relationship, with the intercept and slope examined instead.
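A hand-rolled version of this plot might look like the following sketch, again assuming fit supports the standard predict() and model.frame() methods; cal_plot() below produces the polished equivalent.

library(ggplot2)

plot_df <- data.frame(pred  = predict(fit, type = "response"),
                      event = as.numeric(model.frame(fit)$mort365 == "Yes"))

ggplot(plot_df, aes(x = pred, y = event)) +
  geom_smooth(method = "loess", se = TRUE) +                    # loess calibration curve
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +  # perfect calibration
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +
  labs(x = "Predicted risk", y = "Observed event rate")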

cal_plot(predictr = data %>% predictr(fit = fit), se = TRUE)

We can assess the intercept and slope of the calibration curve using cal_metric(). A “perfect” calibration will have an intercept of 0 and a slope of 1.
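These quantities correspond to the standard logistic recalibration framework: the calibration slope is the coefficient from regressing the outcome on the model's linear predictor, and the calibration-in-the-large intercept is estimated with the linear predictor fixed as an offset. A minimal sketch, assuming fit behaves like a standard glm object:

lp <- predict(fit, type = "link")   # linear predictor (log-odds scale)
y  <- model.frame(fit)$mort365      # observed outcome

slope     <- coef(glm(y ~ lp, family = binomial))["lp"]
intercept <- coef(glm(y ~ 1 + offset(lp), family = binomial))["(Intercept)"]

Note that on the development data itself these are 1 and 0 by construction (as in the cal_metric() output below); the values only become informative when the model is applied to new data.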

For more information on how to interpret these calibration curves, please see the paper Calibration: the Achilles heel of predictive analytics (Van Calster et al., BMC Medicine 2019).

cal_metric(predictr = data %>% predictr(fit = fit)) %>% knitr::kable()
| intercept | slope |
|----------:|------:|
|     0.000 | 1.000 |
