predictr: Facilitates development and validation of predictive models

Vignette: Diagnostic / Prognostic Accuracy

Firstly let’s re-load our example data (the survival::colon)

data <- tibble::as_tibble(survival::colon) %>%
  dplyr::filter(etype==2) %>% # Outcome of interest is death
  dplyr::filter(rx!="Obs") %>%  # rx will be our binary treatment variable
  dplyr::select(-etype,-study, -status) %>% # Remove superfluous variables

  # Convert into numeric and factor variables
  dplyr::mutate_at(vars(obstruct, perfor, adhere, node4), function(x){factor(x, levels=c(0,1), labels = c("No", "Yes"))}) %>%
  dplyr::mutate(rx = factor(rx),
                mort365 = cut(time, breaks = c(-Inf, 365, Inf), labels = c("Yes", "No")),
                mort365 = factor(mort365, levels = c("No", "Yes")),
                sex = factor(sex, levels=c(0,1), labels = c("Female", "Male")),
                differ = factor(differ, levels = c(1,2,3), labels = c("Well", "Moderate", "Poor")),
                extent = factor(extent, levels = c(1,2,3, 4), labels = c("Submucosa", "Muscle", "Serosa", "Contiguous Structures")),
                surg = factor(surg, levels = c(0,1), labels = c("Short", "Long")))

Now lets create our model fit (fit1) and derive our predictions based on our model (predict1).

We are starting with a very simple model trying to predict deatn at 1 year (“mort365”) using just 1 variable (“rx” aka treatment receieved).

fit1 <- finalfit::glmmulti(data, dependent = "mort365", explanatory = c("rx"))

predict1 <- predictr(data = data, fit = fit1)

One of the most common ways to visualise the performance of the model is using Receiver Operating Characteristic (ROC) curves. This involves plotting the sensitivity and specificity of the model against one another.

There are 2 steps required to make these using predictr:

1). `roc_plot_format()`

This function takes the predictions from the predictr() function output, and provides a tibble of the sensitivity and specificity of the model required for plotting.

roc_format1 <- predict1 %>%
  roc_plot_format(event = "event", predict = "predict_prop",confint = T, smooth=T)

##   |                                                              |                                                      |   0%  |                                                              |                                                      |   1%  |                                                              |=                                                     |   1%  |                                                              |=                                                     |   2%  |                                                              |=                                                     |   3%  |                                                              |==                                                    |   3%  |                                                              |==                                                    |   4%  |                                                              |==                                                    |   5%  |                                                              |===                                                   |   5%  |                                                              |===                                                   |   6%  |                                                              |====                                                  |   6%  |                                                              |====                                                  |   7%  |                                                              |====                                                  |   8%  |                                                              |=====                                                 |   8%  |                                                              |=====                                                 |   9%  |                                                              |=====                                                 |  10%  |                                                              |======                                                |  10%  |                                                              |======                                                |  11%  |                                                              |======                                                |  12%  |                                                              |=======                                               |  12%  |                                                              |=======                                               |  13%  |                                                              |=======                                               |  14%  |                                                              |========                                              |  14%  |                                                              |========                                              |  15%  |                                                              |========                                              |  16%  |                                                              |=========                                             |  16%  |                                                              |=========                                             |  17%  |                                                              |=========                                             |  18%  |                                                              |==========                                            |  18%  |                                                              |==========                                            |  19%  |                                                              |===========                                           |  19%  |                                                              |===========                                           |  20%  |                                                              |===========                                           |  21%  |                                                              |============                                          |  21%  |                                                              |============                                          |  22%  |                                                              |============                                          |  23%  |                                                              |=============                                         |  23%  |                                                              |=============                                         |  24%  |                                                              |=============                                         |  25%  |                                                              |==============                                        |  25%  |                                                              |==============                                        |  26%  |                                                              |==============                                        |  27%  |                                                              |===============                                       |  27%  |                                                              |===============                                       |  28%  |                                                              |===============                                       |  29%  |                                                              |================                                      |  29%  |                                                              |================                                      |  30%  |                                                              |================                                      |  31%  |                                                              |=================                                     |  31%  |                                                              |=================                                     |  32%  |                                                              |==================                                    |  32%  |                                                              |==================                                    |  33%  |                                                              |==================                                    |  34%  |                                                              |===================                                   |  34%  |                                                              |===================                                   |  35%  |                                                              |===================                                   |  36%  |                                                              |====================                                  |  36%  |                                                              |====================                                  |  37%  |                                                              |====================                                  |  38%  |                                                              |=====================                                 |  38%  |                                                              |=====================                                 |  39%  |                                                              |=====================                                 |  40%  |                                                              |======================                                |  40%  |                                                              |======================                                |  41%  |                                                              |======================                                |  42%  |                                                              |=======================                               |  42%  |                                                              |=======================                               |  43%  |                                                              |=======================                               |  44%  |                                                              |========================                              |  44%  |                                                              |========================                              |  45%  |                                                              |=========================                             |  45%  |                                                              |=========================                             |  46%  |                                                              |=========================                             |  47%  |                                                              |==========================                            |  47%  |                                                              |==========================                            |  48%  |                                                              |==========================                            |  49%  |                                                              |===========================                           |  49%  |                                                              |===========================                           |  50%  |                                                              |===========================                           |  51%  |                                                              |============================                          |  51%  |                                                              |============================                          |  52%  |                                                              |============================                          |  53%  |                                                              |=============================                         |  53%  |                                                              |=============================                         |  54%  |                                                              |=============================                         |  55%  |                                                              |==============================                        |  55%  |                                                              |==============================                        |  56%  |                                                              |===============================                       |  56%  |                                                              |===============================                       |  57%  |                                                              |===============================                       |  58%  |                                                              |================================                      |  58%  |                                                              |================================                      |  59%  |                                                              |================================                      |  60%  |                                                              |=================================                     |  60%  |                                                              |=================================                     |  61%  |                                                              |=================================                     |  62%  |                                                              |==================================                    |  62%  |                                                              |==================================                    |  63%  |                                                              |==================================                    |  64%  |                                                              |===================================                   |  64%  |                                                              |===================================                   |  65%  |                                                              |===================================                   |  66%  |                                                              |====================================                  |  66%  |                                                              |====================================                  |  67%  |                                                              |====================================                  |  68%  |                                                              |=====================================                 |  68%  |                                                              |=====================================                 |  69%  |                                                              |======================================                |  69%  |                                                              |======================================                |  70%  |                                                              |======================================                |  71%  |                                                              |=======================================               |  71%  |                                                              |=======================================               |  72%  |                                                              |=======================================               |  73%  |                                                              |========================================              |  73%  |                                                              |========================================              |  74%  |                                                              |========================================              |  75%  |                                                              |=========================================             |  75%  |                                                              |=========================================             |  76%  |                                                              |=========================================             |  77%  |                                                              |==========================================            |  77%  |                                                              |==========================================            |  78%  |                                                              |==========================================            |  79%  |                                                              |===========================================           |  79%  |                                                              |===========================================           |  80%  |                                                              |===========================================           |  81%  |                                                              |============================================          |  81%  |                                                              |============================================          |  82%  |                                                              |=============================================         |  82%  |                                                              |=============================================         |  83%  |                                                              |=============================================         |  84%  |                                                              |==============================================        |  84%  |                                                              |==============================================        |  85%  |                                                              |==============================================        |  86%  |                                                              |===============================================       |  86%  |                                                              |===============================================       |  87%  |                                                              |===============================================       |  88%  |                                                              |================================================      |  88%  |                                                              |================================================      |  89%  |                                                              |================================================      |  90%  |                                                              |=================================================     |  90%  |                                                              |=================================================     |  91%  |                                                              |=================================================     |  92%  |                                                              |==================================================    |  92%  |                                                              |==================================================    |  93%  |                                                              |==================================================    |  94%  |                                                              |===================================================   |  94%  |                                                              |===================================================   |  95%  |                                                              |====================================================  |  95%  |                                                              |====================================================  |  96%  |                                                              |====================================================  |  97%  |                                                              |===================================================== |  97%  |                                                              |===================================================== |  98%  |                                                              |===================================================== |  99%  |                                                              |======================================================|  99%  |                                                              |======================================================| 100%

roc_format1 %>% head(10) %>%   knitr::kable()

spe sen\_est sen\_lci sen\_uci 0 1.0000000 1.0000000 1.0000000 0.01 0.9906903 0.9877996 0.9934219 0.02 0.9813805 0.9755991 0.9868438 0.03 0.9720708 0.9633987 0.9802657 0.04 0.9627611 0.9511983 0.9736877 0.05 0.9534514 0.9389978 0.9671096 0.06 0.9441416 0.9267974 0.9605315 0.07 0.9348319 0.9145969 0.9539534 0.08 0.9255222 0.9023965 0.9473753 0.09 0.9162125 0.8901961 0.9407972

There are two ways you can modify roc_plot_format()

You can calculate confidence intervals for the curve (confint = TRUE), however please note this can take a while to calculate.
You can also apply smoothing to the curve data (smooth = TRUE) to remove any irregularities in the curve (this will only work if there’s sufficent data in the model)

2). `roc_plot()`

Now we have the outputted dataframe from roc_plot_format() (aka roc_format1), we can then plot this as a ROC curve in ggplot.

In this context, we can see the model is poor at predicting the likelihood of death at 1 year (barely better than flipping a coin!)

roc_format1 %>%
  roc_plot()

However, we can also quantify this assessment numerically Using the roc_metric() function you can get a variety of the most common metrics used to assess the model discrimination / accuracy.

predict1 %>%
  roc_metric(event  = "event", predict = "predict_prop") %>% knitr::kable()

## Setting levels: control = No, case = Yes

model name estimate lci uci metric 1 Cutoff 0.088 NA NA 0.088 1 AUC 0.518 0.447 0.588 0.518 (95% CI: 0.447 to 0.588) 1 Sensitivity 0.537 0.404 0.670 0.537 (95% CI: 0.404 to 0.670) 1 Specificity 0.498 0.457 0.540 0.498 (95% CI: 0.457 to 0.540) 1 Positive Likelihood Ratio 1.070 0.824 1.389 1.070 (95% CI: 0.824 to 1.389) 1 Negative Likelihood Ratio 0.929 0.689 1.253 0.929 (95% CI: 0.689 to 1.253) 1 Positive Predictive Value (PPV) 0.094 0.061 0.126 0.094 (95% CI: 0.061 to 0.126) 1 Negative Predictive Value (NPV) 0.918 0.887 0.949 0.918 (95% CI: 0.887 to 0.949)

However, if you already have a predefined cut-off for risk classification you may want to test the discrimination of that specific cutoff.

For example, you have predefined that patients with a predicted risk of death >=9% would be considered to be “high-risk” (while those with a predicted risk <9% would be considered “low risk”). We can supply a dichotomised variable instead to the roc_metric() function to get the metrics for this specifically.

Note that you cannot derive ROC curves for an already dichotomised variable (the prediction must be a continuous numerical value for that)

predict1 %>%
  dplyr::mutate(binary = ifelse(predict_prop>0.09, "High risk", "Low Risk")) %>%
  roc_metric(event  = "event", predict = "binary") %>% knitr::kable()

name estimate lci uci metric Predicted Prevalence 0.0879479 0.0667588 0.1131952 0.088 (95% CI: 0.067 to 0.113) True Prevalence 0.4951140 0.4548580 0.5354172 0.495 (95% CI: 0.455 to 0.535) Diagnostic Accuracy 0.4983713 0.4580979 0.5386605 0.498 (95% CI: 0.458 to 0.539) Sensitivity 0.0822368 0.0539254 0.1190046 0.082 (95% CI: 0.054 to 0.119) Specificity 0.9064516 0.8684150 0.9364518 0.906 (95% CI: 0.868 to 0.936) Positive Likelihood Ratio 0.8790835 0.5273719 1.4653563 0.879 (95% CI: 0.527 to 1.465) Negative Likelihood Ratio 1.0124789 0.9639635 1.0634361 1.012 (95% CI: 0.964 to 1.063) Positive Predictive Value (PPV) 0.4629630 0.3262246 0.6039050 0.463 (95% CI: 0.326 to 0.604) Negative Predictive Value (NPV) 0.5017857 0.4595658 0.5439867 0.502 (95% CI: 0.460 to 0.544)

Now let’s say instead of just 1 model, you have multiple models you want to compare using the same data to determine which one is best in your context.

Let’s generate these models and then use predictr() to predict. These have been combined into one dataset (multiple), with the “model” column informing on which predictions belong to which model.

fit2 <- finalfit::glmmulti(data, dependent = "mort365", explanatory = c("rx", "sex"))
fit3 <- finalfit::glmmulti(data, dependent = "mort365", explanatory = c("rx", "sex","obstruct"))
fit4 <- finalfit::glmmulti(data, dependent = "mort365", explanatory = c("rx", "sex","obstruct", "differ"))

multiple <- list(fit1, fit2, fit3, fit4) %>%
  purrr::map_dfr(function(x){predictr(data = data, fit = x)}, .id="model") %>%
  dplyr::mutate(model = factor(model, levels = c(1:4),
                               labels = c("Rx", "Rx & Sex", "Rx, Sex, & Obstruct", "Rx, Sex, Obstruct, & Differ")))

multiple %>% head(10) %>% knitr::kable()

model rowid mort365 rx event predict\_raw predict\_prop sex obstruct differ Rx 1 No Lev+5FU No -2.412336 0.0822368 NA NA NA Rx 2 No Lev+5FU No -2.412336 0.0822368 NA NA NA Rx 3 Yes Lev+5FU Yes -2.412336 0.0822368 NA NA NA Rx 4 No Lev+5FU No -2.412336 0.0822368 NA NA NA Rx 5 No Lev No -2.271059 0.0935484 NA NA NA Rx 6 No Lev No -2.271059 0.0935484 NA NA NA Rx 7 No Lev+5FU No -2.412336 0.0822368 NA NA NA Rx 8 No Lev No -2.271059 0.0935484 NA NA NA Rx 9 No Lev+5FU No -2.412336 0.0822368 NA NA NA Rx 10 No Lev No -2.271059 0.0935484 NA NA NA

Now we’re set up for comparing the accuracy across all the models. This is handled automatically by predictr if we tell it which column (“model”) has the information on which model the prediction is from.

Please note if you forget to specify “model” when there’s multiple models in a dataset, the function will assume all predictions come from a single model.

multiple %>%
  roc_metric(model = "model", event  = "event", predict = "predict_prop") %>%
  tidyr::pivot_wider(id_cols = c("name"), names_from = "model", values_from = "metric") %>%
  knitr::kable()

## Setting levels: control = No, case = Yes
## Setting levels: control = No, case = Yes
## Setting levels: control = No, case = Yes
## Setting levels: control = No, case = Yes

name Rx Rx & Sex Rx, Sex, & Obstruct Rx, Sex, Obstruct, & Differ Cutoff 0.088 0.082 0.081 0.090 AUC 0.518 (95% CI: 0.447 to 0.588) 0.518 (95% CI: 0.445 to 0.591) 0.583 (95% CI: 0.501 to 0.664) 0.669 (95% CI: 0.587 to 0.751) Sensitivity 0.537 (95% CI: 0.404 to 0.670) 0.833 (95% CI: 0.734 to 0.933) 0.574 (95% CI: 0.442 to 0.706) 0.574 (95% CI: 0.442 to 0.706) Specificity 0.498 (95% CI: 0.457 to 0.540) 0.236 (95% CI: 0.201 to 0.271) 0.591 (95% CI: 0.550 to 0.632) 0.728 (95% CI: 0.691 to 0.765) Positive Likelihood Ratio 1.070 (95% CI: 0.824 to 1.389) 1.090 (95% CI: 0.959 to 1.239) 1.404 (95% CI: 1.093 to 1.803) 2.110 (95% CI: 1.614 to 2.758) Negative Likelihood Ratio 0.929 (95% CI: 0.689 to 1.253) 0.707 (95% CI: 0.382 to 1.308) 0.721 (95% CI: 0.525 to 0.990) 0.585 (95% CI: 0.427 to 0.801) Positive Predictive Value (PPV) 0.094 (95% CI: 0.061 to 0.126) 0.095 (95% CI: 0.069 to 0.122) 0.119 (95% CI: 0.080 to 0.159) 0.173 (95% CI: 0.118 to 0.229) Negative Predictive Value (NPV) 0.918 (95% CI: 0.887 to 0.949) 0.936 (95% CI: 0.896 to 0.977) 0.935 (95% CI: 0.909 to 0.961) 0.945 (95% CI: 0.923 to 0.967)

 multiple %>%
  roc_plot_format(model = "model", event = "event", predict = "predict_prop",confint = F, smooth = F) %>%
  roc_plot()

From above, we can see that as the number of variables added increases, there is improved accuracy of the model for predicting death at 1 year.

However, note the substantial changes in the sensitivity / specificity as the model changes.

kamclean/predictr documentation built on Aug. 14, 2022, 4:35 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

kamclean/predictr
Facilitates development and validation of predictive models

vignettes/vignette_2_accuracy.md
In kamclean/predictr: Facilitates development and validation of predictive models

Vignette: Diagnostic / Prognostic Accuracy

Load data

Comparing accuracy across a predictive model

Receiver Operating Characteristic (ROC) curves

1). `roc_plot_format()`

2). `roc_plot()`

Accuracy Metrics

Comparing accuracy across multiple models

R Package Documentation

Browse R Packages

We want your feedback!

kamclean/predictr Facilitates development and validation of predictive models

vignettes/vignette_2_accuracy.md In kamclean/predictr: Facilitates development and validation of predictive models

Vignette: Diagnostic / Prognostic Accuracy

Load data

Comparing accuracy across a predictive model

Receiver Operating Characteristic (ROC) curves

1). roc_plot_format()

2). roc_plot()

Accuracy Metrics

Comparing accuracy across multiple models

R Package Documentation

Browse R Packages

We want your feedback!

kamclean/predictr
Facilitates development and validation of predictive models

vignettes/vignette_2_accuracy.md
In kamclean/predictr: Facilitates development and validation of predictive models

1). `roc_plot_format()`

2). `roc_plot()`