The package rmda (risk model decision analysis) provides tools for evaluating the value of using a risk prediction instrument to decide treatment or intervention (versus no treatment or intervention). Given one or more risk prediction instruments (risk models) that estimate the probability of a binary outcome, rmda provides functions to estimate and display decision curves and other figures that help assess the population impact of using a risk model for clinical decision making. Here, “population” refers to the patient population in which the model would be used to guide decisions.

Decision curves display estimates of the (standardized) net benefit over a range of probability thresholds used to categorize observations as 'high risk.' The curves help evaluate a treatment policy that recommends treatment for patients estimated to be 'high risk' by comparing the population impact of the risk-based policy to 'treat all' and 'treat none' policies. Curves can be estimated using data from a prospective cohort. In addition, rmda can estimate decision curves using data from a case-control study if an estimate of the population outcome prevalence is available. Version 1.5 and later of the package provide an alternative framing of the decision problem for situations where treatment is the standard-of-care and a risk model might be used to recommend that low-risk patients (i.e., patients below some risk threshold) opt out of treatment.

Bootstrap confidence intervals can be computed and displayed. A wrapper function to calculate cross-validated curves using k-fold cross-validation is also provided.

Key functions are:

- decision_curve: estimate (standardized) net benefit across a range of risk thresholds, with bootstrap confidence intervals
- plot_decision_curve: plot one or more decision curves
- plot_roc_components: plot the true and false positive rates across risk thresholds
- plot_clinical_impact: plot the estimated population impact per 1,000 subjects
- cv_decision_curve: calculate cross-validated decision curves using k-fold cross-validation
- summary: print point estimates and confidence intervals from a decision_curve object

Install the package

The easiest way to get the package is directly from CRAN:

install.packages("rmda")

You may also download the current version of the package here:

https://github.com/mdbrown/rmda/releases

Navigate to the source package (a .tar.gz file), download it, and install with

install.packages("../rmda_1.5.tar.gz", repos = NULL, type = "source")

Or install the package directly from GitHub using devtools:

## install.packages("devtools")
library(devtools)
install_github("mdbrown/rmda")

Getting started

Load the package and the provided simulated data set.

library(rmda)

#load simulated data 
data(dcaData)

head(dcaData)
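Since the outcome prevalence enters the (standardized) net benefit directly, it can be useful to check it before fitting anything. A minimal sketch, assuming the outcome Cancer is coded 0/1 (which is what decision_curve expects):

#quick sanity check: sample size and observed outcome prevalence
#(assumes Cancer is a 0/1 indicator)
nrow(dcaData)
mean(dcaData$Cancer)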

First we use the function decision_curve to create a decision curve object for a logistic model to predict cancer status using age, gender and smoking status. We then plot it using plot_decision_curve.

set.seed(123)
#first use rmda with the default settings (set bootstraps = 50 here to reduce computation time). 
baseline.model <- decision_curve( Cancer~Age + Female + Smokes, #fitting a logistic model
                                data = dcaData, 
                                study.design = "cohort", 
                                policy = "opt-in",  #default 
                                bootstraps = 50)

#plot the curve
plot_decision_curve(baseline.model,  curve.names = "baseline model")

Next, we create a decision curve for a model that adds two markers to the baseline model. We then pass a list containing both decision_curve objects to plot_decision_curve to display the curves together.

set.seed(123)
full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
              data = dcaData, 
              bootstraps = 50)

#since we want to plot more than one curve, we pass a list of 'decision_curve' objects to the plot
plot_decision_curve( list(baseline.model, full.model), 
                   curve.names = c("Baseline model", "Full model"), xlim = c(0, 1), legend.position = "bottomright") 

#see all available options using 
?decision_curve
?plot_decision_curve

Alternative plotting functions

We include two other plotting functions: plot_roc_components and plot_clinical_impact.

plot_roc_components plots the components of the ROC curve, i.e. the true positive rate and false positive rate, over a range of high-risk thresholds.

plot_clinical_impact plots, for 1,000 hypothetical subjects classified by the specified model, the number classified as high risk and the number of cases (subjects with the outcome) classified as high risk, across high-risk thresholds.

#plot the components of the roc curve--true positive rate and false positive rate
plot_roc_components(full.model,  xlim = c(0, 0.4), 
                    col = c("black", "red"))
#plot the clinical impact 
plot_clinical_impact(full.model, xlim = c(0, .4), 
                     col = c("black", "blue"))
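The quantities in the clinical impact plot can also be worked out by hand: among 1,000 subjects, the expected number classified as high risk at threshold $r$ is $1000 \times [\rho \, TPR(r) + (1-\rho) FPR(r)]$, and the expected number of cases among them is $1000 \times \rho \, TPR(r)$, where $\rho$ is the outcome prevalence. A minimal sketch with made-up rates (these numbers are hypothetical, not estimates from dcaData):

#illustrative arithmetic behind the clinical impact plot
#(hypothetical values, not estimates from dcaData)
rho <- 0.2   #outcome prevalence
tpr <- 0.80  #true positive rate at the chosen threshold
fpr <- 0.25  #false positive rate at the chosen threshold

1000 * (rho * tpr + (1 - rho) * fpr)  #number classified high risk per 1,000
1000 * rho * tpr                      #number of cases classified high risk per 1,000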

The net benefit of an 'opt-out' vs. 'opt-in' policy

For many clinical contexts, a risk model is used in one of two ways. In the first setting, the standard-of-care for a population is to assign a particular 'treatment' to no one. Clinicians then use a risk model to categorize patients as 'high-risk', with the recommendation to treat high-risk patients with some intervention. We refer to this use of a risk model as an 'opt-in' policy. The relative costs and benefits of such a policy are specified as the 'cost' of treating an individual without the event (a control) compared to the 'benefit' of treating an observation with the event (a case).

The standardized net benefit for an 'opt-in' policy is defined in terms of the true and false positive rates, the outcome prevalence $\rho$, and the risk threshold $r$:

$$ sNB^{(opt-in)}(r) = TPR(r) - \Big(\frac{1-\rho}{\rho}\Big)\Big(\frac{r}{1-r}\Big)FPR(r) $$

Alternatively, there are contexts where the standard-of-care is to recommend a treatment to an entire patient population. The potential use of a risk model in this setting is to identify patients who are 'low-risk' and recommend that those patients opt out of treatment.

The standardized net benefit for an 'opt-out' policy is defined in terms of the true and false negative rates, the outcome prevalence $\rho$, and the risk threshold $r$:

$$ sNB^{(opt-out)}(r) = TNR(r) - \Big(\frac{\rho}{1-\rho}\Big)\Big(\frac{1-r}{r}\Big)FNR(r) $$
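For concreteness, both formulas can be computed directly from estimates of the error rates. The helper below is a minimal sketch, not a function from rmda; it simply re-expresses the two equations above (using $TNR = 1 - FPR$ and $FNR = 1 - TPR$), and the rates plugged in are hypothetical:

#standardized net benefit computed directly from the formulas above
#(illustrative helper, not part of rmda)
sNB <- function(r, tpr, fpr, rho, policy = c("opt-in", "opt-out")) {
  policy <- match.arg(policy)
  if (policy == "opt-in") {
    tpr - ((1 - rho) / rho) * (r / (1 - r)) * fpr
  } else {
    (1 - fpr) - (rho / (1 - rho)) * ((1 - r) / r) * (1 - tpr)
  }
}

#hypothetical error rates at a risk threshold of r = 0.1
sNB(r = 0.1, tpr = 0.8, fpr = 0.25, rho = 0.2, policy = "opt-in")
sNB(r = 0.1, tpr = 0.8, fpr = 0.25, rho = 0.2, policy = "opt-out")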

So far in this tutorial we have calculated the net benefit curves using the default policy = 'opt-in' setting. To calculate the net benefit for an 'opt-out' policy, we set policy = 'opt-out' when using the decision_curve function.

set.seed(123)
opt.out.dc <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
              data = dcaData, 
              bootstraps = 50, 
              policy = 'opt-out') #set policy = 'opt-out' (default is 'opt-in')


plot_decision_curve( opt.out.dc,  xlim = c(0, 1), ylim = c(-.2,2), 
                     standardize = FALSE, 
                     curve.names = "model", legend.position = 'bottomright') 

The clinical impact plots are also influenced by the policy argument. When we set policy = 'opt-out', the clinical impact plot displays the number of subjects considered low risk at each threshold instead of the number considered high risk.

plot_clinical_impact( opt.out.dc, col = c("black", "blue"), legend.position = 'bottomright') 

Tweaking the defaults

We show several examples of how one might change the default settings.

baseline.model <- decision_curve(Cancer~Age + Female + Smokes,
                                data = dcaData, 
                                 thresholds = seq(0, .4, by = .001),# calculate thresholds from 0-0.4 at every 0.001 increment. 
                                bootstraps = 25)

full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
                            data = dcaData, 
                            thresholds = seq(0, .4, by = .001),# calculate thresholds from 0-0.4 at every 0.001 increment. 
                            bootstraps = 25)


plot_decision_curve( list(baseline.model, full.model), 
                   curve.names = c("Baseline model", "Full model"),
                   col = c("blue", "red"), 
                   lty = c(1,2), 
                   lwd = c(3,2, 2, 1),  # the first two correspond to the decision curves, then 'all' and then 'none' 
                   legend.position = "bottomright") #adjust the legend position
plot_decision_curve( list(baseline.model, full.model), 
                   curve.names = c("Baseline model", "Full model"),
                   col = c("blue", "red"), 
                   confidence.intervals = FALSE,  #remove confidence intervals
                   cost.benefit.axis = FALSE, #remove cost benefit axis
                   legend.position = "none") #remove the legend 
plot_decision_curve( list(baseline.model, full.model), 
                   curve.names = c("Baseline model", "Full model"),
                   col = c("blue", "red"), 
                  cost.benefits = c("1:1000", "1:4", "1:9", "2:3", "1:3"),  #set specific cost benefits
                   legend.position = "bottomright") 
baseline.model <- decision_curve(Cancer~Age + Female + Smokes,
                                data = dcaData, 
                                thresholds = seq(0, .4, by = .01),
                                confidence.intervals = 0.9, #calculate 90% confidence intervals
                                bootstraps = 25)

full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
                            data = dcaData, 
                            thresholds = seq(0, .40, by = .01),
                            confidence.intervals = 0.9, #calculate 90% confidence intervals
                            bootstraps = 25)

plot_decision_curve( list(baseline.model, full.model), 
                   curve.names = c("Baseline model", "Full model"),
                   col = c("blue", "red"), 
                   ylim = c(-0.05, 0.15), #set ylim
                   lty = c(2,1), 
                   standardize = FALSE, #plot Net benefit instead of standardized net benefit
                   legend.position = "topright") 

Providing fitted risks from a previously specified model

If a risk model has already been specified, so that no model fitting is needed, the user can set fitted.risk = TRUE and provide the fitted risks on the right-hand side of the formula. No model fitting will be done, and bootstrap confidence intervals will be conditional on the fitted model.

#helper function

expit <- function(xx) exp(xx)/ (1+exp(xx))
# Assume we have access to previously published models 
# (or models built using a training set)
# that we can use to predict the risk of cancer. 

# Basic model using demographic variables: Age, Female, Smokes. 
dcaData$BasicModel <- with(dcaData, expit(-7.3 + 0.18*Age - 0.08*Female + 0.80*Smokes ) )

# Model using demographic + markers : Age, Female, Smokes, Marker1 and Marker2. 
dcaData$FullModel <- with(dcaData, expit(-10.5 + 0.22*Age  - 0.01*Female + 0.91*Smokes + 2.03*Marker1 - 1.56*Marker2))


full.model <- decision_curve(Cancer~FullModel,
                            data = dcaData,
                            fitted.risk = TRUE, 
                            thresholds = seq(0, .4, by = .05),
                            bootstraps = 25) 


plot_decision_curve(full.model, legend.position = "none")

Printing estimates

The object returned by decision_curve contains all calculations and point estimates.

full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
                            data = dcaData, 
                            thresholds = seq(.01, .99, by = .05),
                            bootstraps = 25, policy = "opt-out") 

The best way to look at point estimates (with confidence intervals) is to use summary.

summary(full.model, measure = "NB") #output net benefit; by default, summary reports standardized net benefit

You can also choose which measure to report and how many decimal places to print. The options for the measure argument are 'TPR', 'FPR', 'sNB', and 'NB'.

#output true positive rate
summary(full.model, nround = 2, measure = "TPR") 

You can also access the data directly: full.model$derived.data holds the net benefit and its components.

head(full.model$derived.data)
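For example, one might pull out the estimates at a single threshold. This is a sketch only; the column name 'thresholds' is assumed, so verify the names with names(full.model$derived.data):

#inspect the rows of derived.data at a single threshold, e.g. r = 0.51
#(column name 'thresholds' is assumed)
dd <- full.model$derived.data
dd[abs(dd$thresholds - 0.51) < 1e-6, ]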

Case-control data

If the data come from a case-control study instead of an observational cohort, an estimate of the population-level outcome prevalence is needed to calculate decision curves. Decision curves are then calculated by setting study.design = "case-control" and supplying the population.prevalence. Once the decision_curve object is created, all other calls to the plot functions remain the same. Note that for these data, bootstrap sampling is stratified by outcome status.

#simulated case-control data with same variables as above
data(dcaData_cc)

#estimated from the population where the 
#case-control sample comes from. 
population.rho = 0.11

baseline.model_cc <- decision_curve(Cancer~Age + Female + Smokes,
                                    data = dcaData_cc, 
                                    thresholds = seq(0, .4, by = .01),
                                    bootstraps = 25, 
                                    study.design = "case-control", 
                                    population.prevalence = population.rho)

full.model_cc <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
                                data = dcaData_cc, 
                                thresholds = seq(0, .4, by = .01),
                                bootstraps = 25, 
                                study.design = "case-control", 
                                population.prevalence = population.rho)


plot_decision_curve( list(baseline.model_cc, full.model_cc), 
                   curve.names = c("Baseline model", "Full model"),
                   col = c("blue", "red"), 
                   lty = c(1,2), 
                   lwd = c(3,2, 2, 1), 
                   legend.position = "bottomright") 

Cross-validation

We provide a wrapper to perform k-fold cross-validation to obtain bias corrected decision curves. Once cv_decision_curve is called, all plot and summary functions work the same as shown above for decision_curve output. Confidence interval calculation is not available at this time for cross-validated curves.

full.model_cv <- cv_decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
                                    data = dcaData,
                                    folds = 5, 
                                    thresholds = seq(0, .4, by = .01), 
                                   policy = "opt-out")

full.model_apparent <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
                                data = dcaData, 
                                thresholds = seq(0, .4, by = .01),
                                confidence.intervals = 'none', policy = "opt-out")


plot_decision_curve( list(full.model_apparent, full.model_cv), 
                   curve.names = c("Apparent curve", "Cross-validated curve"),
                   col = c("red", "blue"), 
                   lty = c(2,1), 
                   lwd = c(3,2, 2, 1), 
                   legend.position = "bottomright") 

plot_clinical_impact(full.model_cv, legend.position = "bottomright") 

References

Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Medical Decision Making. 2006 Nov-Dec;26(6):565-74.

Kerr KF, Brown MD, Zhu K, Janes H. Assessing the Clinical Impact of Risk Prediction Models With Decision Curves: Guidance for Correct Interpretation and Appropriate Use. J Clin Oncol. 2016; 34(21):2534-40.


