DecisionCurve Tutorial (v.1.4)
In DecisionCurve: Calculate and Plot Decision Curves

Decision curves are a useful tool to evaluate the population impact of adopting a risk prediction instrument into clinical practice. Given one or more instruments (risk models) that predict the probability of an adverse binary outcome, this package calculates and plots decision curves, which display estimates of the standardized net benefit by the probability threshold used to categorize observations as 'high risk.' Version 1.4 of the package provides an alternative framing of the decision problem for situations where treatment is the standard-of-care and a risk model might be used in order for low-risk patients (i.e., patients below some risk threshold) to opt out of treatment.

Curves can be estimated using data from an observational cohort, or from case-control studies when an estimate of the population outcome prevalence is available. Confidence intervals calculated using the bootstrap can be displayed and a wrapper function to calculate cross-validated curves using k-fold cross-validation is also provided.

Key functions are:

decision_curve: Estimate (standardized) net benefit curves with bootstrap confidence intervals.
plot_decision_curve: Plot a decision curve or multiple curves.
plot_clinical_impact and plot_roc_components: Alternative plots for the output of decision_curve. See help files or tutorial for more info.
cv_decision_curve: Calculate k-fold cross-validated estimates of decision curves.

Install the package

The easiest way to get the package is directly from CRAN:

install.packages("DecisionCurve")

You may also download the current version of the package here:

https://github.com/mdbrown/DecisionCurve/releases

navigate to the source package and use

install.packages("../DecisionCurve_1.4.tar.gz", repos = NULL, type = "source")

or install the package directly from github using devtools.

## install.packages("devtools")
library(devtools)
install_github("mdbrown/DecisionCurve")

Getting started

Load the package and the provided simulated data set.

library(DecisionCurve)

#load simulated data 
data(dcaData)

head(dcaData)

First we use the function decision_curve to create a decision curve object for a logistic model to predict cancer status using age, gender and smoking status. We then plot it using plot_decision_curve.

set.seed(123)
#first use DecisionCurve with the default settings (set bootstraps = 50 here to reduce computation time). 
baseline.model <- decision_curve(Cancer~Age + Female + Smokes, #fitting a logistic model
                                data = dcaData, 
                                study.design = "cohort", 
                                policy = "opt-in",  #default 
                                bootstraps = 50)

#plot the curve
plot_decision_curve(baseline.model,  curve.names = "baseline model")

Next, we create a decision curve with two markers added to the original baseline model. We then pass a list with both decision curves to plot_decision_curve to plot both curves.

set.seed(123)
full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
              data = dcaData, 
              bootstraps = 50)

#since we want to plot more than one curve, we pass a list of 'DecisionCurve' objects to the plot
plot_decision_curve( list(baseline.model, full.model), 
                   curve.names = c("Baseline model", "Full model"), xlim = c(0, 1), legend.position = "bottomright") 

#see all available options using 
?decision_curve
?plot_decision_curve

Alternative plotting functions

We include two other functions plot_roc_components and plot_clinical_impact.

plot_roc_components plots the components of the ROC curve-true positive rate and false positive rates-over a range of high risk thresholds.

If we were to use the specified model to classify 1,000 hypothetical subjects, plot_clinical_impact plots the number classified as high risk and the number with the outcome classified as high risk for a given high risk threshold.

#plot the components of the roc curve--true positive rate and false positive rate
plot_roc_components(full.model,  xlim = c(0, 0.4), 
                    col = c("black", "red"))
#plot the clinical impact 
plot_clinical_impact(full.model, xlim = c(0, .4), 
                     col = c("black", "blue"))

The net benefit of an 'opt-out' vs. 'opt-in' policy

For many clinical contexts, a risk model is used in one of two ways. In the first setting, the standard-of-care for a population is to assign a particular 'treatment' to no one. Clinicians then use a risk model to categorize patients as 'high-risk', with the recommendation to treat high-risk patients with some intervention. We refer to this use of a risk model as an 'opt-in' policy. The relative costs and benefits of such a policy are implicitly specified in terms of the 'cost' of treating an individual without the event (a control) compared to the 'benefit' of treating an observation with the event (a case).

The net benefit for an 'opt-in' policy is most naturally defined in terms of the true and false positive rates, the outcome prevalence $\rho$, and the 'high-risk' threshold $r_H$.

$$ NB^{(opt-in)}(r_H) = \rho TPR(r_H) - ({1-\rho})\Big(\frac{r_H}{1-r_H}\Big)FPR(r_H) $$ Alternatively, there are contexts where the standard-of-care is to recommend a treatment to an entire patient population. The potential use of a risk model in this setting is to identify patients who are 'low-risk' and recommend that those patients 'opt-out' of treatment. For this formulation, the relative costs and benefits are (implicitly) specified as the 'cost' of not treating a case, and the benefit of not treating a control.

The net benefit for this type of 'opt-out' policy is most naturally defined in terms of the true and false negative rates, the outcome prevalence \rho, and the 'low-risk' threshold $r_L$:

$$ NB^{(opt-out)}(r_H) = (1-\rho) TNR(r_L) - {\rho}\Big(\frac{1-r_L}{r_L}\Big)FNR(r_L) $$

So far in this tutorial we have calculated the net benefit curves using the default policy = 'opt-in' setting. To calculate the net benefit for an 'opt-out' policy, we set policy = 'opt-out' when using the decision_curve function.

set.seed(123)
opt.out.dc <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
              data = dcaData, 
              bootstraps = 50, 
              policy = 'opt-out') #set policy = 'opt-out' (default is 'opt-in')


plot_decision_curve( opt.out.dc,  xlim = c(0, 1),
                     curve.names = "model", legend = 'bottomright')

The clinical impact plots are also influenced by the policy argument. When we set policy = 'opt-out' the clinical impact plot displays the number low risk instead of the number high risk.

plot_clinical_impact( opt.out.dc, col = c("black", "blue"), legend = 'bottomright')

Tweaking the defaults

We show several examples of how one might change the default settings.

Fine tune the thresholds, move the legend, and change linewidth and colors. Here we are calculating many more points on the curve (see the 'thresholds' setting).

baseline.model <- decision_curve(Cancer~Age + Female + Smokes,
                                data = dcaData, 
                                 thresholds = seq(0, .4, by = .001),# calculate thresholds from 0-0.4 at every 0.001 increment. 
                                bootstraps = 25)

full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
                            data = dcaData, 
                            thresholds = seq(0, .4, by = .001),# calculate thresholds from 0-0.4 at every 0.001 increment. 
                            bootstraps = 25)


plot_decision_curve( list(baseline.model, full.model), 
                   curve.names = c("Baseline model", "Full model"),
                   col = c("blue", "red"), 
                   lty = c(1,2), 
                   lwd = c(3,2, 2, 1),  # the first two correspond to the decision curves, then 'all' and then 'none' 
                   legend.position = "bottomright") #adjust the legend position

No confidence intervals, cost:benefit ratio axis, or legend

plot_decision_curve( list(baseline.model, full.model), 
                   curve.names = c("Baseline model", "Full model"),
                   col = c("blue", "red"), 
                   confidence.intervals = FALSE,  #remove confidence intervals
                   cost.benefit.axis = FALSE, #remove cost benefit axis
                   legend.position = "none") #remove the legend

Print specific cost:benefit ratios.

plot_decision_curve( list(baseline.model, full.model), 
                   curve.names = c("Baseline model", "Full model"),
                   col = c("blue", "red"), 
                  cost.benefits = c("1:1000", "1:4", "1:9", "2:3", "1:3"),  #set specific cost benefits
                   legend.position = "bottomright")

Plot net benefit instead of standardize net benefit, change confidence interval level.

baseline.model <- decision_curve(Cancer~Age + Female + Smokes,
                                data = dcaData, 
                                thresholds = seq(0, .4, by = .01),
                                confidence.intervals = 0.9, #calculate 90% confidence intervals
                                bootstraps = 25)

full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
                            data = dcaData, 
                            thresholds = seq(0, .40, by = .01),
                            confidence.intervals = 0.9, #calculate 90% confidence intervals
                            bootstraps = 25)

plot_decision_curve( list(baseline.model, full.model), 
                   curve.names = c("Baseline model", "Full model"),
                   col = c("blue", "red"), 
                   ylim = c(-0.05, 0.15), #set ylim
                   lty = c(2,1), 
                   standardize = FALSE, #plot Net benefit instead of standardized net benefit
                   legend.position = "topright")

Providing fitted risks from a previously specified model

If a risk model has already been specified, and so no model fitting is needed, the user can specify fitted.risks=TRUE and provide them in the formula. No model fitting will be done and bootstrap confidence intervals will be conditional on the fitted model.

#helper function

expit <- function(xx) exp(xx)/ (1+exp(xx))
# Assume we have access to previously published models 
# (or models built using a training set)
# that we can use to predict the risk of cancer. 

# Basic model using demographic variables: Age, Female, Smokes. 
dcaData$BasicModel <- with(dcaData, expit(-7.3 + 0.18*Age - 0.08*Female + 0.80*Smokes ) )

# Model using demographic + markers : Age, Female, Smokes, Marker1 and Marker2. 
dcaData$FullModel <- with(dcaData, expit(-10.5 + 0.22*Age  - 0.01*Female + 0.91*Smokes + 2.03*Marker1 - 1.56*Marker2))


full.model <- decision_curve(Cancer~FullModel,
                            data = dcaData,
                            fitted.risk = TRUE, 
                            thresholds = seq(0, .4, by = .05),
                            bootstraps = 25) 


plot_decision_curve(full.model, legend.position = "none")

Printing estimates

DecisionCurve also outputs all calculations and point estimates when assigned to an object.

full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
                            data = dcaData, 
                            thresholds = seq(.01, .99, by = .05),
                            bootstraps = 25, policy = "opt-out")

The best way to look at point estimates (w/ci's) is to use summary.

summary(full.model, measure = "NB") #outputs standardized net benefit by default

you can also choose which measure to report, and how many decimal places to print. The options for the measure argument are 'TPR', 'FPR', 'sNB', and 'NB'.

#output true positive rate
summary(full.model, nround = 2, measure = "TPR")

You can also access the data directly. full.model$derived.data holds the net benefit and it's components.

head(full.model$derived.data)

Case-control data

If data is from a case-control study instead of an observational cohort, an estimate of the population level outcome prevalence is needed to calculate decision curves. Decision curves can be calculated by setting study.design = "case-control" and setting the population.prevalence. Once the decision_curve is calculated, all other calls to the plot functions remain the same. Note that bootstrap sampling is done stratified by outcome status for these data.

#simulated case-control data with same variables as above
data(dcaData_cc)

#estimated from the population where the 
#case-control sample comes from. 
population.rho = 0.11

baseline.model_cc <- decision_curve(Cancer~Age + Female + Smokes,
                                    data = dcaData_cc, 
                                    thresholds = seq(0, .4, by = .01),
                                    bootstraps = 25, 
                                    study.design = "case-control", 
                                    population.prevalence = population.rho)

full.model_cc <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
                                data = dcaData_cc, 
                                thresholds = seq(0, .4, by = .01),
                                bootstraps = 25, 
                                study.design = "case-control", 
                                population.prevalence = population.rho)


plot_decision_curve( list(baseline.model_cc, full.model_cc), 
                   curve.names = c("Baseline model", "Full model"),
                   col = c("blue", "red"), 
                   lty = c(1,2), 
                   lwd = c(3,2, 2, 1), 
                   legend.position = "bottomright")

Cross-validation

We provide a wrapper to perform k-fold cross-validation to obtain bias corrected decision curves. Once cv_decision_curve is called, all plot and summary functions work the same as shown above for decision_curve output. Confidence interval calculation is not available at this time for cross-validated curves.

full.model_cv <- cv_decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
                                    data = dcaData,
                                    folds = 5, 
                                    thresholds = seq(0, .4, by = .01), 
                                   policy = "opt-out")

full.model_apparent <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
                                data = dcaData, 
                                thresholds = seq(0, .4, by = .01),
                                confidence.intervals = 'none', policy = "opt-out")


plot_decision_curve( list(full.model_apparent, full.model_cv), 
                   curve.names = c("Apparent curve", "Cross-validated curve"),
                   col = c("red", "blue"), 
                   lty = c(2,1), 
                   lwd = c(3,2, 2, 1), 
                   legend.position = "bottomright") 

plot_clinical_impact(full.model_cv, legend.position = "bottomright")