knitr::opts_chunk$set(echo = TRUE)
The voi
package functions evppi()
, evpi()
and evsi()
all return data frames in a "tidy" format with one row per VoI estimate. This allows plots to be produced and customised with little further effort using ggplot2
.
This vignette demonstrates how to illustrate VoI analyses graphically, using an example EVSI results dataset chemo_evsi_or
supplied with the voi
package.
We also explain the use of the enbs()
function to calculate and optimise the expected net benefit of sampling for a simple study design.
Knowledge of ggplot2 is required to be able to adapt these plots for your own analyses and communication needs, but plenty of learning resources are linked from the ggplot2 home page.
Some functions to create similar plots using base R graphics are supplied with the VoI Book.
library(voi) library(ggplot2) library(dplyr)
The example dataset chemo_evsi_or
included with the package gives the results of an EVSI analysis for the chemotherapy example model.
This object is a data frame with r nrow(chemo_evsi_or)
rows and three columns, giving the sample size per arm (n
, from 50 to 1500), willingness-to-pay (k
, from 10000 to 50000 pounds) and the corresponding EVSI (evsi
).
head(chemo_evsi_or)
Sidenote: The EVSI in this example is the expected value of a two-arm trial with a binary outcome of whether a patient experiences side-effects. The trial is only used to inform the log odds ratio of side effects between treatments, and information about the baseline risk of side effects is ignored. Source code for generating this dataset using the moment matching method in
evsi()
is provided here.
For reference, we also calculate the EVPI for the corresponding willingness-to-pay values, using output from probabilistic analysis of the decision model, as stored in the object chemo_cea_501
provided with the voi
package.
evpi_df <- evpi(outputs = chemo_cea_501)
We also extract the values of the EVPPI (for the log odds ratio) into a data frame of the same format. These EVPPI values are already stored in chemo_evsi_or
, because they had been
calculated as a by-product of the moment matching method to calculate EVSI.
evppi_df <- attr(chemo_evsi_or, "evppi")
The following ggplot
code produces curves of EVSI against willingness to pay, with different sample sizes shown by curves of different colours. The sample size is indicated by a colour gradient. The corresponding curves of EVPI, and EVPPI for the "focal" parameter of interest (the log odds ratio of side effects), are also shown for reference.
ggplot(chemo_evsi_or, aes(x=k, y=evsi, group=n, color=n)) + geom_line() + scale_colour_gradient(low="skyblue", high="darkblue") + xlab("Willingness to pay (£)") + ylab("EVSI per person") + geom_line(data=evpi_df, aes(x=k, y=evpi), color="black", lwd=1.5, inherit.aes = FALSE) + geom_line(data=evppi_df, aes(x=k, y=evppi), color="darkblue", lwd=1.5, inherit.aes = FALSE) + labs(color="Sample size") + xlim(0,52000) + annotate("text",x=50000,y=125,label="EVPI",hjust=0) + annotate("text",x=50000,y=107,label="EVPPI",color="darkblue",hjust=0) + annotate("text",x=50000,y=50,label="EVSI",color="darkblue",hjust=0)
Or we can adapt the plot to only show a limited number of sample sizes, shown in a discrete legend, and customise the colour range.
n_use <- c(250, 500, 1000) chemo_evsi_or %>% filter(n %in% n_use) %>% ggplot(aes(x=k, y=evsi, group=n, color=n)) + geom_line(lwd=1.5) + scale_colour_binned(breaks=n_use, low = "gray80", high = "gray20", guide=guide_legend(title="Sample size", reverse=TRUE)) + xlab("Willingness to pay (£)") + ylab("EVSI (per person)") + geom_line(data=evppi_df, aes(x=k, y=evppi), color="darkblue", lwd=1.5, lty=3, inherit.aes = FALSE) + annotate("text", x=50000, y=130, col="darkblue", label="EVPPI", hjust=1)
In a similar way, we can produce curves of EVSI against sample size, with different willingness-to-pay (WTP) values shown by curves of different colours. To reduce visual clutter, only a limited number of willingness-to-pay values are illustrated. This plot shows how the EVSI converges to the (WTP-dependent) EVPPI for the focal parameter (log odds ratio of side effects) as the sample size increases.
chemo_evsi_or %>% filter(k %in% seq(10000,50000, by=1000)) %>% ggplot(aes(x=n, y=evsi, group=k, color=k)) + geom_line() + scale_colour_gradient(low="skyblue", high="darkblue") + xlab("Sample size") + ylab("EVSI per person") + labs(color="Willingness-to-pay (£)")
The voi
package includes a function enbs()
to obtain the expected net benefit of sampling for a proposed study of individual participants, given estimates of EVSI. See the help page enbs()
for full documentation for this function.
The
enbs()
function supposes that the costs of the study include a fixed setup cost (costs_setup
), and a cost per participant recruited (costs_pp
). (Note that ifn
describes the sample size per arm in a two-arm study,costs_pp
is actually the cost of recruiting two people, one in each arm.) To acknowledge uncertainty, a range of costs may be supplied as two numbers. These are assumed to describe a 95\% credible interval for the cost, with the point estimate given by the midpoint between these two numbers.Additional required information includes the size of the population affected by the decision (
pop
), and a discount rate (disc
, by default 0.035) to be applied over a time horizontime
. These may be supplied as vectors, in which case ENBS is calculated for all potential combinations of values (see below for an example).
Here is an example of calling enbs()
. The result is a data frame
giving the sample size (n
), willingness-to-pay (k
), and corresponding
ENBS (enbs
). The standard deviation (sd
) describes the uncertainty
about the ENBS estimate due to the uncertain costs. pce
is the probability
that the study is cost-effective (i.e. has a positive net benefit of sampling).
The other components of this data frame are related to the optimal sample size (as discussed below)
nbs <- enbs(chemo_evsi_or, costs_setup = c(5000000, 10000000), costs_pp = c(28000, 42000), pop = 46000, time = 10) nbs %>% filter(k==20000) %>% head()
The expected net benefit of sampling for a given willingness-to-pay
can then be illustrated by filtering the nbs
dataframe to this
willingness-to-pay value, then passing it to a simple ggplot
.
nbs %>% filter(k==20000) %>% ggplot(aes(x=n, y=enbs)) + geom_line() + xlab("Sample size") + ylab("Expected net benefit of sampling") + scale_y_continuous(labels = scales::dollar_format(prefix="£"))
Note the
scales::dollar_format
trick for showing large currency values tidily in the axis labels.
The same colouring techniques used above (using a group
aesthetic, with scale_colour_gradient
) might also be used to illustrate the
dependence of the ENBS curve on the willingness-to-pay.
Uncertainty around the ENBS can be illustrated by plotting quantiles of
the ENBS alongside the point estimates. These quantiles can be derived
from the point estimate (enbs
) and the standard deviation (sd
) stored in the data frame that we called nbs
, assuming a normal distribution.
nbs %>% filter(k==20000) %>% mutate(q975 = qnorm(0.975, enbs, sd), q75 = qnorm(0.75, enbs, sd), q25 = qnorm(0.25, enbs, sd), q025 = qnorm(0.025, enbs, sd)) %>% ggplot(aes(y=enbs, x=n)) + geom_ribbon(aes(ymin=q025, ymax=q975), fill="gray80") + geom_ribbon(aes(ymin=q25, ymax=q75), fill="gray50") + geom_line() + xlab("Sample size") + ylab("Expected net benefit of sampling") + scale_y_continuous(labels = scales::dollar_format(prefix="£")) + annotate("text",x=1100,y=85000000,label="95% credible interval") + annotate("text",x=1250,y=70000000,label="50% credible interval")
The enbs()
function also estimates the optimal sample size for each willingness-to-pay value. That is, the sample size which gives the
maximum expected net benefit of sampling. The optimal sample size can be
estimated using two alternative methods.
A simple and fast method: just pick the highest ENBS from among the
sample sizes included in the evsi
object that was supplied to enbs()
.
A more sophisticated (but slower) method uses nonparametric regression to smooth and interpolate the ENBS estimates. This is useful if the EVSI has only been computed for a limited number of sample sizes, or if the EVSI estimates are noisy (which may happen for the regression-based method). We might judge that the ENBS is a smooth function of sample size, that we could estimate by regression on the estimates that have been computed.
Here is a demonstration of how the regression smoothing method works.
Suppose we have computed EVSI (hence ENBS) for a limited set of sample sizes, say eight, from 100 to 800 by 100, and for a specific willingness to pay (and population size, horizon and discount rate).
The function enbs_opt
is used to create from this a smoothed and interpolated dataset (here called preds
) that contains 701 ENBS estimates, for every integer from 100 to 800. The maximum ENBS can then be found by searching through this interpolated dataset.
nbs_subset <- nbs %>% filter(k==20000, n %in% seq(100,800,by=100)) eopt <- enbs_opt(nbs_subset, smooth=TRUE, keep_preds=TRUE)
The enbs_opt
function returns the maximum ENBS (enbsmax
), the optimal
sample size (nmax
) and an interval estimate (nlower
to nupper
) defined
by the lowest and highest sample size that have ENBS within 5% of the optimum.
eopt
The full interpolated dataset is also returned as the "preds"
attribute.
preds <- attr(eopt, "preds") head(preds, 2)
In this graph, the eight pre-computed ENBS values are shown as points, and the 701 smoothed/interpolated values are shown on a line.
ggplot(nbs_subset, aes(x=n, y=enbs)) + geom_point() + geom_line(data=preds) + xlab("Sample size") + ylab("Expected net benefit of sampling") + scale_y_continuous(labels = scales::dollar_format(prefix="£")) + geom_vline(data=eopt, aes(xintercept=nlower), col="blue", lty=2) + geom_vline(data=eopt, aes(xintercept=nmax), col="blue", lty=2) + geom_vline(data=eopt, aes(xintercept=nupper), col="blue", lty=2) + geom_hline(data=eopt, aes(yintercept=enbsmax), col="blue", lty=2)
The maximum, and the range of sample sizes whose ENBS is within 5% of this maximum, are shown as blue dotted lines.
Technical note: this uses the default spline regression model
s()
from thegam
function in themgcv
package. The flexibility of the fitted model can be configured by setting thesmooth_df
argument to eitherenbs
orenbs_opt
. This is passed as thek
argument to thes()
function, and defaults to 6 (or the number of sample sizes minus 1, if this is lower than 6).
enbs_opt
assumes a single willingness-to pay (and population size, etc).
In practice you do not need to use enbs_opt
, unless you want to check the goodness-of-fit of the regression that is fitted. Instead you can just call enbs(..., smooth=TRUE)
, and this will determine the optimal sample size, using smoothing, for each of multiple willingness-to-pay values (see the nbs_smoothed
object below for an example).
In this example, there are a wide range of sample sizes within 5% of the optimum. In practice, the cheapest study, the one at the bottom of this range, might still be deemed acceptable to decision-makers. There is still a lot of unacknowledged uncertainty in this analysis, including about the costs, incident population size, decision horizon, and computational uncertainty in the per-person EVSI estimate. The location of the "exact" maximum of the curve of ENBS versus sample size might be considered a minor issue in the context of these other uncertainties.
For convenience, the enbs()
function also returns, as the "enbsmax"
attribute, a data frame with one row per willingness-to-pay (k
), giving the optimal ENBS (enbsmax
) the optimal sample size (nmax
) and an interval estimate for the optimal sample size (nlower
to nupper
). An extract is shown here.
attr(nbs, "enbsmax") %>% filter(k %in% seq(20000,50000,by=10000))
This allows a plot to be drawn of the optimal sample size against willingness to pay, with an interval estimate defined by this range of "95% optimal" sample sizes. This is compared here for the two methods of determing the optimum: with and without smoothing.
p1 <- attr(nbs, "enbsmax") %>% ggplot(aes(y=nmax, x=k)) + geom_ribbon(aes(ymin=nlower, ymax=nupper), fill="gray") + geom_line() + ylim(0,800) + xlab("Willingness to pay (£)") + ylab("Optimal sample size") + ggtitle("Unsmoothed") nbs_smoothed <- enbs(chemo_evsi_or, costs_setup = c(5000000, 10000000), costs_pp = c(28000, 42000), pop = 46000, time = 10, smooth=TRUE) p2 <- attr(nbs_smoothed, "enbsmax") %>% ggplot(aes(y=nmax, x=k)) + geom_ribbon(aes(ymin=nlower, ymax=nupper), fill="gray") + geom_line() + ylim(0,800) + xlab("Willingness to pay (£)") + ylab("Optimal sample size") + ggtitle("Smoothed") gridExtra::grid.arrange(p1, p2, nrow=1)
Given the uncertainty about the incident population size and decision time horizon, we might want to show how the results depend on these. This might be done using a heat map.
The next plot selects a specific willingness to pay (£20,000) and plots the probability that a study with a sample size of 400 is cost effective, i.e. the probability that the expected net benefit of sampling is positive. This is compared on a grid defined by a range of population sizes from 0 to 60000, and a range of decision horizon times from 0 to 10.
nbs <- enbs(chemo_evsi_or %>% filter(k==20000, n==400), costs_setup = c(5000000, 10000000), costs_pp = c(28000, 42000), pop = seq(0,60000,length.out=50), time = seq(0,10,length.out=50)) ggplot(nbs, aes(y=pop, x=time, fill=pce)) + geom_raster() + labs(fill="Probability") + scale_fill_gradient(low="darkblue", high="white") + xlab("Decision horizon (years)") + ylab("Incident population") + theme_minimal()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.