knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(braidReports) set.seed(20240828)
The ggplot
package is one of the most powerful and widely used tool-kits available
for visualizing data in R, and it's easy to understand why. Its well-crafted
grammar of layers, aesthetics, scales, and facets allows for extraordinary
versatility and power but with remarkable intuitiveness and delightfully tidy
code. It is personally my nearly universal go-to for generating visual expressions
of data; but when plotting drug combination data, it has some understandable
friction points. Consider the following example:
concentrations <- c(0,2^(-3:3)) surface <- data.frame( concA = rep(rep(concentrations,each=length(concentrations)),each=3), concB = rep(rep(concentrations,times=length(concentrations)),each=3), replicate = rep(c(1,2,3),times=(length(concentrations)^2)) ) surface$actual <- evalBraidModel( surface$concA, surface$concB, c(1, 1, 3, 3, 2, 0, 100, 100, 100) ) surface$measure <- surface$actual + rnorm(nrow(surface),sd=7) head(surface, 12)
This synthetic dataset reflects a a fairly typical combination study layout: both drugs have been tested at a range of serially diluted concentrations, and combined doses have been laid out in a a checkerboard so that every combination of concentration of the first drug (including 0) and the second drug (including 0) has been tested. Furthermore, each combined dose is tested in triplicate. The measured response surface has been simulated using a BRAID response surface model with a synergistic $\kappa$ value of 2, but any smoothly varying, noisily measured function of two doses would work just as well.
How might we visualize such a response surface using ggplot2
? A 3-D plot is
right out: ggplot2
has no standard support for such plots, nor should it, as
3-D plots are notoriously poor at conveying accurate quantitative information.
A more straightforward approach would be to plot our measured effect as a
function of one concentration, conditioned on the level of the other. For
example:
ggplot(surface,aes(x=concA,y=measure,colour=factor(concB)))+ geom_point()+ stat_summary(geom="line",fun.data=mean_se)+ scale_x_log10()+ labs(x="Drug A",y="Effect",colour="Drug B")
This view is quite effective, and gives a clear sense of how the presence of differing amounts of drug B impact the behavior of drug B. But this plotting approach necessarily differentiates between the two drugs in how they are visualized, and is ineffective at showing the shape of their interaction.
The most intuitive way to present combination data is with the use of a
response surface heatmap: plot the two doses as the x- and y- dimensions, and
visualize their effect using a suitable color scale. Unfortunately, our dataset
includes multiple measurements for each combination; were we to plot these
using something like geom_tile()
each replicate would be plotted over the
others, so that only the last replicates were visible. ggplot2
does contain
one way to address this out-of-the-box, [stat_summary_2d()], but it doesn't
behave exactly as we'd like:
ggplot(surface, aes(x=concA,y=concB))+ stat_summary_2d(aes(z=measure), fun="mean")+ scale_x_log10()+ scale_y_log10()+ scale_fill_distiller(palette="RdYlBu")+ coord_equal()+ labs(x="Drug A",y="Drug B",fill="Effect")
Not only are our evenly spaced concentrations pairs reduced to awkwardly rounded
mini-tiles, all measurements where either drug is zero have been removed from
the plot altogether. This is an intrinsic property of nearly all ggplot2
stats and geoms: if a transformed coordinate is infinite, it is removed from the
plot. Yet plotting serially diluted concentrations on a logarithmic scale is
incredibly intuitive, making comparison of non-zero concentrations and zero
concentrations frustratingly difficult. It is for these reasons that we
developed geom_braid()
:
ggplot(surface,aes(x=concA,y=concB))+ geom_braid(aes(fill=measure))+ scale_x_log10()+ scale_y_log10()+ scale_fill_distiller(palette="RdYlBu")+ coord_equal()+ labs(x="Drug A",y="Drug B",fill="Effect")
Like many ggplot2
extensions, geom_braid()
is in reality a Stat
,
performing some useful preprocessing of the data before passing it off to the
true geom, geom_tile()
. Duplicate concentration pairs are identified, and
averaged to give the aggregate value at each unique pair. Widths between doses
pairs are guessed or provided. Coordinates which, after being transformed, would
be infinite are instead offset from the main body of measurements, ensuring
they will be plotted along with the data. The result is a simple and intuitive
tool for quickly examining combined action data.
As a ggplot2
extension, geom_braid()
leverages all of the customization
and flexibility ggplot
objects traditionally afford, including integration
with other layers, ggplot
fill scales, and faceting:
ggplot(surface,aes(x=concA,y=concB))+ geom_braid(aes(fill=measure))+ geom_point(colour="black")+ scale_x_log10("Drug A")+ scale_y_log10("Drug B")+ scale_fill_viridis_c("Effect",option="A")+ coord_equal()+ facet_wrap(vars(replicate))
Widths and heights of tiles can (and generally should) be generated automatically, but can be passed to the stat as additional aesthetics if desired. Note that widths and heights should be expressed in the transformed coordinate space:
surface$tilewidth <- log10(2)*0.9 surface$tilewidth[surface$concA==0] <- log10(2)/2 surface$tileheight <- log10(2)*0.9 surface$tileheight[surface$concB==0] <- log10(2)/2 ggplot(surface,aes(x=concA,y=concB))+ geom_braid(aes(fill=measure,width=tilewidth,height=tileheight),space=3)+ scale_x_log10("Drug A")+ scale_y_log10("Drug B")+ scale_fill_distiller("Effect",palette="RdYlBu")+ coord_equal()
geom_braid()
is an effective tool for rendering classic checkerboard layouts,
but there is no guarantee that data will be laid out so cleanly. Experiments
might only measure a subset of a traditional checkerboard, or might select
points on an even denser arrangement. For example, the following adjustment
simulates an experiment in which measurements from replicates 2 and 3 have
increased the concentrations of drug A and drug B respectively. This removes
the traditional "triplicate" approach in favor of a more varied, full-coverage
method:
glassSurface <- surface glassSurface$concA[glassSurface$replicate==2] <- glassSurface$concA[glassSurface$replicate==2]*1.25 glassSurface$concB[glassSurface$replicate==3] <- glassSurface$concB[glassSurface$replicate==3]*1.25 glassSurface$actual <- evalBraidModel( glassSurface$concA, glassSurface$concB, c(1, 1, 3, 3, -0.5, 0, 60, 100, 100) ) glassSurface$measure <- glassSurface$actual+rnorm(nrow(glassSurface),sd=7) head(glassSurface, 12)
Due to its irregular spacing, plotting glassSurface
with geom_braid()
produces a very unsatisfactory plot:
ggplot(glassSurface,aes(x=concA,y=concB))+ geom_braid(aes(fill=measure))+ geom_point(colour="black")+ scale_x_log10("Drug A")+ scale_y_log10("Drug B")+ scale_fill_distiller("Effect",palette="RdYlBu")+ coord_equal()
While this plot is technically correct, it fails to fill the space in the way
a response surface should. When dealing with such irregular sampling, a better
tool is geom_braid_glass()
which produces what we call a "stained glass"
plot:
ggplot(glassSurface,aes(x=concA,y=concB))+ geom_braid_glass(aes(fill=measure))+ geom_point(colour="black")+ scale_x_log10("Drug A")+ scale_y_log10("Drug B")+ scale_fill_distiller("Effect",palette="RdYlBu")+ coord_equal()
In a stained glass plot, every point within the bounds of the plotted values is colored according to the value of the measured dose pair nearest to it, producing a mosaic of Voronoi cells that cover the full space. Values in the margins of the plot are given a height or width according the specified or inferred aesthetic, but the boundaries between them are again bisecting Voronoi boundaries.
The ability to customize the width of the resulting tiles, particularly in the margins, can be even more valuable in these plots, where the width and height default to the smallest spacing between distinct values:
ggplot(glassSurface,aes(x=concA,y=concB))+ geom_braid_glass(aes(fill=measure,width=tilewidth,height=tileheight),space=2)+ scale_x_log10("Drug A")+ scale_y_log10("Drug B")+ scale_fill_distiller("Effect",palette="RdYlBu")+ coord_equal()
While the discrete heatmaps of geom_braid
and geom_braid_glass
are the most
direct ways to visualize combined action data, the harsh polygonal edges
introduced can sometimes mask the more important variations in shape and
structure. braidReports
also includes a ggplot
geom for rendering smoothed
surfaces, unsurprisingly named geom_braid_smooth
:
ggplot(surface,aes(x=concA,y=concB))+ geom_braid_smooth(aes(fill=measure))+ scale_x_log10()+ scale_y_log10()+ scale_fill_distiller(palette="RdYlBu")+ coord_equal()+ labs(x="Drug A",y="Drug B",fill="Effect")
geom_braid_smooth
interpolates a regular grid of values from the original
(potentially irregular) data using a two-dimensional Gaussian kernel, producing
a smoothed surface that generally hews extremely close to measured values at
their respective points, but produces intuitive, smoothly varying values in
between them. It can be run on both regularly laid-out checkerboard data and
more irregularly spaced measurements, though in the latter case specifying the
smoothing width and height explicitly using the width
and height
aesthetics
is often advisable:
ggplot(glassSurface,aes(x=concA,y=concB))+ geom_braid_smooth(aes(fill=measure))+ geom_point(colour="black")+ scale_x_log10("Drug A")+ scale_y_log10("Drug B")+ scale_fill_distiller("Effect",palette="RdYlBu")+ coord_equal() ggplot(glassSurface,aes(x=concA,y=concB))+ geom_braid_smooth(aes(fill=measure,width=log10(2),height=log10(2)))+ scale_x_log10("Drug A")+ scale_y_log10("Drug B")+ scale_fill_distiller("Effect",palette="RdYlBu")+ coord_equal() ggplot(glassSurface,aes(x=concA,y=concB))+ geom_braid_smooth(aes(fill=measure,width=tilewidth,height=tileheight),space=2)+ scale_x_log10("Drug A")+ scale_y_log10("Drug B")+ scale_fill_distiller("Effect",palette="RdYlBu")+ coord_equal()
Heatmaps are an intuitive and effective way of depicting the results and shape
of a combined response surface, but they still fall short when it comes to
quantitative depiction. Even the best-designed colormap still carries
considerable imprecision, making it difficult to perceive the values and ranges
at which particular numerical values are reached. One effective tool for this
task is the contour map, which markes the boundaries of dose space at which a
given effect level is crossed. To support such plots, we have included
geom_braid_contour()
which uses the same smoothing techniques as
geom_braid_smooth()
to produces an array of x-, y-, and z-values for the
built in ggplot
stat, stat_contour()
. This allows us to visualize both
the overall shape of the surface and the boundaries of particular effect
spaces:
ggplot(surface,aes(x=concA,y=concB))+ geom_braid_smooth(aes(fill=measure))+ geom_braid_contour(aes(z=measure),breaks=10*(1:9),colour="black",linetype=2)+ scale_x_log10()+ scale_y_log10()+ scale_fill_distiller(palette="RdYlBu")+ coord_equal()+ labs(x="Drug A",y="Drug B",fill="Effect")
Note that geom_braid_contour()
uses the smoothed and interpolated values like
those produced by geom_braid_smooth()
rather than the discrete values plotted
by geom_braid()
or geom_braid_glass()
. There are two reasons for this:
first, the underlying Stat
, stat_contour()
requires that the data plotted
be laid out as a regular grid for it to perform its own interpolation.
Second, contours linearly interpolated between more parsely sampled data are
often disjointed and jagged, and carry much less information than more smoothly
interpolated values.
As with geom_braid_smooth()
, geom_braid_contour()
can be applied to both
regular and irregular data, but with irregular data more care must be taken
with the smoothing widths and heights:
ggplot(glassSurface,aes(x=concA,y=concB))+ geom_braid_smooth(aes(fill=measure))+ geom_point(colour="black")+ geom_braid_contour(aes(z=measure),breaks=10*(1:9),colour="black",linetype=2)+ scale_x_log10("Drug A")+ scale_y_log10("Drug B")+ scale_fill_distiller("Effect",palette="RdYlBu")+ coord_equal() ggplot(glassSurface,aes(x=concA,y=concB))+ geom_braid_smooth(aes(fill=measure,width=tilewidth,height=tileheight),space=2)+ geom_braid_contour(aes(z=measure,width=tilewidth,height=tileheight),space=2, breaks=10*(1:9),colour="black",linetype=2)+ scale_x_log10("Drug A")+ scale_y_log10("Drug B")+ scale_fill_distiller("Effect",palette="RdYlBu")+ coord_equal()
It should also be noted that while we have plotted contours and smoothed surfaces together here, this is only to highlight their connection to the underlying interpolated data. BRAID contours can be plotted all on their own which can be quite effective at comparing the results of different surfaces:
surface$type <- "Synergy" glassSurface$type <- "Antagonism" allSurface <- rbind(surface,glassSurface) allSurface$type <- factor(allSurface$type,c("Synergy","Antagonism")) ggplot(allSurface,aes(x=concA,y=concB,colour=type))+ geom_point()+ geom_braid_contour(aes(z=measure,width=tilewidth,height=tileheight), breaks=c(50,90), tight=TRUE)+ scale_x_log10()+ scale_y_log10()+ scale_color_brewer("Surface Type",palette="Set1")+ coord_equal()+ labs(x="Drug A",y="Drug B")
Astute ggplot2
observers will note that at several points throughout this
vignette, we have run standard ggplot2
geoms such as geom_point()
on
logarithmically transformed data that result in infinite (transformed) values.
Ordinarily, this would produce a warning: and in reality, it produces a warning
here as well. We have suppressed warnings for this vignette, because while
the braid
, braid_glass
, and braid_smooth
geoms all explicitly handle such
infinite transformed values, we have been unable to find a way to suppress
these built in warnings. The warnings result, not when a stat is running, but
before its functions are handled, and as such are, for the time being,
unavoidable. When running a BRAID plotting function yourself, you will
encounter these warnings as well:
# With warnings enabled... ggplot(surface,aes(x=concA,y=concB))+ geom_braid(aes(fill=measure))+ scale_x_log10()+ scale_y_log10()+ scale_fill_distiller(palette="RdYlBu")+ coord_equal()+ labs(x="Drug A",y="Drug B",fill="Effect")
While we feel that it is an unfortunate reality, we hope that this is a small enough inconvenience that the overall versatility and expressiveness of the BRAID plotting function outweighs it.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.