knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
In this vignette we will recreate Figure 2B from Matsushita & Nishimura (2020)[^source] which was made in GraphPad Prism 7.
The authors of this paper mutate trehalose synthesis enzyme (Tps1) in fruit flies and then observe the effect of this mutation on their wings. Figure 2B, shown below, shows the percentage change in three fly wing attributes in homozygous Tps1 mutants compared to heterozygous Tps1 mutants. The data are separated according to the sex of the fly.
{ width=70% }
The raw data underlying Figure 2B is accessible in the article's Supplementary
Data. For ease of use, this small data set (aptly named wings
) has been
'tidied' and is provided with the ggprism
package.
To recreate this figure we begin with loading the required packages and data set.
library(dplyr) library(ggplot2) library(ggprism) library(ggbeeswarm) library(rstatix) data("wings")
head(wings)
The data are already in 'long format' so we could just jump straight to
plotting. However, we can see that the text in the measure
column is not
in title case so we'll need to quickly do some string manipulation.
# substitute period with space, convert to title case, convert back to factor wings$measure <- wings$measure %>% gsub("\\.", " ", .) %>% tools::toTitleCase() %>% factor(., levels = c("Wing Size", "Cell Size", "Cell Number"))
head(wings)
Now we'll just give the ggplot()
function our data.
# plot the wing measurement vs the percentage change compared to # the heterozygous mutants p <- ggplot(wings, aes(x = measure, y = percent.change))
First we will plot the individual data points with random jitter using
geom_beeswarm()
from the ggbeeswarm
package. The points will look too spaced out here, but don't worry they will
become more compact in the following step. We will also colour the
points and adjust their horizontal position them by genotype.
# add the data points and fill according to the genotype p <- p + ggbeeswarm::geom_beeswarm( aes(fill = genotype), dodge.width = 0.9, shape = 21, cex = 3.5 ) p
Now we want to separate the points according to sex as in the original plot.
For this we can just use facet_wrap()
and then set the facet titles to the
Unicode symbols for male and female.
# separate data according to fly sex p <- p + facet_wrap( ~ sex, scales = "free", labeller = labeller(sex = c(male = "\u2642", female = "\u2640")) ) p
Overlying the points in the original plot is a horizontal dotted line at
y = 0. This is added to our ggplot using geom_hline()
.
# plot a horiztonal line at y = 0 i.e. percentage change = 0 p <- p + geom_hline(yintercept = 0, linetype = 2, linewidth = 0.3) p
On top of this are red horizontal bars showing the mean value for each group.
These can be added to ggplots using the stat_summary()
function with the
geom_crossbar()
Geom.
# plot solid red lines at the mean of each group p <- p + stat_summary( geom = "crossbar", aes(fill = genotype), fun = mean, position = position_dodge(0.9), colour = "red", linewidth = 0.4, width = 0.7, show.legend = FALSE ) p
The last data-related element to be added are the p-value brackets. To do this
we need to generate a table of p-values with the correct format, and then
use this table with the add_pvalue()
function from ggprism
.
The easiest way to generate the p-value table is with the
rstatix
(you could also do it
manually but using rstatix
ensures the table is of the correct format).
In this case we group the wings
data by sex and wing measure, and then test
whether the percentage
change in wing size/cell size/cell number is explained by the fly genotype.
We also use the add_x_position()
function from rstatix
to make sure the
bracket ends have the correct x axis positions (specifically the xmin
and
xmax
columns). Lastly, we manually change the
bracket labels to appear as we would like them to.
# perform t-tests to compare the means of the genotypes for each measure # and correct the p-values for multiple testing wings_pvals <- wings %>% group_by(sex, measure) %>% rstatix::t_test( percent.change ~ genotype, p.adjust.method = "BH", var.equal = TRUE, ref.group = "Tps1MIC/+" ) %>% rstatix::add_x_position(x = "measure", dodge = 0.9) %>% # dodge must match points mutate(label = c("***", "*", "P = 0.26", "***", "***", "P = 0.65")) wings_pvals
With this p-value table, we can now add p-value brackets.
# plot the p-values with brackets for each comparison p <- p + add_pvalue( wings_pvals, y = 10, xmin = "xmin", xmax = "xmax", tip.length = 0, fontface = "italic", lineend = "round", bracket.size = 0.5 ) p
With the data-related elements in place we can turn to changing the overall
plot theme and appearance. First, we'll apply theme_prism()
. By default
the theme uses all bold font so we set base_fontface = "plain"
so that
all text is of normal weight.
# apply theme_prism() p <- p + theme_prism( base_fontface = "plain", base_line_size = 0.7, base_family = "Arial" ) p
We'll need to fix the x axes text as the labels are overlapping, and the axes
use brackets instead of a line. We can fix the labels using line breaks as in
the original plot, and also set the axes guides to guide_prism_bracket()
.
# change the x axis to brackets and add line breaks to the axis text p <- p + scale_x_discrete( guide = guide_prism_bracket(width = 0.15), labels = scales::wrap_format(5) ) p
Now we'll fix the y axes so that they are identical as in the original Figure 2B.
We'll also apply the guide_prism_offset()
guide so the axis line doesn't extend
beyond the outermost tick marks. Additionally, we'll adjust the y axis title.
# adjust the y axis limits and change the style to 'offset' p <- p + scale_y_continuous( limits = c(-20, 12), expand = c(0, 0), breaks = seq(-20, 10, 5), guide = "prism_offset" ) + labs(y = "% change") p
The plot is already looking good as is. But the goal of this vignette is to recreate the original as closely as possible, so we will continue with some minor adjustments.
We need to use scale_fill_manual()
to change the point fills to match. We'll
also fix the formatting of the legend text in this step.
# change the point fill and adjust the legend text formatting p <- p + scale_fill_manual( values = c("#026FEE", "#87FFFF"), labels = c(expression("Tps"*1^italic("MIC")~"/ +"), expression("Tps"*1^italic("MIC"))) ) p
Next we'll do a few things at once in one call of the theme()
function:
p <- p + theme( legend.position = "bottom", axis.title.x = element_blank(), strip.text = element_text(size = 14), legend.spacing.x = unit(0, "pt"), legend.text = element_text(margin = margin(r = 20)) ) p
The original graph also has a label with the number of replicates in the bottom right corner of the plotting area. This is a little tricky to do with facets and requires creating a dummy data.frame to get the positioning correct.
# add n = 10 as a text annotation p <- p + geom_text( data = data.frame( sex = factor("female", levels = c("male", "female")), measure = "Cell Number", percent.change = -18.5, lab = "(n = 10)" ), aes(label = lab) ) p
The final thing to do is just make the legend symbols a bit larger using
the overrride.aes
argument in guide_legend()
.
# make the points larger in the legend only p <- p + guides(fill = guide_legend(override.aes = list(size = 3))) p
And that's it! Our ggplot graph is almost identical to the original.
{ width=70% }
[^source]: Matsushita, R., Nishimura, T. Trehalose metabolism confers developmental robustness and stability in Drosophila by regulating glucose homeostasis. Commun Biol 3, 170 (2020). https://doi.org/10.1038/s42003-020-0889-1
Here is the complete code for the plot.
wings_pvals <- wings %>% group_by(sex, measure) %>% rstatix::t_test( percent.change ~ genotype, p.adjust.method = "BH", var.equal = TRUE, ref.group = "Tps1MIC/+" ) %>% rstatix::add_x_position(x = "measure", dodge = 0.9) %>% # dodge must match points mutate(label = c("***", "*", "P = 0.26", "***", "***", "P = 0.65")) ggplot(wings, aes(x = measure, y = percent.change)) + ggbeeswarm::geom_beeswarm(aes(fill = genotype), dodge.width = 0.9, shape = 21) + facet_wrap(~ sex, scales = "free", labeller = labeller(sex = c(male = "\u2642", female = "\u2640"))) + geom_hline(yintercept = 0, linetype = 2, linewidth = 0.3) + stat_summary(geom = "crossbar", aes(fill = genotype),fun = mean, position = position_dodge(0.9), colour = "red", linewidth = 0.4, width = 0.7, show.legend = FALSE) + add_pvalue(wings_pvals, y = 10, xmin = "xmin", xmax = "xmax", tip.length = 0, fontface = "italic", lineend = "round", bracket.size = 0.5) + theme_prism(base_fontface = "plain", base_line_size = 0.7, base_family = "Arial") + scale_x_discrete(guide = guide_prism_bracket(width = 0.15), labels = scales::wrap_format(5)) + scale_y_continuous(limits = c(-20, 12), expand = c(0, 0), breaks = seq(-20, 10, 5), guide = "prism_offset") + labs(y = "% change") + scale_fill_manual(values = c("#026FEE", "#87FFFF"), labels = c(expression("Tps"*1^italic("MIC")~"/ +"), expression("Tps"*1^italic("MIC")))) + theme(legend.position = "bottom", axis.title.x = element_blank(), strip.text = element_text(size = 14), legend.spacing.x = unit(0, "pt"), legend.text = element_text(margin = margin(r = 20))) + geom_text(data = data.frame(sex = factor("female", levels = c("male", "female")), measure = "Cell Number", percent.change = -18.5, lab = "(n = 10)"), aes(label = lab)) + guides(fill = guide_legend(override.aes = list(size = 3)))
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.