In this question, we are going to use a `for` statement to loop over a data set and construct some plots. The data we are going to use comes from a paper on malarial transmission traits and how they vary with temperature[^1]. This data was published to Data Dryad[^2]. To obtain the data, run the following piece of R code.
[^1]: Shapiro LLM, Whitehead SA, Thomas MB (2017) Quantifying the effects of temperature on mosquito and parasite traits that determine the transmission potential of human malaria. PLOS Biology 15(10): e2003489. https://doi.org/10.1371/journal.pbio.2003489
[^2]: Shapiro LLM, Whitehead SA, Thomas MB (2017) Data from: Quantifying the effects of temperature on mosquito and parasite traits that determine the transmission potential of human malaria. Dryad Digital Repository. https://doi.org/10.5061/dryad.74839
data(malaria, package = "jrProgBio")
head(malaria)
The data frame `malaria` represents an experiment with 6 temperature treatments and measurements of sporozoite prevalence over 25 days. Additionally, the experiment is split into 2 blocks, with each treatment in each block having 4 cups. A delightful design. We want to create a scatter and line plot of sporozoite prevalence over time for each treatment combination. If you decide to read the original paper, we are recreating Figure 2 from the paper, but with our own code.
library("dplyr")
library("ggplot2")
# Subset to a single temperature treatment and plot it
treat_a = filter(malaria, Temperature == 21)
ggplot(treat_a, aes(x = Day, y = Sporozoite_Prevalence)) +
  geom_point()
for (temp in unique(malaria$Temperature)) {
  group = filter(malaria, Temperature == temp)
  g = ggplot(group, aes(x = Day, y = Sporozoite_Prevalence)) +
    geom_point()
  print(g)
  readline("Hit return for next plot")
}
What does `unique(malaria$Temperature)` give?
## It gives the distinct temperatures in the data, each listed once
In the `for` loop, what variable is changing? What are its possible values?
## The temp variable is changing.
## It takes each of the distinct values of malaria$Temperature in turn.
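To see what the loop variable does on each pass, here is a minimal sketch using a made-up vector of temperatures. The values are illustrative only and are not taken from the `malaria` data.

```r
# Hypothetical temperatures with repeats, standing in for malaria$Temperature
temps = c(21, 21, 24, 27, 24, 27)

# unique() keeps the first occurrence of each value, in order of appearance
unique_temps = unique(temps)

# The loop variable takes each distinct value in turn
for (temp in unique_temps) {
  n_rows = sum(temps == temp)
  message("temp = ", temp, " (", n_rows, " rows)")
}
```

The loop body runs once per distinct value, not once per row, which is why one plot per temperature is produced.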
What does the `readline()` function do?
## It halts execution and waits for user input
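One caveat worth knowing: `readline()` only waits in an interactive session. When a script is run non-interactively (for example with `Rscript`), it returns an empty string straight away. Below is a small sketch of a helper that pauses only when someone is there to press return. The name `pause_for_user` is made up for this example.

```r
# Hypothetical helper: pause only when R is running interactively
pause_for_user = function(prompt = "Hit return for next plot") {
  if (interactive()) {
    readline(prompt)
  }
  invisible(NULL)
}

pause_for_user()
```

Calling a helper like this inside the loop, in place of the bare `readline()`, would let the same code run unattended.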
Change the $x$-axis label using `xlab()`:
ggplot(group, aes(x = Day, y = Sporozoite_Prevalence)) +
  geom_point() +
  xlab("Day")
Use the `ylab()` function to alter the $y$-axis label.
ggplot(group, aes(x = Day, y = Sporozoite_Prevalence)) +
  geom_point() +
  xlab("Day") +
  ylab("Sporozoite Prevalence")
Add a title using the `ggtitle()` function, viz:
ggplot(group, aes(x = Day, y = Sporozoite_Prevalence)) +
  geom_point() +
  xlab("Day") +
  ylab("Sporozoite Prevalence") +
  ggtitle("Temperature")
We can combine strings/characters using the `paste()` function:
paste("Temperature", temp)
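As a quick illustration of how `paste()` builds a string, here is a sketch with an illustrative value for `temp`. By default `paste()` puts a single space between its arguments, while `sep` and `paste0()` change that.

```r
temp = 21  # illustrative value

title_default = paste("Temperature", temp)              # "Temperature 21"
title_sep     = paste("Temperature", temp, sep = ": ")  # "Temperature: 21"
title_nosep   = paste0("Temp", temp)                    # "Temp21"

print(title_default)
print(title_sep)
print(title_nosep)
```

Numbers are converted to character automatically, so `temp` does not need to be wrapped in `as.character()`.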
Rather than have a static title, make the title of each plot display the temperature.
ggplot(group, aes(x = Day, y = Sporozoite_Prevalence)) +
  geom_point() +
  xlab("Day") +
  ylab("Sporozoite Prevalence") +
  ggtitle(paste("Temperature :", temp))
Use `ylim()` and `xlim()` to fix the axis ranges so that they are the same in every plot. Hint: Work out the range before the `for` loop using the `range()` function.
# Compute the limits once, from the full data set
ylims = range(malaria$Sporozoite_Prevalence)
xlims = range(malaria$Day)
ggplot(group, aes(x = Day, y = Sporozoite_Prevalence)) +
  geom_point() +
  xlab("Day") +
  ylab("Sporozoite Prevalence") +
  ggtitle(paste("Temperature :", temp)) +
  xlim(xlims[1], xlims[2]) +
  ylim(ylims[1], ylims[2])
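As a quick check of what `range()` returns, here is a sketch with made-up numbers. The values are illustrative and do not come from the `malaria` data.

```r
# Illustrative values only, standing in for malaria$Sporozoite_Prevalence
prevalence = c(0.10, 0.45, 0.00, 0.80, 0.30)

# range() returns a vector of length two: the minimum, then the maximum
lims = range(prevalence)
lims[1]  # smallest value
lims[2]  # largest value

# A missing value makes both limits NA unless na.rm = TRUE is set
lims_with_na = range(c(prevalence, NA))
lims_na_rm   = range(c(prevalence, NA), na.rm = TRUE)
```

Computing the limits from the whole data frame, rather than from each subset, is what keeps the axes comparable from one plot to the next.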
ggplot(group, aes(x = Day, y = Sporozoite_Prevalence, colour = factor(Block))) +
  geom_point() +
  xlab("Day") +
  ylab("Sporozoite Prevalence") +
  ggtitle(paste("Temperature :", temp)) +
  xlim(xlims[1], xlims[2]) +
  ylim(ylims[1], ylims[2])
Use the `geom_line()` function in the same way you used the `geom_point()` function. Hint: For `geom_line()`, have a look at the `group` aesthetic and the `interaction()` function.
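To see why `interaction()` helps here, consider a sketch on a made-up design with the same shape as the experiment (2 blocks, 4 cups per block). The data frame `design` and the column `series` are hypothetical names used only for this illustration.

```r
# Hypothetical layout: 2 blocks, each with 4 cups, measured on 3 days
design = expand.grid(Day = c(1, 2, 3), Cup = 1:4, Block = 1:2)

# interaction() builds a factor with one level per Block/Cup combination
design$series = interaction(design$Block, design$Cup)

nlevels(design$series)  # number of distinct Block/Cup series
table(design$series)    # number of rows (days) in each series
```

With `colour = factor(Block)` alone, the lines are grouped by block, so all cups in a block would be joined into one zigzag line. Grouping by the interaction gives each cup within each block its own line.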
ggplot(group, aes(x = Day, y = Sporozoite_Prevalence, colour = factor(Block))) +
  geom_point() +
  geom_line(aes(group = interaction(Block, Cup))) +
  xlab("Day") +
  ylab("Sporozoite Prevalence") +
  ggtitle(paste("Temperature :", temp)) +
  xlim(xlims[1], xlims[2]) +
  ylim(ylims[1], ylims[2])
You should end up with plots that look something like this for each temperature:
temp = 34
group = filter(malaria, Temperature == temp)
ylims = range(malaria$Sporozoite_Prevalence)
xlims = range(malaria$Day)
ggplot(group, aes(x = Day, y = Sporozoite_Prevalence, colour = factor(Block))) +
  geom_point() +
  geom_line(aes(group = interaction(Block, Cup))) +
  xlab("Day") +
  ylab("Sporozoite Prevalence") +
  ggtitle(paste("Temperature :", temp)) +
  xlim(xlims[1], xlims[2]) +
  ylim(ylims[1], ylims[2])
Can you think of further ways to use programming to make this code neater?
Add the `pdf()` function to your code to save the resulting graph:
# decide on a filename and path
filename = "my_awesome_figure.pdf"
pdf(filename)
# do your plotting
# close the connection to the file
dev.off()
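A minimal runnable sketch of the open, plot, close pattern is shown here. It uses base `plot()` and a temporary file so that it is self-contained. With ggplot2 inside a `for` loop, each plot would need an explicit `print()` to reach the device.

```r
# Write to a temporary file so the sketch leaves nothing behind
filename = tempfile(fileext = ".pdf")

pdf(filename)   # open the PDF device
plot(1:10)      # first page
plot(10:1)      # each new plot starts a new page
dev.off()       # close the device so the file is finalised

file.exists(filename)
```

Forgetting `dev.off()` is a common slip: the device stays open, later plots keep going to the file, and the PDF may not open properly until the device is closed.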
Put the `for` loop and plotting commands in a function which takes the data frame as an argument.

Solutions are contained within this package:
vignette("solutions5", package = "jrProgBio")
## FULL SOLUTION
viewgraphs = function(malaria, save = FALSE, filename = "graphs.pdf") {
  # Optionally send all plots to a single PDF file
  if (save) {
    pdf(filename)
  }
  # Axis limits are computed once, from the full data set
  ylims = range(malaria$Sporozoite_Prevalence)
  xlims = range(malaria$Day)
  for (temp in unique(malaria$Temperature)) {
    group = filter(malaria, Temperature == temp)
    g = ggplot(group, aes(x = Day, y = Sporozoite_Prevalence, colour = factor(Block))) +
      geom_point() +
      geom_line(aes(group = interaction(Block, Cup))) +
      xlab("Day") +
      ylab("Sporozoite Prevalence") +
      ggtitle(paste("Temperature :", temp)) +
      xlim(xlims[1], xlims[2]) +
      ylim(ylims[1], ylims[2])
    print(g)
  }
  # Close the PDF device if one was opened
  if (save) {
    dev.off()
  }
}