This practical aims to guide you through some of the key ideas in data manipulation. I've tried to construct this practical in such a way that you get to experiment with the various tools. Feel free to experiment!
library("ggplot2")
data(aphids, package = "jrGgplot2Bio")
aphids$Block = factor(aphids$Block)
aphids$Water = factor(aphids$Water, levels = c("Low", "Medium", "High"))
ga = ggplot(data = aphids) +
  geom_point(aes(Time, Aphids, colour = Block)) +
  facet_grid(Nitrogen ~ Water) +
  geom_line(aes(Time, Aphids, colour = Block)) +
  theme_bw()
print(ga)
This data set consists of seven observations on cotton aphid counts on twenty randomly chosen leaves in each plot, for twenty-seven treatment-block combinations. The data were recorded in July 2004 in Lamesa, Texas. The treatments consisted of three nitrogen levels (blanket, variable and none), three irrigation levels (low, medium and high) and three blocks, each being a distinct area. Irrigation treatments were randomly assigned within each block as whole plots. Nitrogen treatments were randomly assigned within each whole plot as split plots.
data(aphids, package = "jrGgplot2Bio")
\noindent Samples were taken once per week.
\newthought{Reproduce} figure 1. Here are some hints to get you started. The key idea is to think of the plot in terms of layers. So combine geom_line and geom_point, label the x-axis with + xlab("Time"), and apply theme_bw().
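The layering idea can be sketched on a small made-up data set. The data frame toy and its columns (time, count, group) are hypothetical and exist only for illustration; they are not part of the aphids data.

```r
library("ggplot2")

# Hypothetical toy data: two groups measured at five time points
toy = data.frame(
  time = rep(1:5, times = 2),
  count = c(2, 4, 7, 11, 16, 1, 3, 4, 8, 9),
  group = factor(rep(c("a", "b"), each = 5))
)

# Start from the data, then add one layer at a time
p = ggplot(data = toy) +
  geom_point(aes(time, count, colour = group)) +  # layer 1: points
  geom_line(aes(time, count, colour = group)) +   # layer 2: lines joining the points
  xlab("Time") +                                  # relabel the x-axis
  theme_bw()                                      # black-and-white theme
print(p)
```

Removing or reordering a layer and printing again is a quick way to see what each piece contributes.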
\newpage
## Code for figure 1
aphids$Block = factor(aphids$Block)
aphids$Water = factor(aphids$Water, levels = c("Low", "Medium", "High"))
ga = ggplot(data = aphids) +
  geom_point(aes(Time, Aphids, colour = Block)) +
  facet_grid(Nitrogen ~ Water) +
  geom_line(aes(Time, Aphids, colour = Block)) +
  theme_bw()
First load the yeast data set:
data(yeast, package = "jrGgplot2Bio")
\noindent In practical 2, we split the data up by subcellular localisation (class), but we only looked at a few of the localisations. Let's start by plotting the signal sequence recognition measures for all of the subcellular localisations. Use the following commands:
g = ggplot(data = yeast)
g1 = g + geom_point(aes(x = gvh, y = mcg, col = class))
g1
\noindent to get figure 2. In addition to using the colour aesthetic, redo the plot but use facet_grid and facet_wrap. For example,
g + geom_point(aes(x = gvh, y = mcg, col = class)) + facet_grid(~ class)
\noindent Experiment with:
The margins argument:
g + geom_point(aes(x = gvh, y = mcg, col = class)) + facet_grid(~ class, margins = TRUE)
The scales argument:
g + geom_point(aes(x = gvh, y = mcg, col = class)) + facet_grid(~ class, scales = "free")
The layout, for example faceting by rows rather than columns:
g + geom_point(aes(x = gvh, y = mcg, col = class)) + facet_grid(class ~ .)
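The examples above all use facet_grid. As a rough sketch of how facet_wrap differs, the code below uses the mpg data set that ships with ggplot2 as a stand-in for the yeast data (an assumption made so the example runs without the course package); the ncol value is an arbitrary choice.

```r
library("ggplot2")

# mpg ships with ggplot2; it stands in for the yeast data here
base = ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy, colour = class))

# facet_grid: one column per class, all panels in a single row
p_grid = base + facet_grid(~ class)

# facet_wrap: the same panels wrapped into a grid with three columns
p_wrap = base + facet_wrap(~ class, ncol = 3)

print(p_grid)
print(p_wrap)
```

With many levels, facet_wrap usually makes better use of the page, while facet_grid keeps every panel aligned on a shared row or column.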
data("outbreaks", package = "jrGgplot2Bio")
g = ggplot(outbreaks, aes(x = year, y = illnesses)) +
  geom_point() +
  geom_smooth() +
  scale_y_log10() +
  facet_grid(state ~ species)
In the outbreaks data we have 3 species of pathogen, 3 states in the US and a timeline of infection incidences for each combination. We now have the tools to visualise the whole lot in one go.
data("outbreaks", package = "jrGgplot2Bio")
Create a plot object g with the aesthetics illnesses and year, and add the points with geom_point().
Use scale_y_log10() to transform the y scale.
Add a smoothed trend with geom_smooth().
Facet with state ~ species to separate the data across those two categories.
g = ggplot(outbreaks, aes(x = year, y = illnesses)) +
  geom_point() +
  geom_smooth() +
  scale_y_log10() +
  facet_grid(state ~ species)
The yeast dataset has a lot of variables in it, all measuring different aspects of the amino acid sequences. We have considered a few of these and found interesting patterns that relate to the subcellular locations in the cell. What about the other variables though? What have we left behind? Wouldn't it be nice if we had a way of visualising more of the possible relationships at once?
The ggplot2 package has many extensions. One of them, GGally (http://ggobi.github.io/ggally/), lets us make a matrix of pairwise plots.
Let's load the package and choose some variables to plot:
library(dplyr)
library(GGally)
data(yeast, package = "jrGgplot2Bio")
# Drop the seq, erl and pox columns before plotting
yeast = select(yeast, -seq, -erl, -pox)
Now use the ggpairs function to make a grid of pairwise plots. Look at the documentation and experiment with some of the different arguments. For example, try adding aes(colour = class).
ggpairs(yeast)
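As a minimal sketch of the colour suggestion, the code below applies ggpairs to the built-in iris data, which stands in for yeast here (an assumption made so the example does not need the course package). The mapping and columns arguments are part of ggpairs; the choice of columns is purely illustrative.

```r
library("ggplot2")
library("GGally")

# iris is a built-in data set used as a stand-in for yeast
measure_cols = 1:4  # the four numeric measurement columns

# Colour every panel by the grouping variable
pm = ggpairs(iris, mapping = aes(colour = Species), columns = measure_cols)
print(pm)
```

For the yeast data the equivalent call would map colour to class instead of Species.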
Solutions are contained within this package:
vignette("solutions3", package = "jrGgplot2Bio")