If you attended the Introduction to R course with us, you will already be familiar with some of the basic ggplot2 concepts. This practical serves as a reminder on some of those concepts, whilst also introducing some new ones. If you didn't attend this course, use this practical as a introduction to the basic concepts of ggplot2. Some these of plots aren't particularly useful, we are just using them for illustration purposes.
\newthought{To begin with}, load the ggplot2 package^[The ggplot2 package is automatically installed with jrGgplot2Bio.]
library("ggplot2")
\noindent Next we load the outbreaks data set:^[Details of the outbreaks data set can be found at in its helpfile, see ?outbreaks
] These are incidence numbers for three pathogens across three states in the USA.
data(outbreaks, package = "jrGgplot2Bio")
\noindent When loading in data, it's always a good idea to carry out a sanity check. I tend to use the commands
head(outbreaks) colnames(outbreaks) dim(outbreaks)
Scatter plots are created using the point geom. Let's start with a basic scatter plot
ggplot(data = outbreaks) + geom_point(aes(x = illnesses, y = hospitalizations))
\noindent To save typing, we can also store the plot as a variable:^[In this practical, we are creating the plots in a slightly verbose way.]
g = ggplot(data = outbreaks) g1 = g + geom_point(aes(x = illnesses, y = hospitalizations))
\noindent To view this plot, type g1
.
The arguments x
and y
are called aesthetics. For geom_point
,
these parameters are required. This particular geom has other aesthetics:
shape
, colour
, size
and alpha
.^[These aesthetics are
usually available for most geoms.] Here are some things to try out.
g + geom_point(aes(x = illnesses, y = hospitalizations, colour = species))
\noindent or
g + geom_point(aes(x = illnesses, y = hospitalizations, colour = species, alpha = status))
Some aesthetics, like shape
must be discrete. We might have to transform the variable into a
character or factor - for example shape = factor(year)
.
Are there any differences between numeric values like tenured
and
characters like gender
for some aesthetics? What happens if you use
a numeric such as fatalities
in the colour
aesthetic. For example, colour = fatalities
.
What happens if you set colour
(or some other aesthetic) outside of
the aes
function? For example, compare
g + geom_point(aes(x = illnesses, y = hospitalizations, colour = "Blue"))
\noindent to
g + geom_point(aes(x = illnesses, y = hospitalizations), colour = "Blue")
colour = 2
. What happens if you put this argument outside of the aes
function?The box plot geom has the following aesthetics: x
, y
, colour
,
fill
, linetype
, weight
, size
and alpha
. We can create a
basic boxplot using the following commands:
g + geom_boxplot(aes(x = species, y = illnesses))
\noindent Similar to the point geom, we can add in aesthetics:
g + geom_boxplot(aes(x = species, y = illnesses, colour = factor(year)))
\noindent Why do you think we have to convert year
to a discrete factor?
As before, experiment with the different aesthetics. For some of the aesthetics, you will need to convert the continuous variables to discrete variables. For example, this will not work as intended:
g + geom_boxplot(aes(x = species, y = illnesses, colour = year))
\noindent while this is OK^[Why?]
g + geom_boxplot(aes(x = species, y = illnesses, colour = factor(year)))
\noindent Make sure you play about with the different aesthetics.
The key idea with ggplot2 is to think in terms of layers not in terms of plot "types".^[In the lectures we will discuss what this means.] For example,
g + geom_boxplot(aes(x = species, y = illnesses, colour = factor(year))) + geom_point(aes(x = species, y = illnesses))
geom_boxplot
and geom_point
function calls?geom_point
isn't that great. Try using
geom_jitter
^[We have a bit too much data for
geom_jitter
, but you get the point.]:g + geom_jitter(aes(x = species, y = illnesses)) + geom_boxplot(aes(x = species, y = illnesses, colour = factor(year)))
Again this is probably not how you would want to visualise this data but hopefully you get the idea of combining different layers to make a composite plot.
Lets looks at bar plots using the yeast dataset[^1]. These are multiple different properties of the amino acid sequences of yeast proteins alongside the subcellular localisation of the same proteins.
data(yeast, package = "jrGgplot2Bio")
[^1]: Paul Horton & Kenta Nakai, "A Probablistic Classification System for Predicting the Cellular Localization Sites of Proteins", Intelligent Systems in Molecular Biology, 109-115. St. Louis, USA 1996.
The bar geom has the following aesthetics: x
, colour
, fill
,
size
, linetype
, weight
and alpha
. Here is a command to get
started:
g = ggplot(yeast) g + geom_bar(aes(x = class))
Note that the geom_bar
needed only one aesthetic, what happened? What is on the y-axis? As before, if you want to experiment feel free, try different aesthetic combinations. Now let's get a bit more fancy.
Lets re-order the bars according to the number of loci represented. We do this by altering the levels of the factor itself.
First lets calculate the counts ourselves so that we have a data object that we can better manipulate.
library(dplyr) class_counts = yeast %>% count(class)
\noindent Now we can manipulate the factor levels of our new dataset.
class_counts$class = reorder(class_counts$class, class_counts$n)
\noindent then plot:
g = ggplot(class_counts) g + geom_bar(aes(x = class, y = n), stat = "identity")
\noindent Notice that we now needed the stat = "identity"
argument to override the behaviour of geom_bar
which takes a row count by default.
\noindent We can flip the axes using the coord_flip()
function.
g + geom_bar(aes(x = class, y = n), stat = "identity") + coord_flip()
g + geom_bar(aes(x = class, y = n), stat = "identity") + coord_flip()
Solutions are contained within this package:
vignette("solutions1", package = "jrGgplot2Bio")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.