This practical guides you through some of the key ideas in ggplot2. As with the first practical, feel free to experiment. Some of the functions introduced here haven't been explicitly covered in the notes, so use the built-in R help or the ggplot2 help pages at
http://had.co.nz/ggplot2/
as needed.
Scatter plots are very useful. However, when we have a large data set, points are plotted on top of each other, obscuring the relationship. We call this problem overplotting. There are a few techniques that can help, although the best solution is often problem-specific.
To begin with we will create an example data frame:
## If your computer is slow when plotting, reduce the value of n
library("jrGgplot2Bio")
library("ggplot2")
df = overplot_data(n = 20000)
We can create a simple scatter plot of this data using the following commands:
h = ggplot(df) + geom_point(aes(x, y))
h
This plot isn't particularly good. Try to improve it by using a combination of:
- `alpha`;^[`alpha` takes a value between $0$ and $1$.]
- `shape=1` and `shape='.'`;
- `geom_jitter`;
- `stat_density2d`. What does `h + stat_density2d(aes(x, y, fill = ..density..), contour = FALSE, geom = "tile")` do?
- What do `stat_bin2d()` and `stat_binhex()` do? Add them to the plot to find out! Try varying the parameters `bins` and `binwidth`.
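One way to try the techniques from the list above side by side is sketched below. It rests on a few assumptions: it simulates its own correlated data rather than calling `overplot_data()`, it uses the underscore spellings `stat_density_2d()`, `stat_bin_2d()` and `stat_bin_hex()` found in current ggplot2 (the list uses the older spellings), and it writes `after_stat(density)`, the ggplot2 3.3.0+ replacement for `..density..`. `stat_bin_hex()` also needs the hexbin package installed.

```r
# Minimal sketch: simulated data standing in for overplot_data();
# the package's example data may look different.
library(ggplot2)

set.seed(1)
n = 20000
sim = data.frame(x = rnorm(n), y = rnorm(n))
sim$y = sim$x + sim$y  # add a linear relationship so there is structure to see

base = ggplot(sim, aes(x, y))

# 1. Transparency and small or hollow symbols reduce the ink per point
p_alpha = base + geom_point(alpha = 0.05)
p_shape = base + geom_point(shape = ".")
p_hollow = base + geom_point(shape = 1, alpha = 0.2)

# 2. Jitter adds small random noise; most useful when values are rounded
p_jitter = base + geom_jitter(width = 0.1, height = 0.1, alpha = 0.1)

# 3. A 2D density estimate drawn as tiles
p_dens = base +
  stat_density_2d(aes(fill = after_stat(density)),
                  geom = "tile", contour = FALSE)

# 4. Count points in rectangular or hexagonal bins
p_rect = base + stat_bin_2d(bins = 50)
p_hex = base + stat_bin_hex(binwidth = c(0.2, 0.2))  # needs hexbin

# Print any of these objects, e.g. p_hex, to compare them
```

Transparency keeps every point but relies on overlaps darkening, whereas the binned and density versions summarise the points, which tends to scale better as `n` grows.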
Let's return to the outbreaks data to look at how to visualise distributions.
data(outbreaks, package = "jrGgplot2Bio")
First, let's calculate a new variable: log-transformed illnesses. We do this because the distribution of illnesses is very skewed (recall some of the plots from the first practical).
outbreaks$log_illness = log(outbreaks$illnesses)
We can construct a histogram of log-transformed illness counts using the following command:
ggplot(data = outbreaks) + geom_histogram(aes(x = log_illness))
This gives figure 2. Let's experiment a bit.
- Try varying `bins` in `geom_histogram`. What value do you think is best?
- What does adding `fill=species` to the `geom_histogram` aesthetic do? What other options can you change?^[Look at the `geom_histogram` help page: http://had.co.nz/ggplot2/geom_histogram.html]
- Try `geom_density` instead. Set `fill=species` and change the `alpha` value.
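The histogram questions above can be explored on any skewed data. The sketch below uses an entirely hypothetical data frame: two made-up groups in a `species` column with simulated log-normal counts, standing in for the outbreaks data. It contrasts `bins` (the number of bins) with `binwidth` (the width of each bin), then maps `fill` to the group.

```r
# Minimal sketch with simulated, hypothetical data: two made-up
# "species" with log-normal illness counts, standing in for outbreaks.
library(ggplot2)

set.seed(2)
sim = data.frame(
  species = rep(c("A", "B"), each = 500),
  illnesses = c(rlnorm(500, meanlog = 2), rlnorm(500, meanlog = 3))
)
sim$log_illness = log(sim$illnesses)

# bins sets how many bins there are; binwidth sets how wide each one is
p_few = ggplot(sim) + geom_histogram(aes(x = log_illness), bins = 10)
p_many = ggplot(sim) + geom_histogram(aes(x = log_illness), bins = 60)
p_width = ggplot(sim) + geom_histogram(aes(x = log_illness), binwidth = 0.25)

# Mapping fill to a grouping variable; groups are stacked by default
p_fill = ggplot(sim) +
  geom_histogram(aes(x = log_illness, fill = species), bins = 30)

# Overlapping densities: alpha makes each filled curve semi-transparent
p_dens = ggplot(sim) +
  geom_density(aes(x = log_illness, fill = species), alpha = 0.4)
```

Because `geom_histogram` stacks the filled groups by default, setting `position = "identity"` together with a lower `alpha` overlays them instead, which is often easier to compare.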
The aim of this section is to recreate the graphics in figure 3. Feel free to experiment. To begin, load dplyr and the yeast data, then subset it to the CYT and EXC classes.
library(dplyr)
data(yeast, package = "jrGgplot2Bio")
yeast = yeast %>% filter(class == "CYT" | class == "EXC")
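The `|` filter above keeps rows whose `class` is either CYT or EXC. An equivalent spelling, which is easier to extend to more classes, uses `%in%`. The toy data frame below is hypothetical and only illustrates that the two filters agree.

```r
library(dplyr)

# Hypothetical toy data with a few yeast-style class labels
toy = data.frame(class = c("CYT", "NUC", "EXC", "MIT", "CYT"),
                 gvh = c(0.4, 0.5, 0.6, 0.3, 0.45))

# Two equivalent ways of keeping only the CYT and EXC rows
keep_or = toy %>% filter(class == "CYT" | class == "EXC")
keep_in = toy %>% filter(class %in% c("CYT", "EXC"))

identical(keep_or, keep_in)  # TRUE
```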
The goal is a scatter plot of McGeoch's method, `mcg`, against von Heijne's method, `gvh`. To get started:
ggplot(yeast, aes(x = gvh, y = mcg)) +
  geom_point() +
  xlab("von Heijne's method")
Now add the density lines and change the $y$-axis label.
Hint: try `geom_density2d` and `ylab('New label')`.
ggplot(yeast, aes(x = gvh, y = mcg)) +
  geom_point() +
  geom_density2d() +
  xlab("von Heijne's method") +
  ylab("McGeoch's method")
Using the `col` aesthetic, add colour to the points and density lines dependent upon the `class` variable. For the final touch, adjust the transparency of the density lines.
ggplot(yeast, aes(x = gvh, y = mcg)) +
  geom_point(aes(col = class)) +
  geom_density2d(aes(col = class), alpha = 0.5) +
  xlab("von Heijne's method") +
  ylab("McGeoch's method")
Solutions are contained within this package:
vignette("solutions2", package = "jrGgplot2Bio")