This practical aims to guide you through some of the key ideas in ggplot2. As with the first practical, feel free to experiment. Some of the functions introduced in this practical haven't been explicitly covered in the notes. Use the built-in R help or the ggplot2 help pages at
http://had.co.nz/ggplot2/
\noindent as needed.
Scatter plots are very useful. However, when we have a large data set, points will be plotted on top of each other, obscuring the relationship. We call this problem overplotting. There are a few techniques we can use to help, although the best solution is often problem-specific.
To begin with we will create an example data frame:
## If your computer is slow when plotting, reduce the value of n
library("jrAdvGgplot2")
library("ggplot2")
df = overplot_data(n = 20000)
h = ggplot(df) + geom_point(aes(x, y))
h
\noindent We can create a simple scatter plot of this data using the following command:
h = ggplot(df) + geom_point(aes(x, y))
\noindent This plot isn't particularly good. Try to improve it by using a combination of:
- alpha;^[alpha takes a value between $0$ and $1$.]
- shape=1 and shape='.';
- geom_jitter;
- stat_density2d.
- What does
h + stat_density2d(aes(x, y, fill = ..density..), contour = FALSE, geom = "tile")
do?
- What do stat_bin2d() and stat_binhex() do? Add them to the plot to find out! Try varying the parameters bins and binwidth.
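As a starting point for the exercises above, the following is a minimal sketch and not the official solution. It assumes a made-up data frame, df_demo, as a stand-in for the output of overplot_data(), so that it runs without the jrAdvGgplot2 package. The names df_demo, p_alpha, p_bin and p_dens, and the values chosen for alpha and bins, are illustrative assumptions. Note that after_stat(density) is the current spelling of ..density.. in recent versions of ggplot2.

```r
library("ggplot2")

# Hypothetical stand-in for overplot_data(): two overlapping clusters
set.seed(1)
n = 5000
df_demo = data.frame(
  x = c(rnorm(n), rnorm(n, mean = 2)),
  y = c(rnorm(n), rnorm(n, mean = 2))
)

# Semi-transparent points: dense regions appear darker
p_alpha = ggplot(df_demo, aes(x, y)) + geom_point(alpha = 0.1)

# Rectangular binning: fill colour encodes the number of points in each cell
p_bin = ggplot(df_demo, aes(x, y)) + stat_bin2d(bins = 40)

# Smoothed 2d density drawn as a raster instead of contour lines
p_dens = ggplot(df_demo, aes(x, y)) +
  stat_density_2d(aes(fill = after_stat(density)),
                  geom = "raster", contour = FALSE)
```

Print p_alpha, p_bin and p_dens in turn and compare them with the plain scatter plot; which one works best will depend on the data.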
The diamonds data set contains the prices and other attributes of almost 54,000 diamonds. It is a data frame with $53,940$ rows and $10$ variables. First, load the diamonds data set:
data(diamonds, package = "ggplot2")
\noindent and look at the help file:
?diamonds
\noindent We can construct a histogram of diamond depth using the following commands:
i1 = ggplot(data = diamonds) + geom_histogram(aes(x = depth))
i1
\noindent to get figure 2. Let's experiment a bit.
- Change the binwidth in the geom_histogram. What value do you think is best?
- What happens when you set colour=cut in the geom_histogram aesthetic? What other options can you change?^[Look at the geom_histogram help page: http://had.co.nz/ggplot2/geom_histogram.html]
- Try geom_density. Set fill=cut and change the alpha value.
- Try geom_boxplot.
data(mpg, package = "ggplot2")
# mpg$drv = as.character(mpg$drv)
# mpg[mpg$drv == "f",]$drv = "Front"
# mpg[mpg$drv == "r",]$drv = "Rear"
# mpg[mpg$drv == "4",]$drv = "4wd"
# mpg$drv = factor(mpg$drv,
#                  levels = c("Front", "Rear", "4wd"))
g = ggplot(data = mpg, aes(x = displ, y = hwy))
g1 = g + geom_point() + stat_smooth(linetype = 2) + xlab("Displacement") + ylab("Highway mpg")
g2 = g + geom_point() + stat_smooth(aes(colour = drv))
g1
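For the diamonds exercises above, one possible starting point is sketched below. This is an illustration and not the official solution: the object names i2, i3 and i4, the bin width of 0.2 and the alpha value of 0.4 are assumptions chosen for the example, so try other values.

```r
library("ggplot2")
data(diamonds, package = "ggplot2")

# Histogram with an explicit bin width instead of the default 30 bins
i2 = ggplot(data = diamonds) +
  geom_histogram(aes(x = depth), binwidth = 0.2)

# One semi-transparent density curve per cut
i3 = ggplot(data = diamonds) +
  geom_density(aes(x = depth, fill = cut), alpha = 0.4)

# Box plot of depth for each cut
i4 = ggplot(data = diamonds) +
  geom_boxplot(aes(x = cut, y = depth))
```

Because fill = cut sits inside aes(), ggplot2 draws one density per level of cut and adds a legend, whereas alpha is set outside aes() as a fixed value for the whole layer.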
The aim of this section is to recreate the graphics in figures 3 and 4. Feel free to experiment. To begin, load the package
library("ggplot2")
\noindent and the mpg data set
data(mpg, package = "ggplot2")
dim(mpg)
Create a scatter plot of engine displacement, displ, against highway mpg, hwy. To get started:
ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point() + xlab("Displacement")
Now add a dashed loess line and change the $y$-axis label.
Hint: try stat_smooth and ylab("New label").
g = ggplot(data = mpg, aes(x = displ, y = hwy))
g1 = g + geom_point() + stat_smooth(linetype = 2) + xlab("Displacement") + ylab("Highway mpg")
g2
Using stat_smooth, add a loess line conditional on the drive.
g2 = g + geom_point() + stat_smooth(aes(colour = drv))
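The commands above rely on stat_smooth choosing its smoother automatically; with fewer than 1,000 observations, as in mpg, it picks loess. The sketch below is a variation on the exercise and not part of the official solutions. It spells the method out and drops the confidence band; the names g3 and g4 and the choice of se = FALSE are assumptions made for illustration.

```r
library("ggplot2")
data(mpg, package = "ggplot2")

g = ggplot(data = mpg, aes(x = displ, y = hwy))

# Explicit loess smoother, drawn dashed and without the confidence band
g3 = g + geom_point() +
  stat_smooth(method = "loess", formula = y ~ x, linetype = 2, se = FALSE) +
  xlab("Displacement") + ylab("Highway mpg")

# One loess curve per drive type, distinguished by colour
g4 = g + geom_point() +
  stat_smooth(aes(colour = drv), method = "loess", formula = y ~ x, se = FALSE)
```

Stating method and formula explicitly also silences the message that stat_smooth prints when it has to choose them itself.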
Solutions are contained within this package:
library("jrAdvGgplot2")
vignette("solutions5", package = "jrAdvGgplot2")