This practical aims to guide you through some of the key ideas in ggplot2. As with the first practical, feel free to experiment. Some of the functions introduced in this practical haven't been explicitly covered in the notes. Use the built-in R help or the ggplot2 help pages at
http://had.co.nz/ggplot2/
\noindent as needed.
Scatter plots are very useful. However, when we have a large data set, points will be plotted on top of each other, obscuring the relationship. We call this problem overplotting. There are a few techniques we can use to help, although the best solution is often problem specific.
To begin with we will create an example data frame:
## If your computer is slow when plotting, reduce the value of n
library("jrGgplot2")
library("ggplot2")
df = overplot_data(n = 20000)
h = ggplot(df) + geom_point(aes(x, y))
h
\noindent We can create a simple scatter plot of this data using the following command:
library("ggplot2")
h = ggplot(df) + geom_point(aes(x, y))
\noindent This plot isn't particularly good. Try to improve it by using a combination of:
- changing the transparency level with `alpha`;^[`alpha` takes a value between $0$ and $1$.]
- changing the shape, for example `shape=1` and `shape='.'`;
- adding some jitter with `geom_jitter`;
- adding density contours with `stat_density2d`.
- What does
h + stat_density2d(aes(x, y, fill = ..density..), contour = FALSE, geom = "tile")
do?
- What do `stat_bin2d()` and `stat_binhex()` do? Add them to the plot to find out! Try varying the parameters `bins` and `binwidth`.
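As a rough sketch of how these options can be tried out, the code below applies a few of them to simulated data. The data frame `sim` is a hypothetical stand-in for the output of `overplot_data()` (whose actual structure may differ), and the values chosen for `alpha`, `bins` and `binwidth` are arbitrary starting points rather than recommended settings.

```r
library("ggplot2")

## Hypothetical stand-in for overplot_data(): two overlapping clusters
set.seed(1)
n = 5000
sim = data.frame(x = c(rnorm(n), rnorm(n, mean = 3)),
                 y = c(rnorm(n), rnorm(n, mean = 3)))
p = ggplot(sim, aes(x, y))

## Transparency: regions with many points appear darker
p_alpha = p + geom_point(alpha = 0.1)

## A much smaller plotting symbol
p_shape = p + geom_point(shape = ".")

## Density contours drawn on top of faint points
p_contour = p + geom_point(alpha = 0.05) + stat_density2d()

## Rectangular bins: set either the number of bins or their width
p_bins = p + stat_bin2d(bins = 50)
p_width = p + stat_bin2d(binwidth = c(0.25, 0.25))
```

Printing any of these objects, for example `p_alpha`, draws the corresponding plot.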
Now, let's see what happens when we load in the `xydata` data set and experiment with the size and alpha aesthetics. The source code to load the data and produce the ggplot object is given below - prepare to be amazed!
xydata = get_xy_data()
The default number of random scatter points is 20000. We can change this by altering the argument `n` within the `get_xy_data()` function.
ggplot(data = xydata, mapping = aes(x = x, y = y)) + geom_point()
Try replacing `geom_point()` with `geom_hex(bins = 150)`.
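To see why size and alpha matter when there are many points, here is a small sketch on simulated data. The data frame `pts` is an assumed stand-in, not the output of `get_xy_data()`, and the values `size = 0.1` and `alpha = 0.1` are only starting points to vary.

```r
library("ggplot2")

## Assumed stand-in data: 20000 points scattered around a ring
set.seed(2)
theta = runif(20000, 0, 2 * pi)
pts = data.frame(x = cos(theta) + rnorm(20000, sd = 0.1),
                 y = sin(theta) + rnorm(20000, sd = 0.1))

## Constants go outside aes(): every point gets the same size and transparency
q = ggplot(data = pts, mapping = aes(x = x, y = y)) +
  geom_point(size = 0.1, alpha = 0.1)
```

Printing `q` draws the plot. With the default size and no transparency the points tend to merge into a solid band, so any finer structure is hidden.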
The diamonds data set contains the prices and other attributes of almost 54,000 diamonds. It is a data frame with $53,940$ rows and $10$ variables. First, load the diamonds data set:
data(diamonds, package = "ggplot2")
\noindent and look at the help file:
?diamonds
\noindent We can construct a histogram of diamond depth using the following commands:
i1 = ggplot(data = diamonds) + geom_histogram(aes(x = depth))
i1
\noindent to get figure 2. Let's experiment a bit.
- Change the `binwidth` in the `geom_histogram`. What value do you think is best?
- What happens when you set `colour=cut` in the `geom_histogram` aesthetic? What other options can you change?^[Look at the `geom_histogram` help page: http://had.co.nz/ggplot2/geom_histogram.html]
- Replace `geom_histogram` with `geom_density`. Set `fill=cut` and change the `alpha` value.
- Try `geom_boxplot`.
data(mpg, package = "ggplot2")
g = ggplot(data = mpg, aes(x = displ, y = hwy))
g1 = g + geom_point() + stat_smooth(linetype = 2) +
  labs(x = "Displacement", y = "Highway mpg")
g2 = g + geom_point() + stat_smooth(aes(colour = drv))
g1
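Returning to the diamonds histogram exercises above, the following sketch shows one possible way of setting each option. The bin width of `0.2` and the `alpha` of `0.3` are arbitrary choices made for illustration, not recommended values.

```r
library("ggplot2")
data(diamonds, package = "ggplot2")
d = ggplot(data = diamonds)

## A fixed bin width instead of the default of 30 bins
d_bin = d + geom_histogram(aes(x = depth), binwidth = 0.2)

## Mapping cut to colour changes the bar outlines; fill would change the interior
d_col = d + geom_histogram(aes(x = depth, colour = cut), binwidth = 0.2)

## Overlapping densities, made semi-transparent so each cut stays visible
d_dens = d + geom_density(aes(x = depth, fill = cut), alpha = 0.3)

## Box plots of depth, one for each cut
d_box = d + geom_boxplot(aes(x = cut, y = depth))
```

Printing each object in turn, for example `d_dens`, makes it easy to compare the effect of each change.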
The aim of this section is to recreate the graphics in figures 3 and 4. Feel free to experiment. To begin, load the package
library("ggplot2")
\noindent and the `mpg` data set:
data(mpg, package = "ggplot2")
dim(mpg)
- Create a scatter plot of engine displacement, `displ`, against highway mpg, `hwy`. To get started:
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() + labs(x = "Displacement")
Now add a dashed loess line and change the $y$-axis label.
Hint: try `stat_smooth` and adding a $y$ label to the previous `labs` function.
g = ggplot(data = mpg, aes(x = displ, y = hwy))
g1 = g + geom_point() + stat_smooth(linetype = 2) +
  labs(x = "Displacement", y = "Highway mpg")
g2
- Using `stat_smooth`, add a loess line conditional on the drive.
g2 = g + geom_point() + stat_smooth(aes(colour = drv))
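For data sets with fewer than 1000 rows, such as `mpg`, `stat_smooth` uses a loess fit by default. A sketch that makes this choice explicit is given below. The object name `g3` is introduced here only for illustration, and setting `se = FALSE` to hide the confidence bands is an optional extra that is not part of the exercise.

```r
library("ggplot2")
data(mpg, package = "ggplot2")

g = ggplot(data = mpg, aes(x = displ, y = hwy))

## One loess curve per drive type; the method is stated rather than left to the default
g3 = g + geom_point() +
  stat_smooth(aes(colour = drv), method = "loess", se = FALSE)
```

Because `colour = drv` is mapped inside `stat_smooth` only, the points stay black while each drive type gets its own coloured curve.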
Solutions are contained within this package:
library("jrGgplot2")
vignette("solutions2", package = "jrGgplot2")