This practical aims to guide you through some of the key ideas in ggplot2. As with the first practical, feel free to experiment. Some of the functions introduced in this practical haven't been explicitly covered in the notes. Use the built-in R help or the ggplot2 help pages at

http://had.co.nz/ggplot2/

\noindent as needed.

Over plotting

Scatter plots are very useful. However, when we have a large data set, many points are plotted on top of each other, obscuring the relationship. We call this problem over plotting. There are a few techniques we can use to help, although the best solution is often problem-specific.

To begin with we will create an example data frame:

## If your computer is slow when plotting, reduce the value of n
library("jrAdvGgplot2")
library("ggplot2")
df = overplot_data(n = 20000)

\noindent We can create a simple scatter plot of this data using the following commands:

h = ggplot(df) + geom_point(aes(x, y))
h

\noindent This plot isn't particularly good. Try to improve it, using the following questions as a guide:

- What does

h +  stat_density2d(aes(x, y, fill = ..density..),
                contour = FALSE, geom = "tile")

do?
- What do stat_bin2d() and stat_binhex() do? Add them to the plot to find out! Try varying the parameters bins and binwidth.
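
As a rough illustration of the kind of changes worth trying, the sketch below compares three common remedies for over plotting: transparency, a smaller plotting symbol, and rectangular binning. It uses simulated data as a stand-in for overplot_data() (an assumption, so that it runs without jrAdvGgplot2). The object names and parameter values are arbitrary choices, not part of the practical.

```r
library("ggplot2")

## Simulated stand-in for overplot_data(); not the practical's data set
set.seed(1)
n = 20000
sim = data.frame(x = rnorm(n), y = rnorm(n))

base = ggplot(sim, aes(x, y))

## 1. Semi-transparent points: dense regions appear darker
p_alpha = base + geom_point(alpha = 0.1)

## 2. A single-pixel plotting symbol
p_pixel = base + geom_point(shape = ".")

## 3. Count the points falling in each cell of a 50 x 50 grid
p_bin = base + stat_bin2d(bins = 50)

p_alpha
```

Print p_pixel and p_bin in the same way to compare them. Which version works best is likely to depend on the data, so treat these settings as starting points only.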

Displaying distributions

The diamonds data set contains the prices and other attributes of almost 54,000 diamonds. It is a data frame with $53,940$ rows and $10$ variables. First, load the diamonds data set:

data(diamonds, package = "ggplot2")

\noindent and look at the help file:

?diamonds
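
The size of the data set quoted above can be checked from the console. The short sketch below is only a suggestion for a first look at the data; the choice of which variables to inspect is ours.

```r
data(diamonds, package = "ggplot2")

## Number of rows and columns: 53940 rows, 10 variables
dim(diamonds)

## cut is a factor with five quality levels
levels(diamonds$cut)

## Numerical summary of the depth variable
summary(diamonds$depth)
```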

\noindent We can construct a histogram of diamond depth using the following commands:

i1 = ggplot(data = diamonds) +
  geom_histogram(aes(x = depth))
i1

\noindent to get figure 2. Let's experiment a bit.

  1. Change the binwidth in the geom_histogram. What value do you think is best?
  2. What happens when you set colour = cut in the geom_histogram aesthetic? What other options can you change?^[Look at the geom_histogram help page: http://had.co.nz/ggplot2/geom_histogram.html]
  3. Try geom_density. Set fill=cut and change the alpha value.
  4. Try geom_boxplot.
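
One possible starting point for these experiments is sketched below. The bin width of 0.2 and the alpha value of 0.3 are arbitrary values chosen for illustration, and the object names i2, i3 and i4 are our own; none of this is meant as the definitive answer.

```r
library("ggplot2")
data(diamonds, package = "ggplot2")

## Histogram with an explicit bin width (0.2 is an arbitrary choice)
i2 = ggplot(data = diamonds) +
  geom_histogram(aes(x = depth), binwidth = 0.2)

## One semi-transparent density curve per cut
i3 = ggplot(data = diamonds) +
  geom_density(aes(x = depth, fill = cut), alpha = 0.3)

## Box plots of depth, one box per cut
i4 = ggplot(data = diamonds) +
  geom_boxplot(aes(x = cut, y = depth))

i3
```

Lower alpha values make overlapping densities easier to tell apart, at the cost of fainter colours.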

Copy cat

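## Code used to draw the target figures (figures 3 and 4)
data(mpg, package = "ggplot2")
## Optional: relabel the drive types for a more readable legend
# mpg$drv = as.character(mpg$drv)
# mpg[mpg$drv == "f",]$drv = "Front"
# mpg[mpg$drv == "r",]$drv = "Rear"
# mpg[mpg$drv == "4",]$drv = "4wd"
# mpg$drv = factor(mpg$drv,
#    levels = c("Front", "Rear", "4wd"))

g = ggplot(data = mpg, aes(x = displ, y = hwy))
## Figure 3: scatter plot with a dashed smooth and relabelled axes
g1 = g + geom_point() + stat_smooth(linetype = 2) +
  xlab("Displacement") + ylab("Highway mpg")
## Figure 4: a separate smooth for each drive type
g2 = g + geom_point() + stat_smooth(aes(colour = drv))
g1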

The aim of this section is to recreate the graphics in figures 3 and 4. Feel free to experiment. To begin, load the package

library("ggplot2")

\noindent and the mpg data set

data(mpg, package = "ggplot2")
dim(mpg)

  1. Figure 3: Create a scatter plot of engine displacement, displ, against highway mpg, hwy. To get started:

ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() + xlab("Displacement")

Now add a dashed loess line and change the $y$-axis label. Hint: try stat_smooth and ylab("New label").

g = ggplot(data = mpg, aes(x = displ, y = hwy))
## linetype = 2 draws the smooth as a dashed line
g1 = g + geom_point() + stat_smooth(linetype = 2) +
  xlab("Displacement") + ylab("Highway mpg")
g1

  2. Figure 4: Using stat_smooth, add a loess line conditional on the drive.

## Mapping colour to drv fits one smooth per drive type
g2 = g + geom_point() + stat_smooth(aes(colour = drv))
g2
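
By default stat_smooth chooses its smoothing method from the size of the data, and for a data set as small as mpg (234 rows) it uses loess. The sketch below is a variation on the code above that states the method and formula explicitly. The names g_loess and g_drive and the use of se = FALSE are our own choices for illustration, not part of the practical.

```r
library("ggplot2")
data(mpg, package = "ggplot2")

g = ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point()

## Dashed loess line, with the method and formula stated explicitly
g_loess = g +
  stat_smooth(method = "loess", formula = y ~ x, linetype = 2)

## One loess line per drive type; se = FALSE hides the confidence bands
g_drive = g +
  stat_smooth(aes(colour = drv), method = "loess", formula = y ~ x,
              se = FALSE)

g_drive
```

Stating the method explicitly also avoids the message that stat_smooth prints when it picks a method automatically.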

Solutions

Solutions are contained within this package:

library(jrAdvGgplot2)
vignette("solutions5", package = "jrAdvGgplot2")


jr-packages/jrAdvGgplot2 documentation built on Dec. 27, 2019, 2:23 p.m.