In jr-packages/jrGgplot2: Jumping Rivers: Advanced Graphics in R

This practical aims to guide you through some of the key ideas in ggplot2. As with the first practical, feel free to experiment. Some of the functions introduced in this practical haven't been explicitly covered in the notes. Use the built-in R help or the ggplot2 help pages at

http://had.co.nz/ggplot2/

as needed.

Over plotting

Scatter plots are very useful. However, when we have a large data set, many points are plotted on top of each other, which obscures the relationship between the variables. We call this problem over plotting. There are a few techniques we can use to help, although the best solution is often problem specific.

To begin with we will create an example data frame:

## If your computer is slow when plotting reduce the value of n
library("jrGgplot2")
library("ggplot2")
df = overplot_data(n = 20000)

We can create a simple scatter plot of this data using the following commands:

## Store the scatter plot in h so that layers can be added to it later
h = ggplot(df) + geom_point(aes(x, y))
## Print the object to display the plot
h

This plot isn't particularly good. Try to improve it by using a combination of:

- Changing the transparency level: alpha (alpha takes a value between $0$ and $1$).
- Changing the shape: shape = 1 and shape = ".".
- Using some jittering: geom_jitter.
- Adding a contour to the plot using stat_density2d.
- What does the following command do?

h +  stat_density2d(aes(x, y, fill = ..density..),
                contour = FALSE, geom = "tile")

- What do stat_bin2d() and stat_binhex() do? Add them to the plot to find out! Try varying the parameters bins and binwidth.
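As a rough starting point for the suggestions above, the sketch below combines transparency, hollow points, contours and rectangular binning. It uses simulated data in place of overplot_data(); the names sim, p, p_alpha, p_contour and p_bin are illustrative assumptions, and the particular values of alpha and bins are only one reasonable choice, not the intended solution.

```r
library("ggplot2")

# Simulated stand-in for the practical's data (an assumption, not overplot_data())
set.seed(1)
sim = data.frame(x = rnorm(5000), y = rnorm(5000))

p = ggplot(sim, aes(x, y))

# Semi-transparent points: regions with many points appear darker
p_alpha = p + geom_point(alpha = 0.1)

# Hollow circles (shape = 1) with density contours drawn on top
p_contour = p + geom_point(shape = 1, alpha = 0.2) +
  stat_density2d(colour = "red")

# Rectangular bins: the fill colour shows the number of points in each bin
p_bin = p + stat_bin2d(bins = 40)

# Print one of the plots; print the others to compare
p_alpha
```

Printing p_contour or p_bin instead shows how each technique trades individual points for a summary of their density.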

Now, let's see what happens when we load the xydata data set and experiment with the size and alpha aesthetics. The source code to load the data and produce the ggplot object is given below - prepare to be amazed!

xydata = get_xy_data()

The default number of random scatter points is 20000. We can change this by altering the argument n within the get_xy_data() function.

ggplot(data = xydata, mapping = aes(x = x, y = y)) +
  geom_point()

Try replacing geom_point() with geom_hex(bins = 150). Note that geom_hex needs the hexbin package to be installed.
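The same comparison can be tried without the course package. The following is a minimal sketch on simulated data, so it will not reproduce whatever structure xydata contains; sim_xy, base, p_points and p_binned are hypothetical names, and the fallback to geom_bin2d is an addition for machines where hexbin is not installed.

```r
library("ggplot2")

# Simulated stand-in for xydata (an assumption; get_xy_data() is not used here)
set.seed(2)
sim_xy = data.frame(x = rnorm(20000), y = rnorm(20000))

base = ggplot(data = sim_xy, mapping = aes(x = x, y = y))

# Small, faint points reduce the overlap but do not remove it
p_points = base + geom_point(size = 0.5, alpha = 0.05)

# Hexagonal bins need the hexbin package; use rectangular bins if it is missing
if (requireNamespace("hexbin", quietly = TRUE)) {
  p_binned = base + geom_hex(bins = 150)
} else {
  p_binned = base + geom_bin2d(bins = 150)
}

p_binned
```

Comparing p_points with p_binned shows why binning is often preferable once the number of points is large: the fill scale reports counts directly instead of relying on overlapping ink.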

Displaying distributions

The diamonds data set contains the prices and other attributes of almost 54,000 diamonds. It is a data frame with $53,940$ rows and $10$ variables. First, load the diamonds data set:

data(diamonds, package = "ggplot2")

and look at the help file:

?diamonds

We can construct a histogram of diamond depth using the following commands:

i1 = ggplot(data = diamonds) +
  geom_histogram(aes(x = depth))

i1

to get figure 2. Let's experiment a bit.

- Change the binwidth in the geom_histogram. What value do you think is best?
- What happens when you set colour=cut in the geom_histogram aesthetic? What other options can you change? (Look at the geom_histogram help page: http://had.co.nz/ggplot2/geom_histogram.html)
- Replace geom_histogram with geom_density. Set fill=cut and change the alpha value.
- Try geom_boxplot.
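One possible way to approach these experiments is sketched below. The object names d, d_hist, d_dens and d_box are illustrative, and the binwidth and alpha values are assumptions chosen only to give a readable plot, not recommended answers.

```r
library("ggplot2")
data(diamonds, package = "ggplot2")

d = ggplot(data = diamonds)

# An explicit bin width replaces the default of 30 bins
d_hist = d + geom_histogram(aes(x = depth), binwidth = 0.2)

# One semi-transparent density curve per cut
d_dens = d + geom_density(aes(x = depth, fill = cut), alpha = 0.4)

# Box plots of depth, one box per cut
d_box = d + geom_boxplot(aes(x = cut, y = depth))

d_dens
```

Lowering alpha makes the overlapping density curves easier to tell apart, while the box plot gives a more compact comparison of the same groups.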

Copy cat

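data(mpg, package = "ggplot2")
## Base plot: engine displacement against highway mpg
g = ggplot(data = mpg, aes(x = displ, y = hwy))
## Scatter plot with a dashed smoothed line and axis labels
g1 = g + geom_point() + stat_smooth(linetype = 2) +
  labs(x = "Displacement", y = "Highway mpg")
## Scatter plot with one smoothed line per drive type
g2 = g + geom_point() + stat_smooth(aes(colour = drv))

g1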

The aim of this section is to recreate the graphics in figures 3 and 4. Feel free to experiment. To begin, load the package

library("ggplot2")

and the mpg data set

data(mpg, package = "ggplot2")
dim(mpg)

Figure 3: Create a scatter plot of engine displacement, displ, against highway mpg, hwy. To get started:

ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(x = "Displacement")

Now add a dashed loess line and change the $y$-axis label. Hint: try stat_smooth, and add a $y$ label to the previous labs function.

g = ggplot(data = mpg, aes(x = displ, y = hwy))
g1 = g + geom_point() + stat_smooth(linetype = 2) +
  labs(x = "Displacement",
       y = "Highway mpg")

g2

Figure 4: Using stat_smooth, add a loess line conditional on the drive.

g2 = g + geom_point() + stat_smooth(aes(colour = drv))

Solutions

Solutions are contained within this package:

library(jrGgplot2)
vignette("solutions2", package = "jrGgplot2")

jr-packages/jrGgplot2 documentation built on Sept. 20, 2020, 2:59 a.m.
