Package to install:
install.packages("ggplot2")
At the end of this session, you will achieve this objective by creating a fairly simple, visually appealing graph that shows:
- aes(), such as what to put on the x-axis, the y-axis, and/or using colour or size.
- geom_, such as points, lines, or boxplots.
- labs().
- theme(), so that it is publication ready.

For learning:

For help:
- ? (such as ?geom_point or ?theme)

For this session we will be using the CO2 dataset. Here is some code to get a sense of the data.
# Variables
names(CO2)
# General contents
str(CO2)
# Quick summary
summary(CO2)
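Later sections distinguish continuous from discrete variables, so it can help to check which columns of CO2 fall into each group. The following is a minimal sketch, assuming that "continuous" simply means a numeric column; the names is_continuous, continuous_vars, and discrete_vars are made up for this illustration.

```r
# Flag each column of CO2 as numeric (treated here as continuous) or not
is_continuous <- sapply(CO2, is.numeric)

# Split the column names into the two groups
continuous_vars <- names(CO2)[is_continuous]
discrete_vars <- names(CO2)[!is_continuous]

continuous_vars
discrete_vars
```

The same check works on any of the exercise datasets by swapping CO2 for the dataset name.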
There are several exercises in this session. Choose one of the datasets below and use it for all later exercises.
For complete R beginners, use:
mpg
For more confident R users, use one of these:
economics
diamonds
msleep
txhousing
Check out the contents of the dataset you choose using:
# variable names of dataset
names(___)
# contents of dataset
str(___)
# summary of dataset
summary(___)
ggplot2 uses the "Grammar of Graphics" (gg). This is a powerful approach to creating plots because it provides a consistent way of telling ggplot2 what to do. There are at least three aspects to using ggplot2 that relate to the grammar:
- aes(): How data should be mapped to the plot. Includes what to put on the x axis, on the y axis, colours, size, etc.
- geom_: The visual representation of the data, as a layer. This tells ggplot2 how to show the aesthetics. Includes points, lines, boxes, etc.
- theme_ or theme(): How the plot should look. Includes the text, axis lines, etc.

To maximise the power of ggplot2, make heavy use of autocompletion. You can do this by typing, for instance, geom_ and then hitting the TAB key to see a list of all the geoms. Or after typing theme(, hit TAB to see all the options inside theme.
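To make the three aspects concrete, here is a minimal sketch that uses one of each on the CO2 dataset. The object name grammar_sketch is chosen only for this illustration, and the particular geom and theme are arbitrary picks rather than recommendations.

```r
library(ggplot2)

# aes(): map conc to the x axis and uptake to the y axis
# geom_: show each observation as a point
# theme_: apply a pre-defined look to the whole plot
grammar_sketch <- ggplot(CO2, aes(x = conc, y = uptake)) +
    geom_point() +
    theme_minimal()

grammar_sketch
```

Each aspect is added with +, so swapping the geom or the theme changes one line without touching the others.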
There are many ways of plotting a single continuous variable (e.g. weight, height) in ggplot2. For a discrete variable (e.g. terrain type: mountain, plains; or sex: woman, man), there is really only one way.
library(ggplot2)

# Continuous
ggplot(CO2, aes(x = conc)) +
    geom_density()

# Discrete
ggplot(CO2, aes(x = Treatment)) +
    geom_bar()
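As an illustration of the "many ways" for a continuous variable, the sketch below draws the same variable as a histogram and as a frequency polygon. The choice of uptake, the value bins = 20, and the object names are assumptions made for this example only.

```r
library(ggplot2)

# Shared base: one continuous variable on the x axis
co2_plot_one <- ggplot(CO2, aes(x = uptake))

# Histogram: counts of observations in 20 bins, drawn as bars
hist_plot <- co2_plot_one + geom_histogram(bins = 20)

# Frequency polygon: the same binned counts, drawn as a line
freq_plot <- co2_plot_one + geom_freqpoly(bins = 20)

hist_plot
freq_plot
```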
Time: 10 min
# put name of dataset below
names(___)

# use dataset with one continuous variable
ggplot(___, aes(x = ___)) +
    # finish the geom to create either a histogram, freqpoly, or density layer
    ___

# use dataset with one discrete variable
ggplot(___, aes(x = ___)) +
    # finish the geom to create a bar layer
    ___
You can of course include data on the y axis too! This is usually what you use graphs for! There are many more types of "geoms" to use for having data on both axes, and which one you choose depends on what you are trying to show and what the data is like. Usually you put the variable that you can influence (the independent variable) on the x axis and the variable that responds (the dependent variable) on the y axis.
# Using continuous data
co2_plot_nums <- ggplot(CO2, aes(x = conc, y = uptake))
# Standard scatter plot
co2_plot_nums + geom_point()
# Connect all the data with a line
co2_plot_nums + geom_line()
# Put overlapping data into "hexes", useful for massive datasets
# (requires the hexbin package to be installed)
co2_plot_nums + geom_hex()
# Connects data as they appear in the dataset
co2_plot_nums + geom_path()
# Runs a smoothing line with confidence interval
co2_plot_nums + geom_smooth()

# Using mixed data
co2_plot_mixed <- ggplot(CO2, aes(x = Type, y = uptake))
# Standard boxplot
co2_plot_mixed + geom_boxplot()
# Bar plot, showing total sum of uptake
co2_plot_mixed + geom_col()
# Better than boxplot, show the actual data!
co2_plot_mixed + geom_jitter()
# Give more distance between groups
co2_plot_mixed + geom_jitter(width = 0.2)
Time: 8 min
# use dataset with two continuous variables
ggplot(___, aes(x = ___, y = ___)) +
    # finish the geom to create either a point, line, hex, smooth, or abline layer
    ___

# use dataset with one continuous and one discrete variable
ggplot(___, aes(x = ___, y = ___)) +
    # finish the geom to create either a boxplot, jitter, or col layer
    ___
You can also add an additional dimension to the data by using other elements of the graph (colours, size, transparency, etc.) to represent another variable. This is NOT the same thing as using 3-dimensional (aka x, y, z axis) plots, which should be avoided unless absolutely necessary! Colours are useful for representing discrete groups, and shading for representing a range of continuous values.
co2_plot_colour <- ggplot(CO2, aes(x = conc, y = uptake, colour = Treatment))
# Scatter plot
co2_plot_colour + geom_point()
# Line plot
co2_plot_colour + geom_line()
# Smoothing
co2_plot_colour + geom_smooth()
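The examples above map colour to a discrete variable. For the continuous case, where shading represents a range of values, a minimal sketch might look like the following. The object name co2_plot_shade and the choice of conc as the shaded variable are assumptions for illustration.

```r
library(ggplot2)

# Map the continuous variable conc to colour; ggplot2 then uses a
# colour gradient instead of separate colours per group
co2_plot_shade <- ggplot(CO2, aes(x = Type, y = uptake, colour = conc)) +
    geom_jitter(width = 0.2)

co2_plot_shade
```

With a continuous variable the legend becomes a colour bar rather than a list of groups.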
Or add a fourth variable.
# Scatter plot
co2_plot_colour + geom_point(aes(shape = Type))
# Line plot
co2_plot_colour + geom_line(aes(linetype = Type))
# Another line plot
co2_plot_colour + geom_path(aes(linetype = Type))
# Smoothing plot
co2_plot_colour + geom_smooth(aes(linetype = Type))
And it's easy to add more geoms!
# Three layers
co2_plot_colour +
    geom_point(aes(shape = Type)) +
    geom_line(aes(linetype = Plant)) +
    geom_smooth(aes(size = Type))
Time: 8 min
# use dataset with either:
# - two continuous variables and one discrete
# - three continuous variables
# for last argument, choose either size, colour, alpha
ggplot(___, aes(x = ___, y = ___, ___ = ___)) +
    # finish the geom to create either a point or line layer
    ___
Let's get to making the plot prettier. There are many, many options to customise the plot using theme().
co2_plot_prettying <- ggplot(CO2, aes(x = conc, y = uptake, colour = paste(Treatment, Type))) +
    geom_point() +
    geom_smooth()

# Some pre-defined themes
co2_plot_prettying + theme_bw()
co2_plot_prettying + theme_minimal()
pretty_plot <- co2_plot_prettying +
    theme_classic() +
    scale_color_brewer(name = "Treatment and origin", palette = "Dark2") +
    # Find this information in ?CO2
    labs(x = "CO2 concentration (mL/L)", y = "CO2 uptake rate (umol/m2 sec)") +
    theme(
        # all axis lines, must use element_line
        # (ggplot2 3.4.0 and later prefer linewidth over size here)
        axis.line = element_line(colour = "grey50", size = 0.5),
        # all axis text, must use element_text
        axis.text = element_text(family = "sans"),
        # all axis tick marks, use element_blank to remove
        axis.ticks = element_blank()
    )
pretty_plot
Time: 10 min
# use dataset with two continuous variables
ggplot(___, aes(x = ___, y = ___)) +
    # finish the geom to create either a point, smooth, or line layer
    ___ +
    # choose either a minimal, dark, light, or classic defined theme
    ___ +
    theme(
        # choose colours such as red, blue, black, grey, yellow, green
        # choose size from 2 to 8
        panel.grid.major = element_line(colour = ___, size = ___),
        # choose family such as sans, serif, Arial, Times New Roman
        axis.text = element_text(colour = ___, size = ___, family = ___)
    )
Now, if you want to save the plot, you can do that pretty easily!
ggsave("plant_co2_uptake.pdf", pretty_plot, width = 7, height = 5)
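ggsave() picks the output format from the file extension, so the same call can also write a PNG. The sketch below is self-contained and writes to a temporary folder; the plot, the file name, and dpi = 300 are choices made for this example and not requirements.

```r
library(ggplot2)

# A small stand-in plot so this example runs on its own
example_plot <- ggplot(CO2, aes(x = conc, y = uptake)) +
    geom_point()

# Build a path inside the session's temporary folder
png_file <- file.path(tempdir(), "plant_co2_uptake.png")

# width and height are in inches by default; dpi sets the resolution
ggsave(png_file, example_plot, width = 7, height = 5, dpi = 300)

# Check that the file was written
file.exists(png_file)
```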
Time: Until end of session
- aes(), one for:
    - x-axis
    - y-axis
    - size, colour, alpha, stroke, or fill
- geom_ layers. The geom you use will depend on the variables and the specific aes() you choose above.
- labs()
- a theme (theme_), and make two changes to it using theme()
- ggsave()