$\$

    # makes sure you have all the packages we have used in class installed
    #SDS230::update_installed_packages()

    # install.packages("gapminder")
    # install.packages("latex2exp")

    library(latex2exp)
    library(dplyr)

    #options(scipen=999)

    knitr::opts_chunk$set(echo = TRUE)

    # hide all plot output - useful for printing the code
    # knitr::opts_chunk$set(fig.show='hide')

    set.seed(123)

$\$
$\$
We can use the ggplot2 package, which is part of the tidyverse, to create much nicer-looking graphics than base R graphics. The ggplot2 library is modeled on Leland Wilkinson's "grammar of graphics", which creates graphics from a combination of basic visual elements.
In the exercises below, we will learn how to use ggplot2 with the Motor Trend cars data set (mtcars), which comes with the base R installation, and with the gapminder data.
A few resources to learn more about ggplot are:
$\$
Let's create plots of the number of miles per gallon (mpg) cars get as a function of the weight of the car.

    # install.packages("ggplot2")
    library(ggplot2)

    # base R
    plot(mtcars$wt, mtcars$mpg)

    # ggplot - global mapping
    ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
      geom_point()

    # ggplot - shorter global mapping
    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point()

    # ggplot - mapping in the geom
    ggplot(data = mtcars) +
      geom_point(mapping = aes(x = wt, y = mpg))

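A ggplot is an ordinary R object, so the data and global mapping can be stored once and layers added later. A minimal sketch of that idea (the names p and p_points are made up for this example):

```r
library(ggplot2)

# store the data and the global aesthetic mapping; no geom has been added yet
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# adding a layer returns a new plot object and leaves p unchanged
p_points <- p + geom_point()

# printing the object is what draws the plot
p_points
```

Storing the base plot this way makes it easy to try several geoms on the same mapping without retyping it.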
$\$
We can add labels to the plots using the labs() function. Arguments to the labs() function include:
- x: the label on the x-axis (you can also use the xlab() function)
- y: the label on the y-axis (you can also use the ylab() function)
- title: the title of the plot (you can also use the ggtitle() function)
- subtitle: the subtitle of the plot (you can also use the subtitle argument of the ggtitle() function)

If you just want to add x and y labels or a title, you can also use the xlab("label1"), ylab("label2") and/or ggtitle() functions.

    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point() +
      labs(x = "Weight",
           y = "Miles per Gallon",
           title = "MPG as a function of weight",
           subtitle = "ggplot is cool!")

    # another way to add just x and y labels
    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point() +
      xlab("Weight") +
      ylab("Miles per Gallon") +
      ggtitle("MPG as a function of weight")

Remember, if you don't want exes, label your axes!
$\$
We can add annotations to plots using the annotate("text", x = , y = , label = ) function.

    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point() +
      annotate("text", x = 4, y = 30, label = "AWESOME!")

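The annotation coordinates do not have to be typed by hand; they can be computed from the data. A small sketch, assuming we want to label the car with the highest mpg (the variable name best is made up for this example):

```r
library(ggplot2)

# the row of mtcars with the highest miles per gallon
best <- mtcars[which.max(mtcars$mpg), ]

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # place the car's name at its own (wt, mpg) position;
  # vjust shifts the text vertically so it does not sit on top of the point
  annotate("text", x = best$wt, y = best$mpg,
           label = rownames(best), vjust = 1.5)
```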
$\$
We can use other aesthetic mappings beyond position including:
- color: different color scales are used for quantitative and categorical (factor) data
- shape: should be categorical data
- size: should be quantitative data

    # add color based on the transmission type (am: 0 = automatic, 1 = manual)
    ggplot(mtcars, aes(x = wt, y = mpg, color = am)) +
      geom_point()

    # is it better to treat the transmission type as a categorical variable?
    ggplot(mtcars, aes(x = wt, y = mpg, col = factor(am))) +
      geom_point()

    # can also try mapping transmission type to shape or size
    ggplot(mtcars, aes(x = wt, y = mpg, shape = factor(am))) +
      geom_point()

    ggplot(mtcars, aes(x = wt, y = mpg, size = am)) +
      geom_point()

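The legend for factor(am) shows only 0 and 1. One possible way to make it readable is to attach labels before plotting. This is a sketch, assuming the coding documented in ?mtcars (0 = automatic, 1 = manual); the data frame cars and the column transmission are names made up for this example:

```r
library(ggplot2)

# copy the data so the built-in mtcars is left unchanged
cars <- mtcars

# assumption: am is coded 0 = automatic, 1 = manual (see ?mtcars)
cars$transmission <- factor(cars$am,
                            levels = c(0, 1),
                            labels = c("automatic", "manual"))

# the legend now shows the labels instead of 0 and 1
ggplot(cars, aes(x = wt, y = mpg, color = transmission)) +
  geom_point()
```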
Question: When adding the variable automatic/manual transmission (am) to the scatter plot mpg vs. weight, do you think it is best to map am on to...?
a. color b. shape c. size
$\$
Setting an aesthetic mapping maps a variable to a glyph property. This is done inside the aes() function.
Setting an attribute sets a glyph property to a fixed value. This is done outside the aes() function.

    # setting an aesthetic mapping
    ggplot(mtcars) +
      geom_point(aes(x = wt, y = mpg, col = factor(am)))

    # setting an attribute
    ggplot(mtcars) +
      geom_point(aes(x = wt, y = mpg), col = "red")

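A common slip is to put a fixed value inside aes(). The sketch below contrasts the two cases and uses layer_data() to inspect the colors ggplot2 actually computed (p_set and p_mapped are names made up for this example):

```r
library(ggplot2)

# fixed attribute: every point is drawn in red and no legend is created
p_set <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg), color = "red")

# pitfall: inside aes(), "red" is treated as data (a one-level category),
# so the default color scale chooses the color and a legend appears
p_mapped <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, color = "red"))

# compare the colors assigned to the points in each plot
unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```

If the points come out in an unexpected color with a legend entry named "red", the value has most likely ended up inside aes().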
$\$
Beyond comparing variables based on aesthetic mappings, you can compare categorical variables by splitting a plot into subplots, called facets, using facet_wrap().

    # separate subplots for the two transmission types
    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point() +
      facet_wrap(~am)

    # One can also do facets in two dimensions
    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point() +
      facet_wrap(am ~ cyl)

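For two faceting variables, ggplot2 also provides facet_grid(), which lays the panels out as rows by columns instead of wrapping them. A brief sketch as an alternative to the facet_wrap(am ~ cyl) call above (the name p_grid is made up for this example):

```r
library(ggplot2)

# rows are transmission types (am), columns are cylinder counts (cyl);
# label_both prints the variable name along with its value in each strip
p_grid <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl, labeller = label_both)

p_grid
```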
$\$
Sometimes points overlap, making it hard to estimate the number of points in a particular range of values.
We can control the transparency of points by changing their alpha values.

    library(gapminder)

    # a lot of overplotting
    ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
      geom_point()

    # changing the transparency levels makes it a bit easier to see how many points are at a given x, y location
    ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
      geom_point(alpha = .1)

$\$
Each visual attribute that has an aesthetic mapping has a default scale. We can change the scale used for each mapping using functions that start with scale_.
For example, we can change the x-scale from linear to logarithmic using scale_x_continuous(trans='log10'). Likewise we can change the color scale using scale_color_manual().

    # changing the scale on the x-axis
    ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
      geom_point(alpha = .2) +
      scale_x_continuous(trans='log10')

    # mapping continents to colors, and adding my own color scale
    ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, col = continent)) +
      geom_point(alpha = .2) +
      scale_x_continuous(trans='log10') +
      scale_color_manual(values = c("red", "yellow", "green", "blue", "purple"))

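An unnamed vector passed to scale_color_manual() is matched to the factor levels by position. Naming the values makes the pairing explicit. The sketch below also uses scale_x_log10(), a shorthand for the log10 x-scale; the particular colors and the name continent_colors are arbitrary choices for this example:

```r
library(ggplot2)
library(gapminder)

# hypothetical color choices; the names must match the levels of continent
continent_colors <- c(Africa   = "red",
                      Americas = "orange",
                      Asia     = "darkgreen",
                      Europe   = "blue",
                      Oceania  = "purple")

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, col = continent)) +
  geom_point(alpha = .2) +
  scale_x_log10() +                              # log10 scale on the x-axis
  scale_color_manual(values = continent_colors)  # matched to levels by name

p
```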
$\$
We can use different geoms to create other types of plots.

    # Let's create a plot that shows the GDP per capita in the United States as a function of the year using the geom geom_line()
    gapminder |>
      filter(country == 'United States') |>
      ggplot(aes(x = year, y = gdpPercap)) +
      geom_line()

    # Let's create a plot that shows the GDP per capita in the United States as a function of the year using the geom geom_col()
    gapminder |>
      filter(country == 'United States') |>
      ggplot(aes(x = year, y = gdpPercap)) +
      geom_col()

    # Let's plot MPG as a function of weight using the names of cars rather than just points for each car
    mtcars |>
      tibble::rownames_to_column() |>
      ggplot(aes(x = wt, y = mpg)) +
      geom_text(aes(label = rowname))

    # Let's plot a histogram of the weights of cars
    ggplot(mtcars, aes(x = wt)) +
      geom_histogram(bins = 10)

    # Let's create a boxplot of the weights of cars
    ggplot(mtcars, aes(x = "", y = wt)) +
      geom_boxplot()

    # Let's create a side-by-side boxplot of the weights of cars depending on the number of cylinders the engine has
    ggplot(mtcars, aes(x = factor(cyl), y = wt)) +
      geom_boxplot()

$\$
Violin and joy plots are other ways to view distributions of data.

    # violin plot
    ggplot(mtcars, aes(x = factor(cyl), y = wt)) +
      geom_violin()

    library("ggridges")

    # joy plot
    ggplot(mtcars, aes(y = factor(cyl), x = wt)) +
      geom_density_ridges()

Question: Can you figure out where the name "joy plot" comes from?
$\$
We can also have multiple geom layers on a single graph by using the + symbol, e.g., ggplot(…) + geom_type1() + geom_type2().

    # Create a scatter plot of miles per gallon as a function of weight and then add a smoothed line using geom_smooth() and a vertical line using geom_vline()
    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point() +
      geom_smooth() +
      geom_vline(xintercept = 3)

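Each layer can take its own arguments. As one possible variation on the plot above, the sketch below swaps the default smoother for a straight least-squares line and places the vertical line at the mean weight. Both are illustrative choices, not part of the original exercise:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # straight-line fit instead of the default smoother, without the error band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # dashed vertical line at the average car weight
  geom_vline(xintercept = mean(mtcars$wt), linetype = "dashed")

p
```

Layers are drawn in the order they are added, so the points sit underneath the fitted line here.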
$\$
We can also use different themes to change the appearance of our plot.

    # Add theme_classic() to our plot
    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point() +
      xlab("Weight") +
      ylab("Miles per Gallon") +
      theme_classic() +
      theme(                                           # modify the theme by:
        axis.text.y = element_blank(),                 # - turning off the y-axis text
        plot.background = element_rect(fill = "red")   # - making the background red
      )

    # see ? theme for more options

    # install.packages("ggthemes")
    library(ggthemes)

    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point() +
      ggtitle("Cars!") +
      theme_fivethirtyeight()

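A theme and its modifications can also be stored in a variable and reused across plots. A minimal sketch, where my_theme and its settings are made up for this example:

```r
library(ggplot2)

# hypothetical reusable theme: classic look, larger text, legend on top
my_theme <- theme_classic(base_size = 14) +
  theme(legend.position = "top")

# add the stored theme like any other plot component
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  my_theme

p
```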
$\$
Try to create some interesting visualizations from either a data set we have used in the class, or a new data set you found.
$\$