library(learnr)
library(gapminder)
library(ggrepel)
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, fig.align="center", fig.width = 5, fig.height = 4)
tutorial_options(exercise.timelimit = 60, exercise.blanks = "___+", exercise.eval = TRUE)

#no factors please
gapminder <- gapminder %>% 
  mutate(country = as.character(country),
         continent = as.character(continent))

gap_92 <- gapminder %>% 
  filter(year == 1992) %>% 
  mutate(gdp = gdpPercap * pop / 1e9) 

df <- gapminder %>% 
  filter(country == 'Romania')

Base R plotting

plot(gapminder$gdpPercap, gapminder$lifeExp)

Grammar of graphics

ggplot2

What is a statistical graphic?

Example with Gapminder data

How are variables mapped to aesthetic attributes of points?

gapminder %>% 
  filter(year == 1992) %>% 
  mutate(gdp = gdpPercap * pop / 1e9) %>% 
  ggplot(aes(gdp, lifeExp)) + 
  geom_point(aes(color = continent, size = pop)) + 
  scale_x_log10() +
  xlab('Gross Domestic Product (Billions $)') +
  ylab('Life Expectancy at birth (years)') +
  ggtitle('Gapminder for 1992')

How to use it?

Construct a graphic by adding modular pieces
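
As a rough sketch of what "adding modular pieces" can look like, the plot below is built one piece at a time with `+`. The intermediate names (`base`, `with_points`, `with_scale`) are made up for illustration and are not part of the original slides.

```r
library(ggplot2)
library(gapminder)

# Data for a single year, as used throughout this tutorial
gap_92 <- subset(gapminder, year == 1992)

# Piece 1: data plus aesthetic mappings (an empty panel, no layers yet)
base <- ggplot(gap_92, mapping = aes(x = gdpPercap, y = lifeExp))

# Piece 2: a geom layer that draws one point per row
with_points <- base + geom_point()

# Piece 3: a scale that changes how x values are mapped to positions
with_scale <- with_points + scale_x_log10()

# Printing the object draws the plot
with_scale
```

Each intermediate object is itself a complete plot, so any of them can be printed to see the effect of the piece that was just added.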

Using ggplot

The key is to understand the concepts and basic mechanics

The details for any given plot type, or attribute are easy to look up

Let's try a scatterplot

gap_92 <- gapminder %>% 
  filter(year == 1992) %>% 
  mutate(gdp = gdpPercap * pop / 1e9) 
gap_92 %>% head(4)

Let's try a scatterplot

ggplot(gap_92, mapping = aes(x = gdp, y = lifeExp)) + 
  geom_point()

Scales

ggplot(gap_92, mapping = aes(x = gdp, y = lifeExp)) + 
  geom_point() +
  scale_x_log10() 

Adding more aesthetic mappings (shape)

ggplot(gap_92, mapping = aes(x = gdp, y = lifeExp, shape = continent)) + 
  geom_point() +
  scale_x_log10() 

Adding more aesthetic mappings (color)

ggplot(gap_92, mapping = aes(x = gdp, y = lifeExp, color = continent)) + 
  geom_point() +
  scale_x_log10() 

Labels {.smaller}

labs function adds custom axis labels and titles

ggplot(gap_92, mapping = aes(x = gdp, y = lifeExp)) + 
  geom_point() +
  scale_x_log10() +
  labs(x = 'Gross Domestic Product (Billions $)',
       y = 'Life Expectancy at birth (years)',
       title = 'Gapminder for 1992')

Key geoms {.smaller}

And many more...

Comparing 2 continuous variables

geom_line

df <- gapminder %>% 
  filter(country == 'Romania') 
ggplot(df, mapping = aes(x = year, y = lifeExp)) + 
  geom_line()

Layering geoms {.smaller}

We can add as many geoms to a plot as we want; they are drawn as 'layers', stacked in the order they are added

ggplot(df, mapping = aes(x = year, y = lifeExp)) + 
  geom_line() +
  geom_point()

What if we had multiple data points per year?

df <- gapminder %>% 
  filter(country %in% c('Romania', 'Thailand'))
ggplot(df, mapping = aes(x = year, y = lifeExp)) + 
  geom_line() +
  geom_point()

Need to separate them by country (group aesthetic)

ggplot(df, mapping = aes(x = year, y = lifeExp, group = country)) + 
  geom_line() +
  geom_point()

It is often useful to color lines by group. Mapping a categorical variable to the color aesthetic groups the data automatically

ggplot(df, mapping = aes(x = year, y = lifeExp, color = country)) + 
  geom_line() +
  geom_point()

Multiple aesthetic mappings

ggplot(df, mapping = aes(x = year, y = lifeExp)) + 
  geom_line(mapping = aes(color = country)) +
  geom_point()

Multiple aesthetic mappings

ggplot(df, mapping = aes(x = year, y = lifeExp, color = country)) + 
  geom_line(linetype = 'dashed', linewidth = 0.5) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(color = 'black', size = 3, alpha = 0.75)

Plotting trendlines {.smaller}

How to depict the 'average' relationship between noisy variables?

ggplot(gap_92, mapping = aes(x = gdp, y = lifeExp)) + 
  geom_point() + 
  scale_x_log10() +
  labs(x = 'Gross Domestic Product (Billions $)', y = 'Life Expectancy at birth (years)') 

Plotting trendlines {.smaller}

geom_line() doesn't work here! It just connects the individual points in order of their x values

ggplot(gap_92, mapping = aes(x = gdp, y = lifeExp)) + 
  geom_line() +
  geom_point() + 
  scale_x_log10() +
  labs(x = 'Gross Domestic Product (Billions $)', y = 'Life Expectancy at birth (years)') 

geom_smooth {.smaller}

geom_smooth() shows the average ('smoothed') relationship

ggplot(gap_92, mapping = aes(x = gdp, y = lifeExp)) + 
  geom_point() + 
  geom_smooth() +
  scale_x_log10() +
  labs(x = 'Gross Domestic Product (Billions $)', y = 'Life Expectancy at birth (years)') 

geom_smooth {.smaller}

Can be used to show a linear trendline

ggplot(gap_92, mapping = aes(x = gdp, y = lifeExp)) + 
  geom_point() + 
  geom_smooth(method = 'lm') +
  scale_x_log10() +
  labs(x = 'Gross Domestic Product (Billions $)', y = 'Life Expectancy at birth (years)') 

geom_smooth to simplify plots

Can be very helpful to condense down relationships from complicated data

ggplot(gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) + 
  geom_point() +
  scale_x_log10() 

geom_smooth to simplify plots

Can be very helpful to condense down relationships from complicated data

ggplot(gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) + 
  geom_smooth(method = 'lm') +
  scale_x_log10() 

Types of plots

Plotting a single variable

geom_bar {.smaller}

Given a single discrete variable we can plot its distribution as a 'bar plot' using geom_bar()

ggplot(gapminder, mapping = aes(x = continent)) +
  geom_bar()

geom_histogram {.smaller}

For a single continuous variable, we can generate a histogram using geom_histogram(), which bins the values and then makes a bar plot of the counts per bin

ggplot(gapminder, mapping = aes(x = gdpPercap)) +
  geom_histogram() 
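
To make the binning step concrete, here is a minimal sketch that pulls out the values ggplot2 computes for the histogram layer. It uses `layer_data()` from ggplot2 and sets `bins = 30` explicitly; the names `p` and `binned` are just for illustration.

```r
library(ggplot2)
library(gapminder)

# Same histogram as above, with the number of bins set explicitly
p <- ggplot(gapminder, mapping = aes(x = gdpPercap)) +
  geom_histogram(bins = 30)

# layer_data() returns the values ggplot2 computed for a layer:
# one row per bin, with the bin edges and the number of rows in that bin
binned <- layer_data(p, 1)
head(binned[, c('xmin', 'xmax', 'count')])

# Every row of the data falls into exactly one bin
sum(binned$count) == nrow(gapminder)
```

The bar heights in the plot are the `count` column, which is why changing the number of bins changes the shape of the histogram.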

We can adjust the axis scale and other features as usual

ggplot(gapminder, mapping = aes(x = gdpPercap)) +
  geom_histogram() +
  scale_x_log10()

geom_histogram {.smaller}

We can change the number of bins (can also specify details of bin positions)

ggplot(gapminder, aes(gdpPercap)) +
  geom_histogram(bins = 100) +
  scale_x_log10()

geom_histogram {.smaller}

We can also show each continent in a different color, with the per-continent histograms stacked on top of each other

ggplot(gapminder, mapping = aes(x = gdpPercap, color = continent)) +
  geom_histogram() +
  scale_x_log10()

fill vs color {.smaller}

ggplot(gapminder, mapping = aes(x = gdpPercap, fill = continent)) +
  geom_histogram() +
  scale_x_log10()

geom_density {.smaller}

Density plots are another way to depict the distribution of a continuous variable. A density plot is essentially a smoothed histogram

ggplot(gapminder, mapping = aes(x = gdpPercap)) +
  geom_density() +
  scale_x_log10()

geom_density {.smaller}

Separate by continent and give separate fill colors

ggplot(gapminder, mapping = aes(x = gdpPercap, fill = continent)) +
  geom_density(alpha = 0.5) +
  scale_x_log10()

1 continuous var vs 1 discrete

geom_boxplot {.smaller}

The boxplot is the most common choice for showing the distribution of a continuous variable broken down by a categorical variable

ggplot(gapminder, mapping = aes(x = continent, y = gdpPercap)) +
  geom_boxplot() +
  scale_y_log10()

geom_violin {.smaller}

The violin plot is similar, but shows the distribution as a density plot, rather than a box.

ggplot(gapminder, mapping = aes(x = continent, y = gdpPercap)) +
  geom_violin() +
  scale_y_log10()

geom_beeswarm

Another useful option is a 'dotplot' or 'beeswarm' plot.

library(ggbeeswarm)
ggplot(gapminder, mapping = aes(x = continent, y = gdpPercap)) +
  geom_beeswarm(size = 0.5, alpha = 0.75, cex = 1) +
  scale_y_log10()

What if I want to control the order?

cont_order <- c('Oceania', 'Europe', 'Americas', 'Asia', 'Africa')
gap_cat <- gapminder %>% 
  mutate(continent = factor(continent, levels = cont_order))
head(gap_cat)

What if I want to control the order?

ggplot(gap_cat, mapping = aes(x = continent, y = gdpPercap)) +
  geom_boxplot() +
  scale_y_log10()

What if I want to control the order?

forcats package has lots of useful helper functions for changing order of factor variables.

gap_cat <- gap_cat %>% 
  mutate(continent = fct_reorder(continent, gdpPercap, median))
ggplot(gap_cat, mapping = aes(x = continent, y = gdpPercap)) +
  geom_boxplot() +
  scale_y_log10()
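
A small sketch of what fct_reorder() does to the factor levels, using a toy data frame with made-up values (not gapminder data) so the medians are easy to check by hand:

```r
library(forcats)

# Toy data (hypothetical values, chosen so the medians are easy to verify)
toy <- data.frame(
  group = c('a', 'a', 'b', 'b', 'c', 'c'),
  value = c(10, 12, 1, 3, 5, 7)
)

# By default, factor levels are sorted alphabetically
f <- factor(toy$group)
levels(f)

# fct_reorder() sorts the levels by a summary of another variable,
# here the median of value within each group (a = 11, b = 2, c = 6)
f_sorted <- fct_reorder(f, toy$value, .fun = median)
levels(f_sorted)
```

The levels come out in increasing order of the summary, so the boxplots are drawn from lowest to highest median. Passing `.desc = TRUE` reverses the order.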

geom_col {.smaller}

If you want to plot a single value of a continuous variable for each level of a discrete variable, use geom_col

gap_82 <- gapminder %>% 
  filter(year == 1982, continent == 'Americas')

ggplot(gap_82, mapping = aes(x = country, y = gdpPercap)) + 
  geom_col()

theme

Saving your plots

ggplot(gapminder, mapping = aes(x = continent, y = gdpPercap)) +
  geom_violin() +
  scale_y_log10()
ggsave(filename = here::here('results', 'my_fig.png'))
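
By default ggsave() saves the last plot that was displayed. A slightly more explicit variant, sketched below, stores the plot in a variable and passes it in along with the output size. The variable names and the temporary output path are placeholders chosen for this example.

```r
library(ggplot2)
library(gapminder)

# Keep the plot in a variable so it can be passed to ggsave() explicitly
violin_plot <- ggplot(gapminder, mapping = aes(x = continent, y = gdpPercap)) +
  geom_violin() +
  scale_y_log10()

# Hypothetical output location: a temporary file, so nothing is overwritten
out_file <- file.path(tempdir(), 'my_fig.png')

# width and height are in inches by default; dpi sets the PNG resolution
ggsave(filename = out_file, plot = violin_plot, width = 5, height = 4, dpi = 300)

file.exists(out_file)
```

The file type is inferred from the extension, so changing the filename to end in `.pdf` would write a PDF instead.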

Key practical tips

Additional Resources/References

Additional material

Some notes on using color

If we map a continuous variable to color, ggplot won't group the data automatically

ggplot(df, mapping = aes(x = year, y = lifeExp, color = gdpPercap)) +
  geom_line() +
  geom_point(size = 3)

Some notes on using color

We need to specify group manually

ggplot(df, mapping = aes(x = year, y = lifeExp,
                         group = country, color = gdpPercap)) +
  geom_line() +
  geom_point(size = 3)

Some notes on using color {.smaller}

my_df <- gapminder %>%
  filter(year %in% c(1957, 1977, 1997))
ggplot(my_df, mapping = aes(x = gdpPercap, y = lifeExp, color = factor(year))) +
  geom_point() +
  scale_x_log10() +
  labs(color = 'year')

Color palettes {.smaller}

We can use scale_color_manual to set the color of each group manually

my_cols <- c(Romania = 'green', Thailand = 'orange')

ggplot(df, mapping = aes(x = year, y = lifeExp, color = country)) +
  geom_line() +
  scale_color_manual(values = my_cols)

scale_color_brewer offers some useful default color schemes

ggplot(df, mapping = aes(x = year, y = lifeExp, color = country)) +
  geom_line() +
  scale_color_brewer(palette = 'Dark2')

RColorBrewer {.smaller}

https://www.r-bloggers.com/a-detailed-guide-to-ggplot-colors/

Facets {.smaller}

Facets allow you to easily break a single plot into multiple plots based on a variable.

gap_early <- gapminder %>%
  filter(year < 1970)

ggplot(gap_early, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  scale_x_log10() +
  facet_wrap(~continent)

Or based on multiple variables

ggplot(gap_early, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  scale_x_log10() +
  facet_grid(year ~ continent)

geom_text {.smaller}

gap_df <- gapminder %>%
  filter(year == 1992, continent == 'Americas') %>%
  mutate(gdp = gdpPercap * pop / 1e9) %>%
  head(20)

You can add text labels to the points with geom_text

ggplot(gap_df, mapping = aes(x = gdp, y = lifeExp, label = country)) +
  geom_text() +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  scale_x_log10() +
  labs(x = 'Gross Domestic Product (Billions $)', y = 'Life Expectancy at birth (years)')

Or with geom_label

ggplot(gap_df, mapping = aes(x = gdp, y = lifeExp, label = country)) +
  geom_label() +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  scale_x_log10() +
  labs(x = 'Gross Domestic Product (Billions $)', y = 'Life Expectancy at birth (years)')

ggrepel {.smaller}

library(ggrepel)
ggplot(gap_df, mapping = aes(x = gdp, y = lifeExp)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  scale_x_log10() +
  labs(x = 'Gross Domestic Product (Billions $)', y = 'Life Expectancy at birth (years)') +
  geom_label_repel(aes(label = country), size = 2.5)

Beautification

There are lots of ways to add aesthetic improvements to your figures relatively easily

my_plot <- ggplot(gap_92, aes(gdp, lifeExp)) + 
  geom_point(aes(color = continent, size = pop)) + 
  scale_x_log10() +
  xlab('Gross Domestic Product (Billions $)') +
  ylab('Life Expectancy at birth (years)') +
  ggtitle('Gapminder for 1992')
my_plot

There are a number of pre-packaged 'themes' you can apply

my_plot + theme_minimal()
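
Individual theme elements can also be adjusted with theme(). The sketch below rebuilds a simplified version of the plot so that it runs on its own; the particular tweaks (legend at the bottom, no minor grid lines) and the name `styled` are arbitrary choices for illustration.

```r
library(ggplot2)
library(gapminder)

# Rebuild the 1992 data so this example is self-contained
gap_92 <- subset(gapminder, year == 1992)
gap_92$gdp <- gap_92$gdpPercap * gap_92$pop / 1e9

my_plot <- ggplot(gap_92, aes(gdp, lifeExp)) +
  geom_point(aes(color = continent)) +
  scale_x_log10()

# Start from a complete theme, then override individual elements with theme()
styled <- my_plot +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = 'bottom',          # move the legend below the panel
    panel.grid.minor = element_blank()   # drop the minor grid lines
  )
styled
```

The order matters: a complete theme such as theme_minimal() replaces any theme settings added before it, so the theme() call with custom tweaks goes last.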

Tip for making nice scatterplots {.smaller}

Set the marker shape to one that can be 'filled' (pch = 21 is a filled circle), then use a thin white border around a filled shape to help distinguish overlaps.

ggplot(gap_92, aes(gdp, lifeExp)) + 
  geom_point(pch = 21, stroke = 0.5, alpha = 0.8, size = 2.5, color = 'white', aes(fill = continent)) + 
  scale_x_log10() +
  labs(x = 'Gross Domestic Product (Billions $)', y = 'Life Expectancy at birth (years)', title = 'Gapminder for 1992') +
  theme_minimal()

ggpubr {.smaller}

Add stats directly to your figures

library(ggpubr)
my_comparisons <- list(c('Africa', 'Asia'), c('Europe', 'Oceania'))
ggplot(gapminder, mapping = aes(x = continent, y = gdpPercap)) +
  geom_violin() +
  scale_y_log10() +
  stat_compare_means(method = 'wilcox.test', comparisons = my_comparisons)

ggpubr {.smaller}

Easily add correlation coefficients

ggplot(gap_92, mapping = aes(x = lifeExp, y = gdpPercap)) +
  geom_point() +
  scale_y_log10() +
  geom_smooth(method = 'lm') +
  stat_cor()

cowplot {.smaller}

Great tool for combining multiple 'panels' into one plot

library(cowplot)

p1 <- ggplot(mtcars, aes(disp, mpg)) + 
  geom_point()
p2 <- ggplot(mtcars, aes(qsec, mpg)) +
  geom_point()
plot_grid(p1, p2, labels = c('A', 'B'))

ComplexHeatmap

Great tool for making heatmaps. See VERY detailed documentation with examples here



AshirBorah/cp_bootcamp_r_tutorials documentation built on May 16, 2024, 3:24 p.m.