
gapminder

knitr::opts_chunk$set(collapse = TRUE, dpi = 300)
## so jittered figs don't always appear to be changed
set.seed(1)

Excerpt from the Gapminder data. The main object in this package is the gapminder data frame or "tibble". There are other goodies, such as the data in tab delimited form, a larger unfiltered dataset, and premade color schemes for the countries and continents.

The gapminder data frames include six variables (see the Gapminder.org documentation page):

| variable    | meaning                  |
|:------------|:-------------------------|
| country     |                          |
| continent   |                          |
| year        |                          |
| lifeExp     | life expectancy at birth |
| pop         | total population         |
| gdpPercap   | per-capita GDP           |

Per-capita GDP (Gross domestic product) is given in units of international dollars, "a hypothetical unit of currency that has the same purchasing power parity that the U.S. dollar had in the United States at a given point in time" -- 2005, in this case.

Package contains two data frames or tibbles:

Install and test drive

Install gapminder from CRAN:

install.packages("gapminder")

Or you can install gapminder from GitHub:

devtools::install_github("jennybc/gapminder")

Load it and test drive with some data aggregation and plotting:

library("gapminder")

aggregate(lifeExp ~ continent, gapminder, median)

suppressPackageStartupMessages(library("dplyr"))
gapminder %>%
    filter(year == 2007) %>%
    group_by(continent) %>%
    summarise(lifeExp = median(lifeExp))

library("ggplot2")
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
  geom_boxplot(outlier.colour = "hotpink") +
  geom_jitter(position = position_jitter(width = 0.1, height = 0), alpha = 1/4)

Color schemes for countries and continents

country_colors and continent_colors are provided as character vectors where elements are hex colors and the names are countries or continents.

head(country_colors, 4)
head(continent_colors)
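Because the color schemes are named vectors, individual colors can be looked up by name. The following is a minimal sketch; the three countries in `wanted` are arbitrary choices for illustration, not something the package singles out.

```r
library("gapminder")

# country_colors is a named character vector:
# names are countries, values are hex color strings
wanted <- c("Canada", "Rwanda", "Japan")  # illustrative selection only
country_colors[wanted]

# check whether every name in the scheme matches a country in the data
all(names(country_colors) %in% levels(gapminder$country))
```

The same name-based indexing should work for continent_colors, with continent names in place of country names.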

The country scheme is available in this repo as

How to use color scheme in ggplot2

Provide country_colors to scale_color_manual() like so:

... + scale_color_manual(values = country_colors) + ...
library("ggplot2")

ggplot(subset(gapminder, continent != "Oceania"),
       aes(x = year, y = lifeExp, group = country, color = country)) +
  geom_line(lwd = 1, show.legend = FALSE) + facet_wrap(~ continent) +
  scale_color_manual(values = country_colors) +
  theme_bw() + theme(strip.text = element_text(size = rel(1.1)))

How to use color scheme in base graphics

# for convenience, integrate the country colors into the data.frame
gap_with_colors <-
  data.frame(gapminder,
             cc = I(country_colors[match(gapminder$country,
                                         names(country_colors))]))

# bubble plot, focus just on Africa and Europe in 2007
keepers <- with(gap_with_colors,
                continent %in% c("Africa", "Europe") & year == 2007)
plot(lifeExp ~ gdpPercap, gap_with_colors,
     subset = keepers, log = "x", pch = 21,
     cex = sqrt(gap_with_colors$pop[keepers]/pi)/1500,
     bg = gap_with_colors$cc[keepers])

What is gapminder good for?

I have used this excerpt in STAT 545 since 2008 and, more recently, in R-flavored Software Carpentry Workshops and a ggplot2 tutorial. gapminder is very useful for teaching novices data wrangling and visualization in R.

Description:

There are 12 rows for each country in gapminder, i.e. complete data for 1952, 1957, ..., 2007 (every five years).
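The completeness claim can be checked directly. This is a small sketch in base R; the object name `rows_per_country` is introduced here for illustration.

```r
library("gapminder")

# count the observations for each country
rows_per_country <- table(gapminder$country)

# TRUE if every country has the same number of rows
length(unique(as.vector(rows_per_country))) == 1

# the distinct years covered by the data
sort(unique(gapminder$year))
```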

The two factors provide opportunities to demonstrate factor handling, in aggregation and visualization, for factors with very few and very many levels.
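As one possible illustration of factor handling, the sketch below compares the number of levels in the two factors and shows what happens to unused levels after subsetting. The choice of Oceania and the name `oceania` are assumptions made for this example.

```r
library("gapminder")

# continent has very few levels, country has very many
nlevels(gapminder$continent)
nlevels(gapminder$country)

# subsetting keeps all original factor levels, even the unused ones
oceania_raw <- subset(gapminder, continent == "Oceania")
nlevels(oceania_raw$country)

# droplevels() removes the levels that no longer occur
oceania <- droplevels(oceania_raw)
levels(oceania$country)
```

Forgetting to drop unused levels is a common source of empty groups in tables and empty panels in plots, which is part of what makes this dataset handy for teaching.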

The four quantitative variables are generally quite correlated with each other and these trends have interesting relationships to country and continent, so you will find that simple plots and aggregations tell a reasonable story and are not completely boring.
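A quick way to look at these relationships is a correlation matrix. This is only a sketch: the name `num_vars` is introduced here, and the log transform of GDP per capita is a common analyst's choice rather than anything the package prescribes.

```r
library("gapminder")

# pairwise correlations among the four quantitative variables
num_vars <- gapminder[, c("year", "lifeExp", "pop", "gdpPercap")]
round(cor(num_vars), 2)

# GDP per capita is strongly right-skewed, so it is often examined on a log scale
cor(gapminder$lifeExp, log10(gapminder$gdpPercap))
```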

Visualization of the temporal trends in life expectancy, by country, is particularly rewarding, since there are several countries with sharp drops due to political upheaval. This then motivates more systematic investigations via data aggregation to proactively identify all countries whose data exhibits certain properties.
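One way such a systematic search might look, as a sketch in base R: compute the change in life expectancy between consecutive observations for each country and rank countries by their worst drop. The names `gap` and `worst_drop` are introduced here for illustration, and "worst single-interval drop" is just one possible definition of a sharp decline.

```r
library("gapminder")

# make sure rows are ordered by country, then year, before differencing
gap <- gapminder[order(gapminder$country, gapminder$year), ]

# for each country, the most negative change in lifeExp
# between two consecutive observations
worst_drop <- tapply(gap$lifeExp, gap$country,
                     function(x) min(diff(x)))

# the five countries with the largest single-interval drops
head(sort(worst_drop), 5)
```

The countries at the top of this list are natural candidates for a closer look with a line plot of lifeExp against year.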

How this sausage was made

The data-raw directory contains the Excel spreadsheets downloaded from Gapminder in 2008 and 2009 and all the scripts necessary to create everything in this package, in raw and "compiled notebook" form.

Plain text delimited files

If you want to practice importing from file, various tab delimited files are included:

Here in the source, these delimited files can be found:

Once you've installed the gapminder package they can be found locally and used like so:

gap_tsv <- system.file("gapminder.tsv", package = "gapminder")
gap_tsv <- read.delim(gap_tsv)
str(gap_tsv)
gap_tsv %>% # Bhutan did not make the cut because data for only 8 years :(
  filter(country == "Bhutan")

gap_bigger_tsv <- system.file("gapminder-unfiltered.tsv", package = "gapminder")
gap_bigger_tsv <- read.delim(gap_bigger_tsv)
str(gap_bigger_tsv)
gap_bigger_tsv %>% # Bhutan IS here though! :)
  filter(country == "Bhutan")

License

Gapminder's data is released under the Creative Commons Attribution 3.0 Unported license. See their terms of use.



YTLogos/gapminder documentation built on May 20, 2019, 1:47 p.m.