Overview of the wcde package

The wcde package lets R users download data from the Wittgenstein Centre for Demography and Human Capital Data Explorer. It also contains a number of helper functions for working with education-specific demographic data.

Installation

You can install the released version of wcde from CRAN with:

install.packages("wcde")

Install the development version from GitHub with:

library(devtools)
install_github("guyabel/wcde", ref = "main")

Getting data into R

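The get_wcde() function downloads data from the Wittgenstein Centre Human Capital Data Explorer. It takes three main user inputs: indicator, a short code for the indicator of interest; scenario, a number identifying the SSP scenario (the Medium scenario, SSP2, by default); and country_name or country_code, identifying the countries of interest.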

library(wcde)
# download education specific tfr data
get_wcde(indicator = "etfr",
         country_name = c("Brazil", "Albania"))

# download education specific survivorship rates
get_wcde(indicator = "eassr",
         country_name = c("Niger", "Korea"))

Indicator codes

The indicator input must match the short code from the indicator table. The find_indicator() function can be used to look up short codes (given in the first column) from the wic_indicators data frame:

find_indicator(x = "tfr")

Temporal coverage

By default, get_wcde() returns data for all available years or periods. The filter() function in dplyr can be used to select specific years or periods, for example:

library(tidyverse)
get_wcde(indicator = "e0",
         country_name = c("Japan", "Australia")) %>%
  filter(period == "2015-2020")

get_wcde(indicator = "sexratio",
         country_name = c("China", "South Korea")) %>%
  filter(year == 2020)
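
Some indicators are reported for five-year periods and others for single years, so the filter has to use the matching column. The sketch below illustrates the difference on small made-up tibbles. The country labels, the value column and all numbers are hypothetical stand-ins for get_wcde() output, and year is assumed to be stored as a number.

```r
library(dplyr)

# Made-up values shaped like the output for a period-based indicator.
# Periods are character labels, so they are matched as strings.
toy_period <- tibble(
  name   = c("A", "A", "B", "B"),
  period = c("2015-2020", "2020-2025", "2015-2020", "2020-2025"),
  value  = c(80, 81, 70, 71)
)
one_period <- toy_period %>%
  filter(period == "2015-2020")

# Made-up values shaped like the output for a year-based indicator.
# Years are assumed to be numeric, so ranges can be selected with between().
toy_year <- tibble(
  name  = c("A", "A", "A", "A"),
  year  = c(2015, 2020, 2025, 2030),
  value = c(1.05, 1.04, 1.03, 1.02)
)
year_range <- toy_year %>%
  filter(between(year, 2020, 2030))

one_period
year_range
```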

Past data is only available for selected indicators. These can be identified using the past column of the wic_indicators data frame:

wic_indicators %>%
  filter(past) %>%
  select(1:2)

The filter() function can also be used to restrict an indicator to specific age, sex or education groups:

get_wcde(indicator = "sexratio",
         country_name = c("China", "South Korea")) %>%
  filter(year == 2020,
         age == "All")

Country names and codes

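Country names are guessed using the countrycode package, so abbreviations, alternative spellings and non-English names are accepted where they can be resolved: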

get_wcde(indicator = "tfr",
         country_name = c("U.A.E", "Espania", "Österreich"))

The get_wcde() function also accepts ISO 3166-1 numeric country codes via the country_code argument:

get_wcde(indicator = "etfr", country_code = c(44, 100))
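
If only the country names are known, the numeric codes can be looked up beforehand. The sketch below uses the countrycode() function from the countrycode package directly. It is an illustration of the lookup, not necessarily the exact call that get_wcde() makes internally. In ISO 3166-1, the codes 44 and 100 used above belong to the Bahamas and Bulgaria.

```r
library(countrycode)

# Convert English country names to ISO 3166-1 numeric codes
codes <- countrycode(sourcevar = c("Bahamas", "Bulgaria"),
                     origin = "country.name",
                     destination = "iso3n")
codes
```

The resulting vector can be passed to the country_code argument of get_wcde().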

A full list of available countries and region aggregates, and their codes, can be found in the wic_locations data frame.

wic_locations

Scenarios

By default, get_wcde() returns data for the Medium (SSP2) scenario. Results for other SSP scenarios can be returned by passing one or more values to the scenario argument in get_wcde().

get_wcde(indicator = "growth",
         country_name = c("India", "China"),
         scenario = c(1:3, 21, 22)) %>%
  filter(period == "2095-2100")

Set include_scenario_names = TRUE to include a column with the full names of the scenarios:

get_wcde(indicator = "tfr",
         country_name = c("Kenya", "Nigeria", "Algeria"),
         scenario = 1:3,
         include_scenario_names = TRUE) %>%
  filter(period == "2045-2050")

Additional details of the pathways behind each numeric scenario code can be found in the wic_scenarios object. Further background and links to the corresponding literature are provided in the Data Explorer.

wic_scenarios

All countries data

Data for all countries can be obtained by leaving both country_name and country_code unset:

get_wcde(indicator = "mage")

Multiple indicators

The get_wcde() function downloads one indicator per call, so it needs to be called once for each indicator. This can be done using the map() function in purrr:

mi <- tibble(ind = c("odr", "nirate", "ggapedu25")) %>%
  mutate(d = map(.x = ind, .f = ~get_wcde(indicator = .x)))
mi

mi %>%
  filter(ind == "odr") %>%
  select(-ind) %>%
  unnest(cols = d)

mi %>%
  filter(ind == "nirate") %>%
  select(-ind) %>%
  unnest(cols = d)

mi %>%
  filter(ind == "ggapedu25") %>%
  select(-ind) %>%
  unnest(cols = d)
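
As an alternative to the nested tibble, the downloads can be collected in a named list, so that each indicator is pulled out by name without the filter() and unnest() steps. The following is a minimal sketch. The get_indicators() helper and its fetch argument are hypothetical and not part of wcde. The fetch argument defaults to get_wcde() and exists only so the helper can be tried offline with a stand-in function.

```r
library(purrr)

# Hypothetical helper: call `fetch` once per indicator code and return a
# list of data frames named by indicator. Extra arguments are passed on
# to `fetch` (for example country_name or scenario for get_wcde()).
get_indicators <- function(indicators, fetch = wcde::get_wcde, ...) {
  map(set_names(indicators), function(ind) fetch(indicator = ind, ...))
}

# Needs the wcde package and an internet connection:
# d <- get_indicators(c("odr", "nirate", "ggapedu25"))
# d$odr
```

Because each indicator returns different columns, keeping the results as separate list elements avoids binding data frames whose value columns do not match.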

Working with population data

Population data for a range of age-sex-educational attainment combinations can be obtained by setting indicator = "pop" in get_wcde() and specifying the pop_age, pop_sex and pop_edu arguments. By default, each of the three population breakdown arguments is set to "total".

get_wcde(indicator = "pop", country_name = "India")

The pop_age argument can be set to "all" to get population data broken down into five-year age groups. The pop_sex argument can be set to "both" to get population data broken down into female and male groups. The pop_edu argument can be set to "four", "six" or "eight" to get population data broken down into education categorizations with different levels of detail.

get_wcde(indicator = "pop", country_code = 900, pop_edu = "four")

The population breakdown arguments can be used in combination to provide further breakdowns, for example sex- and education-specific population totals:

get_wcde(indicator = "pop", country_code = 900, pop_edu = "six", pop_sex = "both")

The full age-sex-education specific data can also be obtained by setting indicator = "epop" in get_wcde().

Population pyramids

Create population pyramids by making the male population values negative, so that the male and female columns diverge in opposite directions from zero.

w <- get_wcde(indicator = "pop", country_code = 900,
              pop_age = "all", pop_sex = "both", pop_edu = "four")
w

w <- w %>%
  mutate(pop_pm = ifelse(test = sex == "Male", yes = -pop, no = pop),
         pop_pm = pop_pm/1e3)
w
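
The effect of the sign flip is easiest to see on a few rows. The sketch below applies the same transformation to a small made-up tibble. The age labels and population values are hypothetical, with populations assumed to be in thousands so that dividing by 1e3 gives millions.

```r
library(dplyr)

# Made-up populations in thousands
toy <- tibble(
  age = rep(c("0-4", "5-9"), each = 2),
  sex = rep(c("Male", "Female"), times = 2),
  pop = c(3000, 2800, 2500, 2400)
)

# Male values become negative, female values stay positive,
# and both are rescaled from thousands to millions
toy <- toy %>%
  mutate(pop_pm = if_else(sex == "Male", -pop, pop) / 1e3)
toy
```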

Standard plot

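Use standard ggplot2 code to create the population pyramid, with scale_x_symmetric() from the lemon package to give the male and female sides equal x-axis ranges, and fill colours taken from the wic_col4 object.

Note that wic_col6 and wic_col8 objects also exist for equivalent plots of population data with six or eight education categories.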

library(lemon)

w %>%
  filter(year == 2020) %>%
  ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_symmetric(labels = abs) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  labs(x = "Population (millions)", y = "Age") +
  theme_bw()
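
If the lemon package is not available, a symmetric axis can be approximated with plain ggplot2 by setting the limits to plus and minus the largest absolute value. This is a sketch of that alternative, not the approach used above, and it runs on made-up values. When bars are stacked by education, the limit would need to be computed from the stacked totals for each age and sex instead.

```r
library(ggplot2)

# Made-up signed populations in millions (male values negative)
toy <- data.frame(
  age    = c("0-4", "0-4", "5-9", "5-9"),
  sex    = c("Male", "Female", "Male", "Female"),
  pop_pm = c(-3, 2.8, -2.5, 2.4)
)

# Largest bar length on either side of zero
lim <- max(abs(toy$pop_pm))

p <- ggplot(data = toy, mapping = aes(x = pop_pm, y = age, fill = sex)) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_continuous(limits = c(-lim, lim), labels = abs) +
  labs(x = "Population (millions)", y = "Age")
p
```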

Sex label position

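Add male and female labels on the x-axis by faceting on sex with the strips placed at the bottom, removing the spacing between panels and the axis expansion so that the two sides of the pyramid meet, and adding a geom_blank() layer to leave extra space beyond the longest columns on each side: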

w <- w %>%
  mutate(pop_max = ifelse(sex == "Male", -max(pop/1e3), max(pop/1e3)))

w %>%
  filter(year == 2020) %>%
  ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_continuous(labels = abs, expand = c(0, 0)) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  labs(x = "Population (millions)", y = "Age") +
  facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
  geom_blank(mapping = aes(x = pop_max * 1.1)) +
  theme(panel.spacing.x = unit(0, "pt"),
        strip.placement = "outside",
        strip.background = element_rect(fill = "transparent"),
        strip.text.x = element_text(margin = margin( b = 0, t = 0)))

Animate

Animate the pyramid through the past data and projection periods using the transition_time() function in the gganimate package:

library(gganimate)

g <- ggplot(data = w,
       mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_continuous(labels = abs, expand = c(0, 0)) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
  geom_blank(mapping = aes(x = pop_max * 1.1)) +
  theme(panel.spacing.x = unit(0, "pt"),
        strip.placement = "outside",
        strip.background = element_rect(fill = "transparent"),
        strip.text.x = element_text(margin = margin(b = 0, t = 0))) +
  transition_time(time = year) +
  labs(x = "Population (millions)", y = "Age",
       title = 'SSP2 World Population {round(frame_time)}')

animate(g, width = 672, height = 520, units = "px", res = 100,
        renderer = gifski_renderer())

anim_save(filename = "../man/figures/world4_ssp2.gif")

wcde documentation built on June 7, 2022, 1:11 a.m.