get_wpp: Download UN DESA WPP data

get_wpp    R Documentation

Download UN DESA WPP data

Description

Downloads data on demographic indicators from the UN DESA World Population Prospects (WPP). Requires a working internet connection.

Usage

get_wpp(
  indicator = NULL,
  indicator_file = NULL,
  pop_age = c("total", "single", "five"),
  pop_sex = c("total", "both", "male", "female", "all"),
  pop_freq = c("annual", "five"),
  pop_date = c("jul1", "jan1", "jan1-dec31"),
  variant_id = 2,
  wpp_version = 2022,
  clean_names = FALSE,
  fct_age = TRUE,
  drop_id_cols = FALSE,
  tidy_pop_sex = FALSE,
  add_regions = FALSE,
  add_iso_codes = FALSE,
  messages = TRUE,
  server = c("github", "local")
)

Arguments

indicator

Character string matching the name column in the wpp_indicators data frame, or "pop". Specifies the variable(s) to be downloaded.

indicator_file

Character string matching the file column in the wpp_indicators data frame. Specifies the file group to download data from. Needed for indicators that are available at different levels of granularity (such as Births in the overall population or Births by mother's age group).

pop_age

Character string for population age groups if indicator is set to pop. Defaults to total (no age groups), but can be set to single or five.

pop_sex

Character string for population sexes if indicator is set to pop. Defaults to total (no breakdown by sex), but can be set to male, female, both or all.

pop_freq

Character string for frequency of population data if indicator is set to pop. Defaults to annual, but in some exceptional cases can be set to five.

pop_date

Character string for the reference date of population data if indicator is set to pop. Defaults to jul1 (July 1st), but for WPP2022 can be set to jan1 for population at the beginning of the year or jan1-dec31 for exposure population.

variant_id

Numeric value(s) based on the var_id column in the wpp_indicators data frame. Note that past data are in the "Medium" (2) variant only.

wpp_version

Integer for WPP version. Default of 2022. All WPP versions back to 1998 are available.

clean_names

Logical to indicate if column names should be cleaned.

fct_age

Logical to indicate if the AgeGrp column should be converted to a factor.

drop_id_cols

Logical to indicate if the VarId, LocID, MidPeriod, AgeGrpStart, AgeGrpSpan and SexID columns should be removed.

tidy_pop_sex

Logical to indicate if columns for sex-specific population data should be stacked into a single population column with an accompanying new sex column.

add_regions

Logical to indicate whether to add reg_name and area_name columns for countries (where LocID is less than 900).

add_iso_codes

Logical to indicate whether to add iso3 and iso2 columns with ISO 3-letter and 2-letter country codes (where LocID is less than 900).

messages

Logical to indicate if messages should be printed. Set to FALSE to suppress them.

server

Character string for location to download data from. Default of github.
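
To make the tidy_pop_sex argument more concrete, the following base R sketch shows what stacking sex-specific columns into one population column looks like. It is only an illustration on a toy data frame with dummy values. The PopMale and PopFemale column names come from this page, while the Location column and the output names sex and pop are assumptions and may differ from what get_wpp() actually returns.

```r
# Toy wide data: dummy values, not real WPP figures
wide <- data.frame(
  Location = c("A", "B"),   # assumed location column, for illustration only
  PopMale = c(1, 2),
  PopFemale = c(3, 4),
  stringsAsFactors = FALSE
)

# Stack the two sex-specific columns into a single `pop` column with an
# accompanying `sex` column (both output names are assumptions)
long <- rbind(
  data.frame(Location = wide$Location, sex = "Male",
             pop = wide$PopMale, stringsAsFactors = FALSE),
  data.frame(Location = wide$Location, sex = "Female",
             pop = wide$PopFemale, stringsAsFactors = FALSE)
)

print(long)
```

Each location now appears once per sex, which is usually the easier shape for grouping or plotting by sex.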

Details

Indicators must use the name corresponding to the name column in the wpp_indicators data frame. The find_indicator() function can be used to look up the indicator code and its availability by variant.

There are 114 different indicators in WPP data (starting from 1998). See the full table of the different indicators available in each WPP.
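
Because the same indicator name can appear in more than one file group, it can help to list the matching file groups before calling get_wpp(). The sketch below is illustrative only: files_for_indicator() is a hypothetical helper (not part of tidywpp), and toy_indicators is a small stand-in for wpp_indicators that uses only the name, file and var_id columns described on this page. Its rows are placeholders based on the examples here, not a copy of the real table.

```r
# Toy stand-in for wpp_indicators (placeholder rows, not the real table)
toy_indicators <- data.frame(
  name = c("Births", "Births", "ASFR"),
  file = c("Demographic_Indicators", "Fertility_by_Age5", "Fertility_by_Age5"),
  var_id = c(2, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unique file groups that contain a given indicator
files_for_indicator <- function(ind_table, indicator) {
  unique(ind_table$file[ind_table$name == indicator])
}

births_files <- files_for_indicator(toy_indicators, "Births")
print(births_files)
```

With the package loaded, passing wpp_indicators in place of toy_indicators should list the candidate values for indicator_file, assuming its column names match those described here.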

The variant_id argument must be one or more numbers from the var_id column in the wpp_indicators data frame. Not all indicators are available in all variants. Use the find_indicator() function to check availability. There are 15 different variants in WPP data (starting from 1998).

var_id  variant
2       Medium
3       High
4       Low
5       Constant fertility
6       Instant replacement
7       Zero migration
8       Constant mortality
9       No change
10      Momentum
16      Instant replacement zero migration
202     Median PI (BHM median in WPP2015)
203     Upper 80 PI
204     Lower 80 PI
206     Upper 95 PI
207     Lower 95 PI
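
As a rough sketch, the variant table above can be held as a named character vector so that ids are checked before they are passed to variant_id. The variant_labels() helper is hypothetical and not part of tidywpp. It only checks ids against this table and says nothing about whether a given indicator is available in that variant.

```r
# Variant ids and labels copied from the table above
wpp_variants <- c(
  "2" = "Medium", "3" = "High", "4" = "Low",
  "5" = "Constant fertility", "6" = "Instant replacement",
  "7" = "Zero migration", "8" = "Constant mortality",
  "9" = "No change", "10" = "Momentum",
  "16" = "Instant replacement zero migration",
  "202" = "Median PI (BHM median in WPP2015)",
  "203" = "Upper 80 PI", "204" = "Lower 80 PI",
  "206" = "Upper 95 PI", "207" = "Lower 95 PI"
)

# Hypothetical helper: stop on unknown ids, otherwise return their labels
variant_labels <- function(variant_id) {
  key <- as.character(variant_id)
  unknown <- setdiff(key, names(wpp_variants))
  if (length(unknown) > 0) {
    stop("Unknown variant_id: ", paste(unknown, collapse = ", "))
  }
  unname(wpp_variants[key])
}

picked <- variant_labels(c(2, 3, 4))
print(picked)
```

A call such as variant_labels(1) stops with an error, which flags a mistyped id before any download is attempted.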

Value

A tibble with the downloaded data in tidy format.

Examples


library(tidywpp)

# single indicator from medium variant of latest WPP
get_wpp(indicator = "TFR")

# single indicator from multiple variants of latest WPP
get_wpp(indicator = "TFR", variant_id = c(2, 3, 4))

# some indicators appear in multiple file groups, for example Births
# represents total number of births in the country in the
# Demographic_Indicators file (chosen by default)
get_wpp(indicator = "Births")

# specify indicator_file to get number of births by mother's 5-year age group
get_wpp(indicator = c("Births", "ASFR"),
        indicator_file = "Fertility_by_Age5",
        drop_id_cols = TRUE)

# PopTotal, PopMale and PopFemale indicators are in many WPP files with
# a wide range of granularity. Set indicator = "pop" and use the pop_sex,
# pop_age, pop_freq and pop_date arguments to get the desired data from
# the appropriate indicator_file...

# when using indicator = "pop" get_wpp() defaults to annual total population
# (summed over age and sex)
get_wpp(indicator = "pop")

# use pop_sex to get specific sexes (or both or all)
get_wpp(indicator = "pop", pop_sex = "male")

# use pop_age to specify age groups
get_wpp(indicator = "pop", pop_sex = "both", pop_age = "five")

# use pop_date to specify populations at start of year (rather than mid-year)
get_wpp(indicator = "pop", pop_sex = "female", pop_age = "five", pop_date = "jan1")

# tidy sex into a single column and drop id columns
get_wpp(indicator = "pop", pop_sex = "both", pop_age = "five",
        tidy_pop_sex = TRUE, drop_id_cols = TRUE)

# alternatively use indicator_file to select the desired version of
# population indicator(s)
get_wpp(indicator = c("PopTotal", "PopMale", "PopFemale"),
        indicator_file = "TotalPopulationBySex")

# clean column names
get_wpp(indicator = c("SRB", "NetMigrations", "PopGrowthRate"),
        clean_names = TRUE, drop_id_cols = TRUE)


guyabel/tidywpp documentation built on July 14, 2022, 7:08 a.m.