download_wid: Download data from WID.world

View source: R/download-wid.R

download_wid    R Documentation

Download data from WID.world

Description

Downloads data from the World Wealth and Income Database (http://WID.world) into a data.frame. Type vignette("wid-demo") for a detailed presentation.

Usage

download_wid(
  indicators = "all",
  areas = "all",
  years = "all",
  perc = "all",
  ages = "all",
  pop = "all",
  metadata = FALSE,
  include_extrapolations = TRUE,
  verbose = FALSE
)

Arguments

indicators

List of six-letter strings, or "all": code names of the indicators in the database. Default is "all" for all indicators. See 'Details' for more.

areas

List of strings, or "all": area code names of the database. "XX" for countries/regions, "XX-YY" for subregions. Default is "all" for all areas. See 'Details' for more.

years

Numerical vector, or "all": years to retrieve. Default is "all" for all years.

perc

List of strings, or "all": percentiles take the form "pXX" or "pXXpYY". Default is "all" for all percentiles. See 'Details' for more.

ages

Numerical vector, or "all": age category codes in the database. 999 for all ages, 992 for adults. Default is "all" for all age categories. See 'Details' for more.

pop

List of characters, or "all": type of population. "t" for tax units, "i" for individuals. Default is "all" for all population types. See 'Details' for more.

metadata

Should the function fetch metadata too (i.e. variable descriptions, sources, methodological notes, etc.)? Default is FALSE.

include_extrapolations

Should the function return estimates that are the results of extrapolations and interpolations based on limited data? Default is TRUE.

verbose

Should the function indicate the progress of the request? Default is FALSE.

Details

Although all arguments default to "all", you cannot download the entire database by typing download_wid(). The command requires you to specify either some indicators or some areas. To download the entire database, please visit https://wid.world/data/ and choose "download full dataset".

If there is no data matching your selection on WID.world (for example because you specified an indicator or an area that doesn't exist), the command will return NULL with a warning.
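The two rules above (narrow the request by indicators or areas, and expect NULL when nothing matches) can be sketched as follows. The helper is_valid_request() is hypothetical and not part of the package; the guarded call assumes the package is installed under the name wid and that network access is available.

```r
# Hypothetical helper mirroring the rule above: a request must narrow
# down either the indicators or the areas.
is_valid_request <- function(indicators = "all", areas = "all") {
  !(identical(indicators, "all") && identical(areas, "all"))
}

is_valid_request()                       # FALSE: would ask for the whole database
is_valid_request(indicators = "sptinc")  # TRUE

# The download itself needs the package and a network connection, so it
# only runs in an interactive session (assumption: package name is 'wid').
if (interactive() && requireNamespace("wid", quietly = TRUE)) {
  top_shares <- wid::download_wid(
    indicators = "sptinc",
    areas = c("FR", "US"),
    perc = "p99p100"
  )
  # download_wid() returns NULL (with a warning) when nothing matches.
  if (is.null(top_shares)) {
    message("No data matched the selection.")
  }
}
```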

All monetary amounts for countries and country subregions are in constant local currency of the reference year (i.e. the previous year, the database being updated every year around July). Monetary amounts for world regions are in EUR PPP of the reference year. You can access the price index using the indicator inyixx, the PPP exchange rates using xlcusp (USD), xlceup (EUR), xlcyup (CNY), and the market exchange rates using xlcusx (USD), xlceux (EUR), xlcyux (CNY). To check the current reference year, you can look at when the price index is equal to 1.
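As an illustration of the currency conversion described above, the sketch below converts amounts in constant local currency to PPP US dollars. The numbers are made up for the example and are not WID data. It also assumes that xlcusp is expressed as local currency units per PPP US dollar, so that the conversion is a division; check the codes dictionary before relying on this.

```r
# Hypothetical values for illustration only (not real WID data).
income_lcu <- c(30000, 31000)  # average income, constant local currency
ppp_usd <- 0.8                 # assumed xlcusp: local currency per PPP US dollar

# Assumption: dividing by the PPP exchange rate yields PPP US dollars.
income_usd_ppp <- income_lcu / ppp_usd
income_usd_ppp
```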

Shares and wealth/income ratios are given as a fraction of 1. That is, a top 1% share of 20% is given as 0.2. A wealth/income ratio of 300% is given as 3.
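A minimal sketch of the convention above, using a hypothetical helper share_to_percent() (not part of the package) to turn fractions into percentages for display:

```r
# Hypothetical helper: WID.world stores shares and ratios as fractions of 1.
share_to_percent <- function(x) x * 100

share_to_percent(0.2)  # a top 1% share stored as 0.2 is 20%
share_to_percent(3)    # a wealth/income ratio stored as 3 is 300%
```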

The arguments of the command follow a nomenclature specific to WID.world. We provide more details with a few examples below. For the complete up-to-date documentation of the structure of the database, please visit https://wid.world/codes-dictionary.

Indicators

The argument indicators is a vector of 6-letter codes, each of which corresponds to a given series type for a given income or wealth concept. The first letter corresponds to the type of series. Some of the most common possibilities include:

one-letter code      description
a      average
s      share
t      threshold
m      macroeconomic total
w      wealth/income ratio

The next five letters correspond to a concept (usually of income or wealth). Some of the most common possibilities include:

five-letter code      description
ptinc      pre-tax national income
pllin      pre-tax labor income
pkkin      pre-tax capital income
fiinc      fiscal income
hweal      net personal wealth

For example, sfiinc corresponds to the share of fiscal income, and ahweal corresponds to average net personal wealth. If you don't specify any indicator, it defaults to "all" and downloads all available indicators.
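The composition of indicator codes described above can be sketched with a small helper. make_indicator() is a hypothetical name used for illustration; it only pastes a one-letter series type onto a five-letter concept code and does not check that the result exists in the database.

```r
# Hypothetical helper: one-letter series type + five-letter concept code.
make_indicator <- function(type, concept) {
  stopifnot(nchar(type) == 1, nchar(concept) == 5)
  paste0(type, concept)
}

make_indicator("s", "fiinc")  # "sfiinc": share of fiscal income
make_indicator("a", "hweal")  # "ahweal": average net personal wealth

# Several series types for the same concept at once.
make_indicator(c("a", "s", "t"), "ptinc")
```

The resulting character vector can be passed as the indicators argument.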

Area codes

All data in WID.world is associated with a given area, which can be a country, a region within a country, an aggregation of countries (e.g. a continent), or even the whole world. The argument areas is a vector of codes that specify the areas for which to retrieve data. Countries and world regions are coded using 2-letter ISO codes. Country subregions are coded as XX-YY where XX is the country 2-letter code. If you don't specify any area, it defaults to "all" and downloads data for all available areas.

Years

All data in WID.world corresponds to a year. Some series go as far back as the 1800s. The argument years is a vector of integers that specifies those years. If you don't specify any year, it defaults to "all" and downloads data for all available years.

Percentiles

The key feature of WID.world is that it provides data on the whole distribution, not just totals and averages. The argument perc is a vector of strings that indicate for which part of the distribution the data should be retrieved. For share and average variables, percentiles correspond to percentile ranges and take the form pXXpYY. For example, the top 1% share corresponds to p99p100. The top 10% share excluding the top 1% is p90p99. The threshold associated with the percentile group pXXpYY is the minimal income or wealth level that gets you into the group. For example, the threshold of the percentile group p90p100 or p90p91 corresponds to the 90% quantile. Variables with no distributional meaning use the percentile p0p100. If you don't specify any percentile, it defaults to "all" and downloads data for all available parts of the distribution.
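The pXXpYY naming scheme described above can be sketched with a hypothetical helper, perc_range(), which is not part of the package. It only builds the strings and does not check that a given range is available in the database.

```r
# Hypothetical helper: build "pXXpYY" codes from lower and upper bounds.
perc_range <- function(lower, upper = 100) {
  paste0("p", lower, "p", upper)
}

perc_range(99)      # "p99p100": top 1%
perc_range(90, 99)  # "p90p99": top 10% excluding the top 1%
perc_range(0)       # "p0p100": whole distribution

# The ten deciles, from "p0p10" to "p90p100".
deciles <- perc_range(seq(0, 90, by = 10), seq(10, 100, by = 10))
```

The resulting character vector can be passed as the perc argument.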

Age groups

Data may only concern the population in a certain age group. The argument ages is a vector of age codes that specify which age categories to retrieve. Ages are coded using 3-digit codes. Some of the most common possibilities include:

three-digit code      description
999      all ages
992      adults, including elderly (20+)
996      adults, excluding elderly (20-65)

If you don't specify any age, it defaults to "all" and downloads data for all available age groups.

Population types

The data in WID.world can refer to different types of population (i.e. different statistical units). The argument pop is a vector of population codes. They are coded using one-letter codes. Some of the most common possibilities include:

one-letter code      description
i      individuals
t      tax units
j      equal-split adults (i.e. income or wealth divided equally among spouses)

If you don't specify any code, it defaults to "all" and downloads data for all types of population.

Extrapolations/interpolations

Some of the data on WID.world is the result of interpolations (when data is only available for a few years) or extrapolations (when data is not available for the most recent years) that are based on much more limited information than other data points. We include these interpolations/extrapolations by default as a convenience, and also because these values are used to perform regional aggregations. Yet we stress that these estimates, especially at the level of individual countries, can be fragile.

For many purposes, it can be preferable to exclude these data points. For that, use the option include_extrapolations = FALSE.

Value

A data.frame with the following columns:

country

The country or area code.

variable

The variable name, which combines the indicator, the age code and the population code.

percentile

The part of the distribution the value relates to.

year

The year the value relates to.

value

The value of the indicator.
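To work with the variable column, it can help to split it back into its parts. The sketch below assumes the layout is a 6-letter indicator, followed by a 3-digit age code, followed by a 1-letter population code (for instance sptinc992j). That ordering is an assumption, not something stated above, and split_variable() is a hypothetical helper.

```r
# Hypothetical helper. Assumed layout: 6-letter indicator, 3-digit age
# code, 1-letter population code, in that order.
split_variable <- function(variable) {
  data.frame(
    indicator = substr(variable, 1, 6),
    age = as.integer(substr(variable, 7, 9)),
    pop = substr(variable, 10, 10),
    stringsAsFactors = FALSE
  )
}

parts <- split_variable(c("sptinc992j", "ahweal999i"))
parts
```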

If you specify metadata = TRUE, the data.frame also has the following columns:

countryname

The full name of the country/region.

shortname

A short version of the variable's full name in plain English.

shortdes

A description of the type of series.

pop

The population type, in plain English.

age

The age group, in plain English.

source

The source for the data.

method

Methodological notes, if any.

imputation

Type of estimate (when applicable). The imputation field is a short qualitative description of the type of estimate provided, which is strongly related to data quality. For technical details, see the method field and papers cited in source.

quality

Data quality (when applicable). The quality field is a score from 0 to 5 indicating the quality of the data.

Author(s)

Thomas Blanchet


WIDworld/wid-r-tool documentation built on Aug. 27, 2024, 5:10 p.m.