download_wid | R Documentation |
Downloads data from the World Wealth and Income Database (http://WID.world) into a data.frame.
Type vignette("wid-demo") for a detailed presentation.
download_wid(
indicators = "all",
areas = "all",
years = "all",
perc = "all",
ages = "all",
pop = "all",
metadata = FALSE,
include_extrapolations = TRUE,
verbose = FALSE
)
indicators | List of six-letter strings, or "all". |
areas | List of strings, or "all". |
years | Numerical vector, or "all". |
perc | List of strings, or "all". |
ages | Numerical vector, or "all". |
pop | List of characters, or "all". |
metadata | Should the function fetch metadata too (i.e. variable descriptions, sources, methodological notes, etc.)? Default is FALSE. |
include_extrapolations | Should the function return estimates that are the results of extrapolations and interpolations based on limited data? Default is TRUE. |
verbose | Should the function indicate the progress of the request? Default is FALSE. |
Although all arguments default to "all", you cannot download the
entire database by typing download_wid(). The command requires you
to specify either some indicators or some areas. To download the entire
database, please visit https://wid.world/data/ and choose "download
full dataset".
If no data on WID.world matches your selection (for example because
you specified an indicator or an area that doesn't exist), the command
will return NULL with a warning.
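Because an empty result comes back as NULL rather than as an error, a script may want to check for it explicitly. The following is a minimal sketch, not part of the package: download_or_stop and its fetch argument are hypothetical names. fetch defaults to download_wid, so the guard can be tried offline by passing a stand-in function instead.

```r
# Hypothetical wrapper (not part of the wid package). `fetch` is any function
# with the same interface as download_wid(); it defaults to the real one.
download_or_stop <- function(..., fetch = wid::download_wid) {
  args <- list(...)
  # download_wid() requires at least some indicators or some areas
  if (is.null(args$indicators) && is.null(args$areas)) {
    stop("specify at least `indicators` or `areas`")
  }
  result <- fetch(...)
  # When nothing matches, download_wid() returns NULL with a warning
  if (is.null(result)) {
    stop("no data matched the request")
  }
  result
}
```

With this helper, a call such as download_or_stop(indicators = "sptinc", areas = "US") would either return a data.frame or stop with an error message.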
All monetary amounts for countries and country subregions are in constant
local currency of the reference year (i.e. the previous year, the database
being updated every year around July). Monetary amounts for world regions
are in EUR PPP of the reference year. You can access the price index using
the indicator inyixx, the PPP exchange rates using xlcusp (USD), xlceup (EUR),
xlcyup (CNY), and the market exchange rates using xlcusx (USD), xlceux (EUR),
xlcyux (CNY). To check the current reference year, you can look at when the price
index is equal to 1.
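The reference-year check can be sketched as follows. This is an illustration under assumptions: reference_year is a hypothetical helper, its input is assumed to be the result of a download_wid call on the indicator inyixx for a single area, and the toy numbers are made up rather than taken from WID.world.

```r
# `price_index` stands for the result of a download_wid() call on the
# indicator "inyixx" for a single area; only `year` and `value` are used.
reference_year <- function(price_index) {
  # The reference year is the one where the index equals 1; taking the
  # closest value to 1 allows for rounding in the downloaded figures.
  price_index$year[which.min(abs(price_index$value - 1))]
}

# Made-up numbers, for illustration only (not actual WID.world data)
toy_index <- data.frame(year = c(2019, 2020, 2021), value = c(0.96, 0.98, 1))
reference_year(toy_index)
```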
Shares and wealth/income ratios are given as a fraction of 1. That is, a top 1% share of 20% is given as 0.2. A wealth/income ratio of 300% is given as 3.
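As a small illustration of this convention, values can be converted to percentages for display. The two numbers below are the ones used in the sentence above, not downloaded data.

```r
top1_share <- 0.2    # a top 1% share of 20%
wealth_income <- 3   # a wealth/income ratio of 300%

# Multiply by 100 to express the values as percentages
share_label <- sprintf("%.0f%%", top1_share * 100)
ratio_label <- sprintf("%.0f%%", wealth_income * 100)
```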
The arguments of the command follow a nomenclature specific to WID.world. We provide more details with a few examples below. For the complete up-to-date documentation of the structure of the database, please visit https://wid.world/codes-dictionary.
The argument indicators is a vector of 6-letter codes, each of which corresponds
to a given series type for a given income or wealth concept. The first letter
corresponds to the type of series. Some of the most common possibilities include:
one-letter code | description | |
a | average | |
s | share | |
t | threshold | |
m | macroeconomic total | |
w | wealth/income ratio | |
The next five letters correspond to a concept (usually of income or wealth). Some of the most common possibilities include:
five-letter code | description | |
ptinc | pre-tax national income | |
pllin | pre-tax labor income | |
pkkin | pre-tax capital income | |
fiinc | fiscal income | |
hweal | net personal wealth | |
For example, sfiinc corresponds to the share of fiscal income, and
ahweal corresponds to average personal wealth. If you don't specify
any indicator, it defaults to "all" and downloads all available indicators.
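The way indicator codes are assembled can be sketched in a few lines. The variable names are illustrative, and the sketch only builds strings: whether every combination actually exists in the database is not guaranteed.

```r
# One-letter series types and five-letter concepts from the tables above
series_types <- c(average = "a", share = "s")
concepts <- c(fiscal_income = "fiinc", personal_wealth = "hweal")

# A six-letter indicator is the series type followed by the concept
share_fiscal <- paste0(series_types["share"], concepts["fiscal_income"])
avg_wealth <- paste0(series_types["average"], concepts["personal_wealth"])

# All combinations of the chosen types and concepts
indicators <- as.vector(outer(series_types, concepts, paste0))
```

The resulting vector can be passed as the indicators argument of download_wid.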
All data in WID.world is associated with a given area, which can be a country,
a region within a country, an aggregation of countries (e.g. a continent), or
even the whole world. The argument areas is a vector of codes that specifies
the areas for which to retrieve data. Countries and world regions are coded
using 2-letter ISO codes. Country subregions are coded as XX-YY, where XX
is the country 2-letter code. If you don't specify any area,
it defaults to "all" and downloads data for all available areas.
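As a rough sketch of this coding scheme, subregion codes can be told apart from country codes by the dash, and the parent country recovered from the first two letters. "XX-YY" below is the placeholder pattern from the text, not a real area code.

```r
areas <- c("FR", "US", "XX-YY")  # "XX-YY" is a placeholder, not a real code

# A code containing a dash denotes a country subregion
is_subregion <- grepl("-", areas, fixed = TRUE)

# The first two letters of a subregion code are its country's code
parent_country <- ifelse(is_subregion, substr(areas, 1, 2), NA_character_)
```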
All data in WID.world correspond to a year. Some series go as far back as
the 1800s. The argument years is a vector of integers that specifies
those years. If you don't specify any year, it defaults to "all"
and downloads data for all available years.
The key feature of WID.world is that it provides data on the whole
distribution, not just totals and averages. The argument perc
is a vector of strings that indicates for which part of the distribution
the data should be retrieved. For share and average variables,
percentiles correspond to percentile ranges and take the form pXXpYY.
For example, the top 1% share corresponds to p99p100. The top 10% share
excluding the top 1% is p90p99. Thresholds associated with the
percentile group pXXpYY correspond to the minimal income or wealth
level that gets you into the group. For example, the threshold of the
percentile group p90p100 or p90p91 corresponds to the 90%
quantile. Variables with no distributional meaning use the percentile p0p100.
If you don't specify any percentile, it defaults to "all" and
downloads data for all available parts of the distribution.
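Percentile codes of the form pXXpYY can be generated rather than typed by hand. The helper perc_code below is hypothetical (it is not part of the package), and which ranges are actually available for a given series depends on the database.

```r
# Hypothetical helper: build a pXXpYY code from the bounds of a percentile range
perc_code <- function(lower, upper) {
  stopifnot(all(lower >= 0), all(upper <= 100), all(lower < upper))
  paste0("p", lower, "p", upper)
}

top1 <- perc_code(99, 100)   # top 1%
next9 <- perc_code(90, 99)   # top 10% excluding the top 1%

# One code per decile, from p0p10 up to p90p100
deciles <- perc_code(seq(0, 90, by = 10), seq(10, 100, by = 10))
```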
Data may only concern the population in a certain age group.
The argument ages is a vector of age codes that specifies which
age categories to retrieve. Ages are coded using 3-digit codes.
Some of the most common possibilities include:
three-digit code | description | |
999 | all ages | |
992 | adults, including elderly (20+) | |
996 | adults, excluding elderly (20-65) | |
If you don't specify any age, it defaults to "all" and downloads
data for all available age groups.
The data in WID.world can refer to different types of population
(i.e. different statistical units). The argument pop is a vector of
population codes. They are coded using one-letter codes. Some of the
most common possibilities include:
one-letter code | description | |
i | individuals | |
t | tax units | |
j | equal-split adults (i.e. income or wealth divided equally among spouses) | |
If you don't specify any code, it defaults to "all" and downloads data for all types of population.
Some of the data on WID.world is the result of interpolations (when data is only available for a few years) or extrapolations (when data is not available for the most recent years) that are based on much more limited information than other data points. We include these interpolations and extrapolations by default as a convenience, and also because these values are used to perform regional aggregations. Yet we stress that these estimates, especially at the level of individual countries, can be fragile.
For many purposes, it can be preferable to exclude these data points.
For that, use the option include_extrapolations = FALSE.
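A request that excludes these data points might look like the sketch below. The particular combination of codes (top 1% share of pre-tax national income, United States, adults, equal-split) is put together from the tables above as an illustration. The download itself needs the wid package and an internet connection, so it is guarded and only runs in an interactive session.

```r
# Top 1% share of pre-tax national income in the US, adults, equal-split,
# leaving out estimates that are interpolated or extrapolated
request <- list(
  indicators = "sptinc",
  areas = "US",
  perc = "p99p100",
  ages = 992,
  pop = "j",
  include_extrapolations = FALSE
)

# Needs the wid package and network access, so only run interactively
if (interactive() && requireNamespace("wid", quietly = TRUE)) {
  top1_us <- do.call(wid::download_wid, request)
  # download_wid() returns NULL (with a warning) when nothing matches
  if (!is.null(top1_us)) print(head(top1_us))
}
```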
A data.frame with the following columns:
country
The country or area code.
variable
The variable name, which combines the indicator, the age code and the population code.
percentile
The part of the distribution the value relates to.
year
The year the value relates to.
value
The value of the indicator.
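Since the variable column packs several codes into one string, it can be convenient to split it back into its parts. The sketch below assumes that the name is the six-letter indicator followed by the three-digit age code and the one-letter population code, in that order. split_variable is a hypothetical helper and the example names are constructed from the code tables, not taken from a download.

```r
# Hypothetical helper, assuming the variable name is the six-letter indicator
# followed by the three-digit age code and the one-letter population code
split_variable <- function(variable) {
  data.frame(
    indicator = substr(variable, 1, 6),
    age = substr(variable, 7, 9),
    pop = substr(variable, 10, 10),
    stringsAsFactors = FALSE
  )
}

# Names constructed for illustration under the assumption above
parts <- split_variable(c("sptinc992j", "ahweal999i"))
```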
If you specify metadata = TRUE, the data.frame also has the
following columns:
countryname
The full name of the country/region.
shortname
A short version of the variable's full name in plain English.
shortdes
A description of the type of series.
pop
The population type, in plain English.
age
The age group, in plain English.
source
The source for the data.
method
Methodological notes, if any.
imputation
Type of estimate (when applicable). The imputation
field is a short qualitative description of the type of estimate provided,
which is strongly related to data quality. For technical details, see
the method field and papers cited in source.
quality
Data quality (when applicable). The quality field is a score from 0 to 5 indicating the quality of the data.
Thomas Blanchet