get_eurostat | R Documentation |
Download data sets from Eurostat https://ec.europa.eu/eurostat
get_eurostat(
  id,
  time_format = "date",
  filters = "none",
  type = "code",
  select_time = NULL,
  cache = TRUE,
  update_cache = FALSE,
  cache_dir = NULL,
  compress_file = TRUE,
  stringsAsFactors = FALSE,
  keepFlags = FALSE,
  legacy_bulk_download = TRUE,
  ...
)
id |
A code name for the dataset of interest.
See |
time_format |
a string specifying how the time column is converted
from the Eurostat format. "date" (default) converts to
a |
filters |
"none" (default) to get the whole dataset, or a named list of
filters to get just part of the table. The list names are
Eurostat variable codes and the values are vectors of observation codes.
If |
type |
A type of variables, "code" (default) or "label". |
select_time |
a character symbol for a time frequency or NULL,
which is used by default as most datasets have just one time
frequency. For datasets with multiple time
frequencies, select one or more of the desired frequencies with:
"Y" (or "A") = annual, "S" = semi-annual / semester, "Q" = quarterly,
"M" = monthly, "W" = weekly. For all frequencies in same data
frame |
cache |
a logical whether to do caching. Default is |
update_cache |
a logical indicating whether to update the cache. Can also be set with options(eurostat_update = TRUE) |
cache_dir |
a path to a cache directory. The directory must exist.
The |
compress_file |
a logical whether to compress the
RDS-file in caching. Default is |
stringsAsFactors |
if |
keepFlags |
a logical whether the flags (e.g. "confidential",
"provisional") should be kept in a separate column or if they
can be removed. Default is |
legacy_bulk_download |
a logical, whether to use the new dissemination API to
download TSV files instead of the old Bulk Download facilities.
Default is |
... |
Arguments passed on to
|
Data sets are downloaded from the Eurostat bulk download facility or from the Eurostat Web Services JSON API.
If only the table id is given, the whole table is downloaded from the bulk download facility. If filters are also defined, the JSON API is used.
The bulk download facility is the fastest method to download whole datasets.
It is also often the only way, as the JSON API is limited to a maximum of
50 sub-indicators at a time and whole datasets usually exceed that. Also,
it seems that multi-frequency datasets can only be retrieved via the
bulk download facility, and select_time is not available for the
JSON API method.
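The rule described above for choosing between the two download methods can be sketched as a small function. The helper choose_download_route() is hypothetical and not part of the eurostat package; it only mirrors the documented behaviour (no filters means bulk download, a named list of filters means the JSON API) and makes no claim about the package's internal code.

```r
# Hypothetical helper (not part of the eurostat package): reports which
# download route the documentation describes for a given `filters` value.
choose_download_route <- function(filters = "none") {
  if (identical(filters, "none")) {
    # Only the table id is given: whole table via the bulk download facility.
    return("bulk")
  }
  if (is.list(filters) && !is.null(names(filters)) &&
      all(names(filters) != "")) {
    # Named list of filters: part of the table via the JSON API.
    return("json")
  }
  stop("`filters` must be \"none\" or a fully named list")
}

route_default  <- choose_download_route()
route_filtered <- choose_download_route(list(geo = "FI", na_item = "B1GQ"))
```

Because select_time is documented as unavailable for the JSON API method, a call that combines filters with select_time is best avoided.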
If your connection is through a proxy, you probably have to set proxy parameters
to use the JSON API; see get_eurostat_json().
By default, datasets from the bulk download facility are cached, as they are
often rather large. Caching is not (currently) possible for datasets from the
JSON API.
Cache files are stored in a temporary directory by default or in
a named directory (see set_eurostat_cache_dir()).
The cache can be emptied with clean_eurostat_cache().
The id, a code for the dataset, can be found with
search_eurostat() or in the Eurostat database
https://ec.europa.eu/eurostat/data/database. The Eurostat
database gives the code in parentheses after every dataset
in the Data Navigation Tree.
A tibble.
One column for each dimension in the data, the time column for the time
dimension and the values column for numerical values. Eurostat data does
not include all missing values, and the treatment of missing values depends
on the source. In the bulk download facility, missing values are dropped if all
dimensions are missing at a particular time. In the JSON API, missing values are
dropped only if all dimensions are missing at all times. Data from the
bulk download facility can be completed, for example, with tidyr::complete().
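As a minimal sketch of completing dropped observations with tidyr::complete(), the example below uses a small hand-made data frame rather than a real download. The column names geo, time and values follow the layout described above, while the numbers are made up purely for illustration.

```r
library(tidyr)

# Toy data shaped like a bulk-download result (made-up values):
# the 2021 observation for "SE" is absent rather than stored as NA.
dat <- data.frame(
  geo = c("FI", "FI", "SE"),
  time = as.Date(c("2020-01-01", "2021-01-01", "2020-01-01")),
  values = c(1.5, 1.7, 2.1)
)

# Expand to every geo/time combination; the missing cell becomes an explicit NA.
completed <- complete(dat, geo, time)
```

After completion, every combination of geo and time has a row, so missing observations are explicit NA values instead of absent rows.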
Przemyslaw Biecek, Leo Lahti, Janne Huovari and Markus Kainu
See citation("eurostat"):
#
# Kindly cite the eurostat R package as follows:
#
#   (C) Leo Lahti, Janne Huovari, Markus Kainu, Przemyslaw Biecek.
#   Retrieval and analysis of Eurostat open data with the eurostat
#   package. R Journal 9(1):385-392, 2017. doi: 10.32614/RJ-2017-019
#   Package URL: http://ropengov.github.io/eurostat
#   Article URL: https://journal.r-project.org/archive/2017/RJ-2017-019/index.html
#
# A BibTeX entry for LaTeX users is
#
#   @Article{,
#     title = {Retrieval and Analysis of Eurostat Open Data with the eurostat Package},
#     author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek},
#     journal = {The R Journal},
#     volume = {9},
#     number = {1},
#     pages = {385--392},
#     year = {2017},
#     doi = {10.32614/RJ-2017-019},
#     url = {https://doi.org/10.32614/RJ-2017-019},
#   }
When citing data, please indicate that the data source is Eurostat. If the re-use of data involves modification to the data or text, state this clearly. For more detailed information and exceptions regarding commercial use, see Eurostat policy on copyright and free re-use of data.
search_eurostat(), label_eurostat()
## Not run:
k <- get_eurostat("nama_10_lp_ulc")
k <- get_eurostat("nama_10_lp_ulc", time_format = "num")
k <- get_eurostat("nama_10_lp_ulc", update_cache = TRUE)

k <- get_eurostat("nama_10_lp_ulc",
  cache_dir = file.path(tempdir(), "r_cache")
)
options(eurostat_update = TRUE)
k <- get_eurostat("nama_10_lp_ulc")
options(eurostat_update = FALSE)

set_eurostat_cache_dir(file.path(tempdir(), "r_cache2"))
k <- get_eurostat("nama_10_lp_ulc")
k <- get_eurostat("nama_10_lp_ulc", cache = FALSE)
k <- get_eurostat("avia_gonc", select_time = "Y", cache = FALSE)

dd <- get_eurostat("nama_10_gdp",
  filters = list(
    geo = "FI",
    na_item = "B1GQ",
    unit = "CLV_I10"
  )
)

# A dataset with multiple time series in one
dd2 <- get_eurostat("AVIA_GOR_ME",
  select_time = c("A", "M", "Q"),
  time_format = "date_last",
  legacy_bulk_download = FALSE
)

## End(Not run)