This rOpenGov R package provides tools to access the Eurostat database, which you can also browse online for the data sets and documentation. For contact information and source code, see the package website.
# Global options
library(knitr)
opts_chunk$set(fig.path="fig/")
Release version (CRAN):
Development version (GitHub):
Overall, the eurostat package includes the following functions:
cat(paste0(library(help = "eurostat")$info[[2]], collapse = "\n"))
get_eurostat_toc() downloads a table of contents of Eurostat datasets. The values in the column 'code' should be used to download a selected dataset.
# Load the package
library(eurostat)
library(rvest)

# Get Eurostat data listing
toc <- get_eurostat_toc()

# Check the first items
library(knitr)
kable(head(toc))
With search_eurostat() you can search the table of contents for particular patterns, e.g. all datasets related to passenger transport. The kable function produces nice markdown output. Note that with the type argument of this function you can restrict the search to, for instance, datasets or tables.
# info about passengers
kable(head(search_eurostat("passenger transport")))
Dataset codes can also be looked up in the Eurostat database, which gives the code in parentheses after every dataset in the Data Navigation Tree.
The package supports two of Eurostat's download methods: the bulk download facility and the Web Services' JSON API. The bulk download facility is the fastest way to download whole datasets. It is also often the only way, because the JSON API is limited to a maximum of 50 sub-indicators at a time and whole datasets usually exceed that. For downloading only a small section of a dataset, the JSON API is faster, as it allows a data selection to be made before downloading.
A user does not usually have to think about the download method, as both are used via the main function get_eurostat(). If only the table id is given, the whole table is downloaded from the bulk download facility. If filters are also defined, the JSON API is used.
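The selection rule can be summarised in a few lines of R. This is only a sketch of the decision, not the package's own code: choose_download_method() is a hypothetical helper, and representing "no filters" as NULL is an assumption made here for illustration.

```r
# Hypothetical helper illustrating the rule described in the text;
# it is NOT part of the eurostat package.
choose_download_method <- function(id, filters = NULL) {
  stopifnot(is.character(id), length(id) == 1)
  if (is.null(filters)) {
    "bulk download"   # table id only: fetch the whole table
  } else {
    "JSON API"        # filters given: fetch only a selection
  }
}

choose_download_method("tsdtr210")
choose_download_method("tsdtr210", filters = list(geo = "FI"))
```

The first call corresponds to downloading a whole table, the second to requesting a filtered subset.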
Here is an example using the indicator 'Modal split of passenger transport'. This is the percentage share of each mode of transport in total inland transport, expressed in passenger-kilometres (pkm), based on transport by passenger cars, buses and coaches, and trains. All data should be based on movements on national territory, regardless of the nationality of the vehicle. However, the data collection is not harmonized at the EU level.
Pick and print the id of the data set to download:
# For the original data, see
# http://ec.europa.eu/eurostat/tgm/table.do?tab=table&init=1&plugin=1&language=en&pcode=tsdtr210
id <- search_eurostat("Modal split of passenger transport", type = "table")$code
print(id)
Get the whole corresponding table. As the table contains annual data, it is more convenient to use a numeric time variable than the default date format:
dat <- get_eurostat(id, time_format = "num")
Investigate the structure of the downloaded data set:
Alternatively, you can download only a part of the dataset by defining the filters argument. It should be a named list, where the names correspond to variable names (lower case) and the values are vectors of codes for the desired series (upper case). For the time variable, lastTimePeriod can be used in addition to time.
dat2 <- get_eurostat(id, filters = list(geo = c("EU28", "FI"), lastTimePeriod = 1), time_format = "num")
kable(dat2)
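The conventions for the filters list (named list, lower-case names, upper-case codes) can be checked before a request is sent. The helper below, check_filters(), is hypothetical and not part of the eurostat package; it is a sketch of the convention only. It treats lastTimePeriod as the one mixed-case name and leaves numeric values alone.

```r
# Hypothetical validity check for a filters list; illustration only.
check_filters <- function(filters) {
  stopifnot(is.list(filters),
            !is.null(names(filters)),
            all(names(filters) != ""))
  time_keys <- c("lastTimePeriod")              # mixed-case name allowed
  dims <- setdiff(names(filters), time_keys)
  names_ok <- all(dims == tolower(dims))        # variable names in lower case
  codes_ok <- all(vapply(filters[dims], function(x) {
    !is.character(x) || all(x == toupper(x))    # character codes in upper case
  }, logical(1)))
  names_ok && codes_ok
}

check_filters(list(geo = c("EU28", "FI"), lastTimePeriod = 1))
check_filters(list(geo = c("eu28", "fi")))
```

The first call follows the convention; the second uses lower-case codes and is flagged.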
By default, variables are returned as Eurostat codes. To get human-readable labels instead, use the type = "label" argument.
datl2 <- get_eurostat(id, filters = list(geo = c("EU28", "FI"), lastTimePeriod = 1), type = "label", time_format = "num")
kable(head(datl2))
Eurostat codes in the downloaded data set can be replaced with human-readable labels from the Eurostat dictionaries with the label_eurostat() function:
datl <- label_eurostat(dat)
kable(head(datl))
label_eurostat() allows conversion of individual variable
vectors or variable names as well.
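Conceptually, labelling is a lookup from codes to dictionary entries. The base R sketch below shows that idea with a tiny hand-written dictionary rather than the real Eurostat dictionaries; geo_dic and label_codes() are hypothetical names used for illustration and are not part of the package.

```r
# Hand-written stand-in for a code dictionary (illustration only)
geo_dic <- c(FI = "Finland", SE = "Sweden", AT = "Austria")

# Hypothetical helper: replace codes with labels via a named-vector lookup
label_codes <- function(x, dic) {
  labels <- unname(dic[as.character(x)])
  # keep the original code where the dictionary has no entry
  ifelse(is.na(labels), as.character(x), labels)
}

label_codes(c("FI", "AT", "XX"), geo_dic)
```

Codes missing from the dictionary (here "XX") are passed through unchanged, which is a design choice of this sketch, not a statement about how label_eurostat() behaves.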
Vehicle information has 3 levels. You can check them now with:
To facilitate smooth visualization of standard European geographic areas, the package provides ready-made lists of the country codes used in the Eurostat database for EFTA (efta_countries), Euro area (ea_countries), EU (eu_countries) and EU candidate countries (eu_candidate_countries). These can be used to select specific groups of countries for closer investigation. For conversions with other standard country coding systems, see the countrycode R package. To retrieve the country code list for EFTA, for instance, use the efta_countries data set provided by the package.
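Such a code list is typically used to subset a downloaded table. The sketch below uses hand-made stand-ins so that it runs without the package: efta_codes is written out by hand (in practice the codes would come from the package's efta_countries data), and toy is an invented table whose values carry no meaning.

```r
# Stand-in for the EFTA country codes (Iceland, Liechtenstein, Norway, Switzerland)
efta_codes <- c("IS", "LI", "NO", "CH")

# Invented toy table; the values are placeholders, not Eurostat figures
toy <- data.frame(
  geo = c("FI", "NO", "CH", "SE", "IS"),
  values = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Keep only the rows whose country code belongs to the group
toy_efta <- subset(toy, geo %in% efta_codes)
toy_efta$geo
```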
dat_eu12 <- subset(datl, geo == "European Union (28 countries)" & time == 2012)
kable(dat_eu12, row.names = FALSE)
Reshaping the data is best done with spread() from the tidyr package:
library("tidyr")
dat_eu_0012 <- subset(dat, geo == "EU28" & time %in% 2000:2012)
dat_eu_0012_wide <- spread(dat_eu_0012, vehicle, values)
kable(subset(dat_eu_0012_wide, select = -geo), row.names = FALSE)
dat_trains <- subset(datl, geo %in% c("Austria", "Belgium", "Finland", "Sweden") & time %in% 2000:2012 & vehicle == "Trains")
dat_trains_wide <- spread(dat_trains, geo, values)
kable(subset(dat_trains_wide, select = -vehicle), row.names = FALSE)
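The same long-to-wide reshaping can be reproduced without tidyr. The sketch below swaps in reshape() from base R and runs on invented toy values (the numbers are placeholders, not Eurostat figures); the column names mirror those of the tables above.

```r
# Invented long-format toy table: one row per (time, vehicle) pair
long <- data.frame(
  time = c(2000, 2000, 2001, 2001),
  vehicle = c("BUS", "TRN", "BUS", "TRN"),
  values = c(10, 5, 11, 6),
  stringsAsFactors = FALSE
)

# One row per time point, one value column per vehicle code
wide <- reshape(long, idvar = "time", timevar = "vehicle", direction = "wide")
names(wide)
```

Unlike spread(), base reshape() prefixes the new columns with the name of the value column (values.BUS, values.TRN).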
Visualizing train passenger data with ggplot2:
library(ggplot2)
p <- ggplot(dat_trains, aes(x = time, y = values, colour = geo))
p <- p + geom_line()
print(p)
A triangle plot is handy for visualizing data sets with three variables.
library(plotrix)
library(eurostat)
library(dplyr)
library(tidyr)

# All sources of renewable energy are to be grouped into three sets
dict <- c("Solid biofuels (excluding charcoal)" = "Biofuels",
          "Biogasoline" = "Biofuels",
          "Other liquid biofuels" = "Biofuels",
          "Biodiesels" = "Biofuels",
          "Biogas" = "Biofuels",
          "Hydro power" = "Hydro power",
          "Tide, Wave and Ocean" = "Hydro power",
          "Solar thermal" = "Wind, solar, waste and Other",
          "Geothermal Energy" = "Wind, solar, waste and Other",
          "Solar photovoltaic" = "Wind, solar, waste and Other",
          "Municipal waste (renewable)" = "Wind, solar, waste and Other",
          "Wind power" = "Wind, solar, waste and Other",
          "Bio jet kerosene" = "Wind, solar, waste and Other")

# Some cleaning of the data is required
energy3 <- get_eurostat("ten00081") %>%
  label_eurostat() %>%
  filter(time == "2013-01-01", product != "Renewable energies") %>%
  mutate(nproduct = dict[as.character(product)], # just three categories
         geo = gsub(geo, pattern=" \\(.*", replacement="")) %>%
  select(nproduct, geo, values) %>%
  group_by(nproduct, geo) %>%
  summarise(svalue = sum(values)) %>%
  group_by(geo) %>%
  mutate(tvalue = sum(svalue), svalue = svalue/sum(svalue)) %>%
  filter(tvalue > 1000) %>% # only large countries
  spread(nproduct, svalue)

# Triangle plot
par(cex=0.75, mar=c(0,0,0,0))
positions <- plotrix::triax.plot(as.matrix(energy3[, c(3,5,4)]), show.grid = TRUE, label.points= FALSE, point.labels = energy3$geo, col.axis="gray50", col.grid="gray90", pch = 19, cex.axis=0.8, cex.ticks=0.7, col="grey50")

# Larger labels
ind <- which(energy3$geo %in% c("Norway", "Iceland","Denmark","Estonia", "Turkey", "Italy", "Finland"))
df <- data.frame(positions$xypos, geo = energy3$geo)
points(df$x[ind], df$y[ind], cex=2, col="red", pch=19)
text(df$x[ind], df$y[ind], df$geo[ind], adj = c(0.5,-1), cex=1.5)
The mapping examples below use the tmap package.
library(dplyr)
library(eurostat)
library(tmap)

# Load example data set
data("tgs00026")
# Can be retrieved from the eurostat service with:
# tgs00026 <- get_eurostat("tgs00026", time_format = "raw")

# Data from Eurostat
sp_data <- tgs00026 %>%
  # subset to have only a single row per geo
  dplyr::filter(time == 2010, nchar(as.character(geo)) == 4) %>%
  # categorise
  dplyr::mutate(income = cut_to_classes(values, n = 5)) %>%
  # merge with geodata
  merge_eurostat_geodata(data = ., geocolumn = "geo",resolution = "60", output_class = "spdf", all_regions = TRUE)
Load example data (map)
Construct the map
map1 <- tmap::tm_shape(Europe) +
  tmap::tm_fill("lightgrey") +
  tmap::tm_shape(sp_data) +
  tmap::tm_grid() +
  tmap::tm_polygons("income", title = "Disposable household\nincomes in 2010", palette = "Oranges") +
  tmap::tm_format_Europe()
Interactive maps can be generated as well
# Interactive
tmap_mode("view")
map1

# Set the mode back to normal plotting
tmap_mode("plot")
print(map1)
library(eurostat)
library(dplyr)
library(ggplot2)
library(RColorBrewer)

# Downloading and manipulating the tabular data
sp_data <- tgs00026 %>%
  # subsetting to year 2014 and NUTS-2 level (four-character geo codes)
  dplyr::filter(time == 2014, nchar(as.character(geo)) == 4, grepl("PL",geo)) %>%
  # label the single geo column
  mutate(label = paste0(label_eurostat(.)[["geo"]], "\n", values, "€"),
         income = cut_to_classes(values)) %>%
  # merge with geodata
  merge_eurostat_geodata(data=.,geocolumn="geo",resolution = "01", all_regions = FALSE, output_class="spdf")

# plot map
map2 <- tm_shape(Europe) +
  tm_fill("lightgrey") +
  tm_shape(sp_data, is.master = TRUE) +
  tm_polygons("income", title = "Disposable household incomes in 2014", palette = "Oranges", border.col = "white") +
  tm_text("label", just = "center") +
  tm_scale_bar() +
  tm_format_Europe(legend.outside = TRUE, attr.outside = TRUE)
map2
library(sp)
library(eurostat)
library(dplyr)
library(RColorBrewer) # provides brewer.pal(), used below

dat <- tgs00026 %>%
  # subsetting to year 2014 and NUTS-2 level (four-character geo codes)
  dplyr::filter(time == 2014, nchar(as.character(geo)) == 4) %>%
  # classifying the values of the variable
  dplyr::mutate(cat = cut_to_classes(values)) %>%
  # merge Eurostat data with geodata from GISCO
  merge_eurostat_geodata(data = .,geocolumn = "geo",resolution = "10", output_class = "spdf", all_regions = FALSE)

# plot map
sp::spplot(obj = dat, "cat", main = "Disposable household income", xlim = c(-22,34), ylim = c(35,70), col.regions = c("dim grey", brewer.pal(n = 5, name = "Oranges")), col = "white", usePolypath = FALSE)
Eurostat data is also available in the SDMX format. The eurostat R package does not provide custom tools for this, but the generic rsdmx R package can be used to access data in that format when necessary:
library(rsdmx)

# Data set URL
url <- "http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/cdh_e_fos/..PC.FOS1.BE/?startperiod=2005&endPeriod=2011"

# Read the data from eurostat
d <- readSDMX(url)

# Convert to data frame and show the first entries
df <- as.data.frame(d)
kable(head(df))
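The query URL packs several choices into one string. The sketch below rebuilds exactly that URL from its visible parts so that individual pieces can be changed more easily. The variable names are chosen here for illustration, and reading the dot-separated key as a sequence of dimension values (with empty positions left unrestricted) is an assumption, not something this tutorial states.

```r
# Parts of the SDMX query URL (names are illustrative)
base_url <- "http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data"
dataset  <- "cdh_e_fos"

# Dot-separated key; the two leading empty strings produce the leading ".."
key <- paste(c("", "", "PC", "FOS1", "BE"), collapse = ".")

# Time range, copied as-is from the URL used in this tutorial
query <- "startperiod=2005&endPeriod=2011"

url <- paste0(base_url, "/", dataset, "/", key, "/?", query)
url
```

The resulting string is identical to the hard-coded URL and can be passed to readSDMX() in the same way.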
For further examples, see the package homepage.
Eurostat data: cite Eurostat.
Administrative boundaries: cite EuroGeographics
For main developers and contributors, see the package homepage.
This work can be freely used, modified and distributed under the BSD-2-clause (modified FreeBSD) license:
The independent reurostat package develops related Eurostat tools but seems to be in an experimental stage at the time of writing this tutorial.
The more generic quandl, datamart, rsdmx, and pdfetch packages may provide access to some versions of Eurostat data, but because they are general-purpose they lack, in contrast to the eurostat R package, tools that are specifically customized to facilitate Eurostat analysis.
For contact information, see the package homepage.
This tutorial was created with