# This is at the moment a private repo
devtools::install_github("dataobservatory-eu/indicator")
library(devtools)
library(dplyr)
library(knitr)
library(kableExtra)
library(indicator)

Working with Eurostat Data

The eurostat package handles interaction with the Eurostat data warehouse. After many years of stable operation, the Eurostat server has recently had short outages, so get_eurostat_indicator() should get a small extension that retries the download via eurostat::get_eurostat() several times if it receives a NULL return.
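The retry idea described above could be sketched as follows. This is only an illustration under stated assumptions: retry_on_null(), its arguments and the flaky_fetch() stand-in are hypothetical names that are not part of the indicator or eurostat packages, and the real extension may look different.

```r
# Hypothetical helper, not part of the indicator or eurostat packages.
# Calls fetch() until it returns something other than NULL.
retry_on_null <- function(fetch, max_tries = 3, wait_seconds = 1) {
  for (attempt in seq_len(max_tries)) {
    # Treat both an error and a NULL return as a failed attempt.
    result <- tryCatch(fetch(), error = function(e) NULL)
    if (!is.null(result)) {
      return(result)
    }
    # Pause before the next attempt, but not after the last one.
    if (attempt < max_tries) {
      Sys.sleep(wait_seconds)
    }
  }
  stop("Download failed after ", max_tries, " attempts.")
}

# Stand-in for a flaky download: returns NULL twice, then a small data frame.
calls <- 0
flaky_fetch <- function() {
  calls <<- calls + 1
  if (calls < 3) NULL else data.frame(geo = "BE", value = 0)
}

result <- retry_on_null(flaky_fetch, max_tries = 5, wait_seconds = 0)
```

In real use, the fetch argument would presumably wrap the actual download, for example function() eurostat::get_eurostat("env_air_emis"), with a wait of a few seconds between attempts.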

# Download the dataset and store it in data-raw, relative to the working directory
airp <- get_eurostat_indicator(id = "ENV_AIR_EMIS")
save_dir <- if (dir.exists("data-raw")) "data-raw" else file.path("..", "data-raw")
save(airp, file = file.path(save_dir, "airp.rda"))

As the download message shows, the eurostat package downloads the latest data from the warehouse and saves it to tempdir(), the temporary directory of the current R session. A few megabytes of free disk space must therefore be available on the instance for the download to run properly.
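To get a feeling for the temporary directory, the following minimal sketch writes a throwaway file there and adds up the size of everything in it. It uses only base R; the file written here is an arbitrary example (the built-in mtcars data), not the file that the eurostat package creates.

```r
# Write a small throwaway file into the session's temporary directory.
tmp_file <- tempfile(fileext = ".rds")
saveRDS(mtcars, tmp_file)

# List the contents of tempdir() and sum the sizes in megabytes.
tmp_files <- list.files(tempdir(), full.names = TRUE)
total_mb  <- sum(file.info(tmp_files)$size, na.rm = TRUE) / 1024^2

message("Temporary directory: ", tempdir())
message("Space currently used: ", round(total_mb, 3), " MB")
```

The directory is removed when the R session ends, so anything that should persist has to be saved elsewhere, as the save() call does.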

# Reload the saved copy instead of downloading again
load_dir <- if (dir.exists("data-raw")) "data-raw" else file.path("..", "data-raw")
load(file.path(load_dir, "airp.rda"))

The Eurostat id ENV_AIR_EMIS refers to the Air pollutants by source sector (source: EEA) dataset. (See metadata description here.) The eurostat package handles the interaction with Eurostat, i.e. downloading and tidying the data and using the correct metadata dictionary for labelling it, with the important exception of sub-national (regional, metropolitan area) data. That must be managed by an offspring of the eurostat package, namely our package regions.

This data product is a database in itself: it contains 2043 unique environmental indicators about air pollution, passed on to Eurostat by the European Environment Agency. Why so many? The indicators cover 15 greenhouse gases and air pollutants, 35 geographical entities, 10 years and dozens of industries. With a lot of processing, they could ideally be used for environmental impact assessment together with our iotables package. For example, we could calculate the likely increase or decrease of methane emissions resulting from various economic policy interventions in the agricultural sub-sectors.

Data Tables

The function get_eurostat_indicator() returns a list of three tables and uses the Eurostat dictionaries to create human-readable descriptions of what each indicator measures. To keep the example manageable, I'll greatly reduce the number of indicators by keeping only those whose description mentions forest fires.

Metadata

airp$metadata %>%
  select ( all_of(c("description_indicator", "indicator_code", "actual", "missing"))) %>%
  filter ( grepl("forest fires", .data$description_indicator)) %>%
  distinct_all() %>%
  kbl() 
| description_indicator | indicator_code | actual | missing |
|:--|:--|--:|--:|
| Arsenic as forest fires | eurostat_env_air_emis_as_nfr11b | 363 | 0 |
| Cadmium cd forest fires | eurostat_env_air_emis_cd_nfr11b | 392 | 0 |
| Chromium cr forest fires | eurostat_env_air_emis_cr_nfr11b | 363 | 0 |
| Copper cu forest fires | eurostat_env_air_emis_cu_nfr11b | 363 | 0 |
| Mercury hg forest fires | eurostat_env_air_emis_hg_nfr11b | 392 | 0 |
| Ammonia forest fires | eurostat_env_air_emis_nh3_nfr11b | 676 | 0 |
| Nickel ni forest fires | eurostat_env_air_emis_ni_nfr11b | 363 | 0 |
| Non methane volatile organic compounds forest fires | eurostat_env_air_emis_nmvoc_nfr11b | 676 | 0 |
| Nitrogen oxides forest fires | eurostat_env_air_emis_nox_nfr11b | 676 | 0 |
| Lead pb forest fires | eurostat_env_air_emis_pb_nfr11b | 392 | 0 |
| Particulates 10 µm forest fires | eurostat_env_air_emis_pm10_nfr11b | 624 | 0 |
| Particulates 2 5 µm forest fires | eurostat_env_air_emis_pm2_5_nfr11b | 624 | 0 |
| Selenium se forest fires | eurostat_env_air_emis_se_nfr11b | 334 | 0 |
| Sulphur oxides forest fires | eurostat_env_air_emis_sox_nfr11b | 676 | 0 |
| Zinc zn forest fires | eurostat_env_air_emis_zn_nfr11b | 348 | 0 |

The entire (filtered) metadata for forest fires:

airp$metadata %>%
  filter ( grepl("forest fires", .data$description_indicator)) %>%
  distinct_all() %>%
  kbl() 
| indicator_code | title_at_source | description_indicator | db_source_code | frequency | data_start | data_end | last_update_data | last_update_data_source | last_structure_change | actual | missing | locf | nocb | interpolate | forecast | backcast | impute | recode |
|:--|:--|:--|:--|:--|--:|--:|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| eurostat_env_air_emis_as_nfr11b | Air pollutants by source sector (source: EEA) | Arsenic as forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 363 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_cd_nfr11b | Air pollutants by source sector (source: EEA) | Cadmium cd forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 392 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_cr_nfr11b | Air pollutants by source sector (source: EEA) | Chromium cr forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 363 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_cu_nfr11b | Air pollutants by source sector (source: EEA) | Copper cu forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 363 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_hg_nfr11b | Air pollutants by source sector (source: EEA) | Mercury hg forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 392 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_nh3_nfr11b | Air pollutants by source sector (source: EEA) | Ammonia forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 676 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_ni_nfr11b | Air pollutants by source sector (source: EEA) | Nickel ni forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 363 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_nmvoc_nfr11b | Air pollutants by source sector (source: EEA) | Non methane volatile organic compounds forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 676 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_nox_nfr11b | Air pollutants by source sector (source: EEA) | Nitrogen oxides forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 676 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_pb_nfr11b | Air pollutants by source sector (source: EEA) | Lead pb forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 392 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_pm10_nfr11b | Air pollutants by source sector (source: EEA) | Particulates 10 µm forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 624 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_pm2_5_nfr11b | Air pollutants by source sector (source: EEA) | Particulates 2 5 µm forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 624 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_se_nfr11b | Air pollutants by source sector (source: EEA) | Selenium se forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 334 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_sox_nfr11b | Air pollutants by source sector (source: EEA) | Sulphur oxides forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 676 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| eurostat_env_air_emis_zn_nfr11b | Air pollutants by source sector (source: EEA) | Zinc zn forest fires | eurostat_env_air_emis | A | 1990 | 2018 | 2021-05-07 | 2020-11-25 | 2020-11-25 | 348 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

None of these indicators has missing values, but we could of course forecast or backcast values for each geographical entity (in this case, each country).
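The metadata table has columns named locf and nocb, which presumably count values filled by last observation carried forward and next observation carried backward. How such filling could work per country is sketched below in base R on made-up numbers. The helpers locf() and nocb() and the toy data are hypothetical and are not taken from the indicator package.

```r
# Hypothetical helpers, not functions of the indicator package.
# Last observation carried forward: fill each NA with the latest earlier value.
locf <- function(x) {
  last_seen <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  c(NA, x)[last_seen + 1L]  # leading NAs have no earlier value and stay NA
}

# Next observation carried backward: the same idea on the reversed series.
nocb <- function(x) rev(locf(rev(x)))

# Made-up values for two countries, already sorted by year within each country.
toy <- data.frame(
  geo   = rep(c("BE", "BG"), each = 4),
  year  = rep(2015:2018, times = 2),
  value = c(NA, 1.2, NA, 1.5, 0.4, NA, NA, 0.7)
)

# ave() applies the function within each geo and keeps the original row order.
toy$value_locf <- ave(toy$value, toy$geo, FUN = locf)
toy$value_nocb <- ave(toy$value, toy$geo, FUN = nocb)
```

Filling within each geo matters: without the grouping, the last Belgian value would leak into the first Bulgarian gap.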

Dictionary

The get_eurostat_indicator() function saves the entire dictionary, which allows the data to be described programmatically. For example, in the original database AS abbreviates any pollution related to Arsenic (As), and NFR11B is the activity code of the polluting process Forest fires.

airp$labelling %>%
  filter ( grepl("forest fires", .data$description_indicator)) %>%
  distinct_all() %>%
  kbl ()
| db_source_code | indicator_code | description_indicator | variable | code | description_variable |
|:--|:--|:--|:--|:--|:--|
| eurostat_env_air_emis | eurostat_env_air_emis_as_nfr11b | Arsenic as forest fires | airpol | AS | Arsenic (As) |
| eurostat_env_air_emis | eurostat_env_air_emis_as_nfr11b | Arsenic as forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_cd_nfr11b | Cadmium cd forest fires | airpol | CD | Cadmium (Cd) |
| eurostat_env_air_emis | eurostat_env_air_emis_cd_nfr11b | Cadmium cd forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_cr_nfr11b | Chromium cr forest fires | airpol | CR | Chromium (Cr) |
| eurostat_env_air_emis | eurostat_env_air_emis_cr_nfr11b | Chromium cr forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_cu_nfr11b | Copper cu forest fires | airpol | CU | Copper (Cu) |
| eurostat_env_air_emis | eurostat_env_air_emis_cu_nfr11b | Copper cu forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_hg_nfr11b | Mercury hg forest fires | airpol | HG | Mercury (Hg) |
| eurostat_env_air_emis | eurostat_env_air_emis_hg_nfr11b | Mercury hg forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_nh3_nfr11b | Ammonia forest fires | airpol | NH3 | Ammonia |
| eurostat_env_air_emis | eurostat_env_air_emis_nh3_nfr11b | Ammonia forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_ni_nfr11b | Nickel ni forest fires | airpol | NI | Nickel (Ni) |
| eurostat_env_air_emis | eurostat_env_air_emis_ni_nfr11b | Nickel ni forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_nmvoc_nfr11b | Non methane volatile organic compounds forest fires | airpol | NMVOC | Non-methane volatile organic compounds |
| eurostat_env_air_emis | eurostat_env_air_emis_nmvoc_nfr11b | Non methane volatile organic compounds forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_nox_nfr11b | Nitrogen oxides forest fires | airpol | NOX | Nitrogen oxides |
| eurostat_env_air_emis | eurostat_env_air_emis_nox_nfr11b | Nitrogen oxides forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_pb_nfr11b | Lead pb forest fires | airpol | PB | Lead (Pb) |
| eurostat_env_air_emis | eurostat_env_air_emis_pb_nfr11b | Lead pb forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_pm10_nfr11b | Particulates 10 µm forest fires | airpol | PM10 | Particulates < 10µm |
| eurostat_env_air_emis | eurostat_env_air_emis_pm10_nfr11b | Particulates 10 µm forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_pm2_5_nfr11b | Particulates 2 5 µm forest fires | airpol | PM2_5 | Particulates < 2.5µm |
| eurostat_env_air_emis | eurostat_env_air_emis_pm2_5_nfr11b | Particulates 2 5 µm forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_se_nfr11b | Selenium se forest fires | airpol | SE | Selenium (Se) |
| eurostat_env_air_emis | eurostat_env_air_emis_se_nfr11b | Selenium se forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_sox_nfr11b | Sulphur oxides forest fires | airpol | SOX | Sulphur oxides |
| eurostat_env_air_emis | eurostat_env_air_emis_sox_nfr11b | Sulphur oxides forest fires | src_nfr | NFR11B | Forest fires |
| eurostat_env_air_emis | eurostat_env_air_emis_zn_nfr11b | Zinc zn forest fires | airpol | ZN | Zinc (Zn) |
| eurostat_env_air_emis | eurostat_env_air_emis_zn_nfr11b | Zinc zn forest fires | src_nfr | NFR11B | Forest fires |
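Judging from the dictionary rows above, each indicator_code appears to be the db_source_code followed by the lower-cased dictionary codes, joined with underscores. The sketch below reproduces that pattern. make_indicator_code() is a hypothetical helper inferred from the displayed rows, not the package's own code, and the actual construction inside get_eurostat_indicator() may differ.

```r
# Hypothetical helper inferred from the dictionary table;
# not a function of the indicator package.
make_indicator_code <- function(db_source_code, ...) {
  codes <- tolower(c(...))
  paste(c(db_source_code, codes), collapse = "_")
}

arsenic_code <- make_indicator_code("eurostat_env_air_emis", "AS", "NFR11B")
pm25_code    <- make_indicator_code("eurostat_env_air_emis", "PM2_5", "NFR11B")

print(arsenic_code)  # "eurostat_env_air_emis_as_nfr11b"
print(pm25_code)     # "eurostat_env_air_emis_pm2_5_nfr11b"
```

Both results match the indicator_code values listed for the AS and PM2_5 rows.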

Statistical Indicators

Of course, the most important element of the list returned by get_eurostat_indicator() is the table that holds the actual values of the indicators.

# Collect the codes of the forest-fire indicators as a character vector
select_indicators <- airp$metadata %>%
  filter(grepl("forest fires", .data$description_indicator)) %>%
  distinct(indicator_code) %>%
  pull(indicator_code)

# Show the first rows of the values that belong to those indicators
airp$indicator %>%
  filter(.data$indicator_code %in% select_indicators) %>%
  distinct_all() %>%
  head(12)
#> # A tibble: 12 x 12
#>    indicator_code unit  geo   time       value estimate method  year month   day frequency
#>    <chr>          <chr> <chr> <date>     <dbl> <chr>    <chr>  <int> <int> <int> <chr>    
#>  1 eurostat_env_… T     BE    2018-01-01     0 actual   actual  2018     1     1 A        
#>  2 eurostat_env_… T     BG    2018-01-01     0 actual   actual  2018     1     1 A        
#>  3 eurostat_env_… T     CH    2018-01-01     0 actual   actual  2018     1     1 A        
#>  4 eurostat_env_… T     DE    2018-01-01     0 actual   actual  2018     1     1 A        
#>  5 eurostat_env_… T     FI    2018-01-01     0 actual   actual  2018     1     1 A        
#>  6 eurostat_env_… T     HR    2018-01-01     0 actual   actual  2018     1     1 A        
#>  7 eurostat_env_… T     IE    2018-01-01     0 actual   actual  2018     1     1 A        
#>  8 eurostat_env_… T     IT    2018-01-01     0 actual   actual  2018     1     1 A        
#>  9 eurostat_env_… T     LV    2018-01-01     0 actual   actual  2018     1     1 A        
#> 10 eurostat_env_… T     PL    2018-01-01     0 actual   actual  2018     1     1 A        
#> 11 eurostat_env_… T     RO    2018-01-01     0 actual   actual  2018     1     1 A        
#> 12 eurostat_env_… T     UK    2018-01-01     0 actual   actual  2018     1     1 A        
#> # … with 1 more variable: db_source_code <chr>


dataobservatory-eu/indicator documentation built on Dec. 19, 2021, 8:13 p.m.