Recoding & Relabelling"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(regions)
library(dplyr)
library(tidyr)

Eurostat offers so-called correspondence tables to follow boundary changes, recoding and relabelling for all NUTS changes since the formalization of the NUTS typology. Unfortunately, these Excel tables do not conform with the requirements of tidy data, and their vocabulary for is not standardized, either. For example, recoding changes are often labelled as recoding, recoding and renaming, code change, Code change, etc.

The data-raw library contains these Excel tables and very long data wrangling code that unifies the relevant vocabulary of these Excel files and brings the tables into a single, tidy format , starting with the definition NUTS1999. The resulting data file nuts_changes is included in the regions package. It already contains the changes that will come into force in 2021.

Let's review a few changes.

data(nuts_changes)

nuts_changes %>%
  mutate ( geo_16 = .data$code_2016, 
           geo_13 = .data$code_2013 ) %>%
  filter ( code_2016 %in% c("FRB", "HU11") | 
             code_2013 %in% c("FR7", "HU10", "FR24")) %>%
  select ( all_of(c("typology", "geo_16", "geo_13", "start_year",
           "code_2013", "change_2013",
           "code_2016", "change_2016")) 
           ) %>%
  pivot_longer ( cols = starts_with("code"), 
                 names_to = 'definition', 
                 values_to = 'code') %>%
  pivot_longer ( cols = starts_with("change"), 
                 names_to = 'change', 
                 values_to = 'description')  %>%
  filter (!is.na(.data$description), 
          !is.na(.data$code)) %>%
  select ( -.data$change ) %>%
  knitr::kable ()

You will not find the geo identifier FRB in any statistical data that was released before France changes its administrative boundaries and the NUTS2016 boundary definition came into force. However, as the description says, you may find historical data elsewhere, in a historical NUTS2-level product for the FRB CENTRE — VAL DE LOIRE NUTS1 region, because it is identical to the earlier NUTS2 level region FR24, i.e. Central France, which was known as Centre for many years before the transition to NUTS2016. The size and importance of this territorial unit is more similar to NUTS1 than NUTS2 units.

Because FRB contains only one FRB0, the earlier FR24, it is technically identified as a NUTS2-level region, too. You find the same data in the NUTS2 typology. With statistical products on NUTS2 level, you can simply recode historical FR24 data to FRB0, since the aggregation level and the boundaries are not changed. Furthermore, you can project this data to any NUTS1 level panel either under the earlier FR2 NUTS1 label, if you use the old definition, or the new FRB label, if you use the current NUTS2016 typology.

Let's see a hypothetical data frame with random variables. (Usually a data frame has no so many issues, so a more detailed example can be constructed this way.)

example_df <- data.frame ( 
  geo  =  c("FR", "DEE32", "UKI3" ,
            "HU12", "DED", 
            "FRK"), 
  values = runif(6, 0, 100 ),
  stringsAsFactors = FALSE )

recode_nuts(dat = example_df, 
            nuts_year = 2013) %>%
  select ( geo, values, code_2013) %>%
  knitr::kable()

In this hypothetical example we are creating backward compatibility with the NUTS2013 definition. There are three type of observations:

recode_nuts(example_df, nuts_year = 2013) %>%
  select ( all_of(c("geo", "values", "typology_change", "code_2013")) ) %>%
  knitr::kable()

The first three observations are comparable with a NUTS2013 dataset. The fourth observation is comparable, too, but when joining with a NUTS2013 dataset or map, it is likely that FRK needs to be re-coded to FR7.

The following data can be joined with a NUTS2013 dataset or map:

recode_nuts(example_df, nuts_year = 2013) %>%
  select ( .data$code_2013, .data$values, .data$typology_change ) %>%
  rename ( geo = .data$code_2013 ) %>% 
  filter ( !is.na(.data$geo) ) %>%
  knitr::kable()

And re-assuringly these data will be compatible with the next NUTS typology, too!

recode_nuts(example_df, nuts_year = 2021) %>%
  select ( .data$code_2021, .data$values, .data$typology_change ) %>%
  rename ( geo = .data$code_2021 ) %>% 
  filter ( !is.na(.data$geo) ) %>%
  knitr::kable()

What about HU12?

data(nuts_changes) 
nuts_changes %>% 
  select( .data$code_2016, .data$geo_name_2016, .data$change_2016) %>%
  filter( code_2016 == "HU12") %>%
  filter( complete.cases(.) ) %>%
  knitr::kable()

The description in the correspondence tables clarifies that in fact historical data may be assembled for HU12 (Pest county.)

That will be the topic of a later vignette on aggregation and re-aggregation.

Citations and related work

Citing the data sources

Eurostat data: cite Eurostat.

Administrative boundaries: cite EuroGeographics.

Citing the regions R package

For main developer and contributors, see the package homepage.

This work can be freely used, modified and distributed under the GPL-3 license:

citation("regions")

Contact

For contact information, see the package homepage.



Try the regions package in your browser

Any scripts or data that you put into this service are public.

regions documentation built on June 21, 2021, 5:06 p.m.