In antaldaniel/eurobarometer: Tidy Eurobarometer Survey Microdata For Retrospective Harmonization

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(eurobarometer)
library(dplyr)
library(knitr)
library(stringr)

The eurobarometer package relies on the survey class system of retroharmonize. You do not have to load the entire retroharmonize package - whatever is needed to make eurobarometer work is imported and modified as needed.

ZA6863 <- read_rds(
  system.file("examples", "ZA6863.rds", package = "eurobarometer")
)
ZA7576 <- read_rds(
  system.file("examples", "ZA7576.rds", package = "eurobarometer")
)

Metadata

The metadata analysis is a first step to help both variable name and value label harmonization.

ZA6863_metadata <- gesis_metadata_create(ZA6863)
ZA7576_metadata <- gesis_metadata_create(ZA7576)

Variables of base types numeric and character can be safely concatenated. The labelled, mainly categorical variables require special attention: their valid range and missing range must be harmonized before binding the two tables together.

ZA6863_items <- ZA6863_metadata %>%
  filter (
    class_orig %in% c("character", "numeric") | 
    str_sub(var_name_suggested, 1,5) == 'trust' ) %>%
  filter ( var_name_suggested != 'not_given' ) %>%
  pull (var_name_suggested)

ZA7576_items <- ZA7576_metadata %>%
  filter (
    class_orig %in% c("character", "numeric") | 
    str_sub(var_name_suggested, 1,5) == 'trust' ) %>%
    filter ( var_name_suggested != 'not_given' ) %>%
    pull (var_name_suggested)

In this case, the var_label_suggest() function worked perfectly, so we can approve the suggestions of gesis_metadata_create().

Let's select the variables with identical names from the two surveys:

hZA6863 <- ZA6863 %>%
  stats::setNames ( nm = ZA6863_metadata$var_name_suggested ) %>%
  select ( all_of(intersect(ZA6863_items, ZA7576_items)))

hZA7576 <- ZA7576 %>%
  stats::setNames ( nm = ZA7576_metadata$var_name_suggested ) %>%
  select ( all_of(intersect(ZA6863_items, ZA7576_items)))

And have a look at their value labelling: [no idea why are this not identical.]

ZA6863_trust <- ZA6863_metadata %>%
  filter ( str_sub(var_name_suggested, 1,5) == 'trust' ) %>%
  select ( labels, na_labels ) %>%
  tidyr::unnest( cols = c(labels, na_labels) ) %>%
  distinct_all()

ZA6863_trust$labels[1]
ZA6863_trust$labels[2]

ZA7576_trust <- ZA7576_metadata %>%
  filter ( str_sub(var_name_suggested, 1,5) == 'trust' ) %>%
  select ( labels, na_labels ) %>%
  tidyr::unnest( cols = c(labels, na_labels) ) %>%
  distinct_all()

ZA7576_trust$labels[1]
ZA7576_trust$labels[2]

Harmonize the value labels

The retroharmonize::harmonize_values() is a prototype of the harmonization function. It should be adjusted to survey and question-block specific idiosyncrasies. This should be the work of various vocabulary tables, but the prototype can be made work with inputting the harmonization regex either as a list or as a data frame.

Because we would like to have the same harmonization for a question block, in this case we adopt the prototype with a regex. The retroharmonize::harmonize_values() function will normalize the labels, so you do not have to deal with capitalization and upper case versions. If you want to understand better the harmonization procedure, please refer to the Harmonize Value Labels vignette of the retroharmonize package.

With a better imputing system, this could be automated to a high level, probably harmonizing all trend variables at the same time. The harmonize_eurobaromter should be something that deals with this.

harmonize_trust <- function(x) {
   retroharmonize::harmonize_values(
  x = x,
  harmonize_label = NULL,
  harmonize_labels = (
    list (
     from = c("^tend\\sto|^trust", "^tend\\snot|not\\strust", "^dk|^don", "^inap"), 
    to = c("trust", "not_trust", "do_not_know", "inap"),
    numeric_values = c(1,0,99997, 99999))
    ),
  na_values = c(do_not_know = 99997, declined = 99998, inap = 99999),
  na_range = NULL,
  id = "survey_id",
  name_orig = NULL)
}

Choosing the first trust vector, we can see that the harmonization records all metadata for reproducibility.

harmonize_trust (hZA6863$trust_army)

The coding appears very similar, so we use the same helper function for the same question in the other survey:

harmonize_trust (hZA7576$trust_army)

trust_in_army <- retroharmonize::concatenate(
  x = harmonize_trust ( hZA6863$trust_army), 
  y = harmonize_trust ( hZA7576$trust_army)
  )
trust_in_army

The attributes are complex, because they leave open reverting to historical coding, and for a choice of categorical or numeric representation in R.

summary ( as_factor(trust_in_army))
summary ( as_numeric(trust_in_army))

Let's repeat the same harmonization for all trust variables.

hZA7576 <- hZA7576 %>%
  mutate_at (vars (starts_with("trust")), harmonize_trust )
hZA6863 <- hZA6863 %>%
  mutate_at (vars (starts_with("trust")), harmonize_trust )

hZA6863 %>%
  select ( all_of(c("trust_army", "trust_european_union")))

Given that the other selected variables have identical (harmonized) names and they are of base type numeric or character, after harmonizing the trust labels and na_values, we can bind the two panels with vectrs::vec_rbind() or dplyr::bind_rows(). Unfortunately, the generic c() method cannot be implemented to work with this type.

panel <- vctrs::vec_rbind (
  hZA6863, hZA6863 
)

The panel is created, and it is open for exporting to other statistical software, or further analysis in R. While some basic arithmetic methods are implemented for the labelled_spss_survey class of the retroharmonize package, for using all R statistical packages, the analyst has to chose a base R type that is compatible with them. Since the trust variables are categorical variables, they can be re-casted with the as_factor() or as_numeric() methods. Again, the base R as.factor() or as.numeric() will give a legible, but not correct representation.

The factor representation presents the user-defined missing values as categories:

panel %>%
  mutate_at (vars (starts_with("trust")), as_factor ) %>%
  summary()

And let's compare this with the numeric representation, where the user-defined missing values are treated as missing:

panel %>%
  mutate_at (vars (starts_with("trust")), as_numeric ) %>%
  summary()

Documentation

trust_in_army_doc <- retroharmonize::document_survey_item(
  trust_in_army)

trust_in_army_doc$code_table %>% kable ()

trust_in_army_doc$history_var_label

antaldaniel/eurobarometer documentation built on Aug. 31, 2020, 10:57 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

antaldaniel/eurobarometer
Tidy Eurobarometer Survey Microdata For Retrospective Harmonization

In antaldaniel/eurobarometer: Tidy Eurobarometer Survey Microdata For Retrospective Harmonization

Metadata

Harmonize the value labels

Documentation

R Package Documentation

Browse R Packages

We want your feedback!

antaldaniel/eurobarometer Tidy Eurobarometer Survey Microdata For Retrospective Harmonization

In antaldaniel/eurobarometer: Tidy Eurobarometer Survey Microdata For Retrospective Harmonization

Metadata

Harmonize the value labels

Documentation

R Package Documentation

Browse R Packages

We want your feedback!

antaldaniel/eurobarometer
Tidy Eurobarometer Survey Microdata For Retrospective Harmonization