The goal of
retroharmonize is to facilitate retrospective (ex-post)
harmonization of data, particularly survey data, in a reproducible
manner. The package provides tools for organizing the metadata,
standardizing the coding of variables, variable names and value labels,
including missing values, and for documenting all transformations, with
the help of comprehensive S3 classes.
You can download the manual in PDF.
After review, the package will be available for installation from CRAN:
install.packages("retroharmonize")
The development version from GitHub can be installed with:
# install.packages("devtools")
devtools::install_github("antaldaniel/retroharmonize")
The aim of
retroharmonize is to provide tools for reproducible
retrospective (ex-post) harmonization of datasets that contain variables
measuring the same concepts but coded in different ways. Ex-post data
harmonization enables better use of existing data and creates new
research opportunities. For example, harmonizing data from different
countries enables cross-national comparisons, while merging data from
different time points makes it possible to track changes over time.
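Here is a minimal base-R sketch of what "the same concept coded in different ways" means in practice. It uses two hypothetical waves of a trust question. The data, the codes and the helper recode_values() are invented for illustration and are not part of retroharmonize. Each wave's original codes are mapped to a common target coding (1 = trust, 0 = no trust), and the wave-specific missing-value codes become NA before the waves are stacked.

```r
# Two hypothetical waves asking the same question but coding it differently.
# wave1: 1 = trust, 2 = no trust, 9 = don't know
# wave2: 1 = trust, 0 = no trust, 8 = declined
wave1 <- data.frame(id = 1:4, trust = c(1, 2, 2, 9))
wave2 <- data.frame(id = 1:3, trust = c(0, 1, 8))

# Hypothetical helper: look up each original code in a named vector.
# Codes absent from the map (here the missing-value codes) come back as NA.
recode_values <- function(x, code_map) {
  unname(code_map[as.character(x)])
}

# Original code (as name) -> harmonized code (as value)
map_wave1 <- c("1" = 1, "2" = 0)
map_wave2 <- c("1" = 1, "0" = 0)

# Stack the recoded waves, keeping a wave identifier for each row
harmonized <- rbind(
  data.frame(wave = "wave1", id = wave1$id,
             trust = recode_values(wave1$trust, map_wave1)),
  data.frame(wave = "wave2", id = wave2$id,
             trust = recode_values(wave2$trust, map_wave2))
)
print(harmonized)
```

This naive recoding discards the distinction between "don't know" and "declined". Keeping value labels and user-defined missing values intact through harmonization is part of what the package's classes are meant to handle.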
Retrospective data harmonization is associated with several challenges:
conceptual issues in establishing equivalence and comparability,
practical complications of having to standardize the naming and coding
of variables, technical difficulties with merging data stored in
different formats, and the need to document a large number of data
transformations. The retroharmonize package assists with the latter
three components, freeing up the capacity of researchers to focus on the
first. The retroharmonize package proposes a reproducible
workflow, including a new class for storing data together with the
harmonized and original metadata, as well as functions for importing
data from different formats, harmonizing data and metadata, documenting
the harmonization process, and converting between data types. See the
package documentation for an overview of its functionality.
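A crosswalk table is one common way to keep the renaming rules and their documentation in the same place. The base-R sketch below uses hypothetical waves and column names. The helper apply_crosswalk() is illustrative only and is not the package's interface. Each row of the table records which original variable in which wave maps to which harmonized name, so the table doubles as a record of the transformation.

```r
# Hypothetical crosswalk: one row per (wave, original variable) pair.
crosswalk_table <- data.frame(
  wave       = c("wave1", "wave1", "wave2", "wave2"),
  var_orig   = c("q1", "age", "trustparl", "agey"),
  var_target = c("trust_parliament", "age", "trust_parliament", "age"),
  stringsAsFactors = FALSE
)

wave1 <- data.frame(q1 = c(1, 0), age = c(34, 51))
wave2 <- data.frame(trustparl = c(1, 1), agey = c(29, 62))

# Hypothetical helper: keep only the columns listed for this wave,
# rename them to their harmonized names, and tag rows with the wave id.
apply_crosswalk <- function(data, wave_id, cw) {
  rows <- cw[cw$wave == wave_id, ]
  out <- data[, rows$var_orig, drop = FALSE]
  names(out) <- rows$var_target
  out$wave <- wave_id
  out
}

# rbind() on data frames matches columns by name
merged <- rbind(
  apply_crosswalk(wave1, "wave1", crosswalk_table),
  apply_crosswalk(wave2, "wave2", crosswalk_table)
)
print(merged)
```

Because the rules live in data rather than in code, the same table can be saved alongside the merged file as documentation of which original variable fed each harmonized one.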
The labelled_spss_survey() class is an extension of haven’s labelled_spss
class. It not only preserves variable and value labels and the
user-defined missing range, but also attaches an identifier, for example
the file name or the wave number, to the vector. Additionally, it stores
as metadata attributes the original variable names, labels, value codes
and value labels from the source data, alongside the harmonized variable
names, labels, value codes and value labels. This way, the harmonized
data also contain the pre-harmonization record. The stored original
metadata can be used for validation and documentation purposes.
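The following minimal sketch creates such a vector. It assumes the constructor accepts the arguments x, labels, label, na_values and id, as in the package's vignettes; check ?labelled_spss_survey for the authoritative signature. The data and the "survey1" identifier are made up. The sketch reads back only the attributes named here.

```r
# Requires the retroharmonize package (which builds on haven).
library(retroharmonize)

# A labelled vector from a hypothetical source file "survey1":
# 1 = yes, 0 = no, 8 = declined (declared a user-defined missing value).
v1 <- labelled_spss_survey(
  x = c(1, 1, 0, 8),
  labels = c("yes" = 1, "no" = 0, "declined" = 8),
  label = "Do you agree?",
  na_values = 8,
  id = "survey1"
)

# The identifier, the variable label and the missing-value code
# travel with the vector as attributes.
attr(v1, "id")
attr(v1, "label")
attr(v1, "na_values")

# List every stored attribute to inspect the rest of the metadata.
names(attributes(v1))
```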
The vignette Working With The labelled_spss_survey Class
provides more information about the labelled_spss_survey() class.
In the vignette Harmonize Value Labels
we discuss the characteristics of the
labelled_spss_survey() class and
demonstrate the problems that using this class solves.
We also provide two extensive case studies illustrating how the
retroharmonize package can be used for ex-post harmonization of data
from cross-national surveys, using the Afrobarometer and the
Eurobarometer as examples.
The creators of
retroharmonize are not affiliated with Afrobarometer,
Eurobarometer, or the organizations that design,
produce, or archive their surveys.
Please note that the
retroharmonize project is released with a
Contributor Code of Conduct.
By contributing to this project, you agree to abide by its terms.