This package incorporates the replication data (Pemstein, Meserve, and Melton 2013) for Pemstein, Meserve, and Melton (2010). The replication data contains the measures of democracy used to construct the official Unified Democracy Scores. In the democracy dataset in this package, these measures carry the prefix pmm_ (for example, pmm_fh). However, some of them (Freedom House, Mainwaring et al., DD/PACL, Polity IV, Polyarchy, PRC, Vanhanen) differ from their original sources (for example, the official data from Freedom House) in a small number of cases, for reasons that include data revisions after 2008 and transcription errors. This vignette documents these differences.

Freedom House

comparison <- democracy %>% 
  prepare_data() %>% 
  filter(year <= 2008, year >=1946)

In total, there are r nrow(comparison %>% filter((!is.na(pmm_fh) & is.na(fh_total_reversed)) | (is.na(pmm_fh) & !is.na(fh_total_reversed)) | (pmm_fh != fh_total_reversed))) differences between PMM's replication data and the current Freedom House data (Freedom House 2017). These differences seem to have three sources: data revisions in the Freedom House dataset since 2010; the inclusion of non-state territories in the Freedom House dataset; and the treatment of 1981, which is conventionally taken to be missing in the Freedom House series but which appears to be interpolated in PMM's replication dataset.

kable(comparison %>% 
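        # Keep rows where exactly one source is missing, or both are present but unequal
        filter((!is.na(pmm_fh) & is.na(fh_total_reversed)) | 
                 (is.na(pmm_fh) & !is.na(fh_total_reversed)) | 
                 (pmm_fh != fh_total_reversed)) %>%
        group_by(extended_country_name, pmm_fh, fh_total_reversed) %>%
        summarise(years = paste(year, collapse = ", "), n = n()),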
      caption = "Differences between PMM's FH values and current FH data",
      col.names = c("Country name","Freedom House value in PMM",
                    "Latest Freedom House data","years affected","n"))
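
The same three-way test (a value present in only one source, or present in both but unequal) recurs in every section of this vignette. As a minimal sketch of that logic, the helper below applies it to a toy data frame. The names differs(), toy, pmm_value and source_value are hypothetical and are not part of this package; the toy values are invented for illustration only.

```r
# Hypothetical helper: TRUE where two measures disagree, either because
# exactly one of them is missing or because both are present but unequal.
differs <- function(a, b) {
  one_missing <- xor(is.na(a), is.na(b))
  # FALSE & NA evaluates to FALSE, so this vector never contains NA
  both_present_unequal <- !is.na(a) & !is.na(b) & a != b
  one_missing | both_present_unequal
}

# Invented toy data, not taken from any of the datasets discussed here
toy <- data.frame(
  year = 1979:1983,
  pmm_value = c(3, 5, 4, NA, NA),
  source_value = c(3, 6, NA, 2, NA)
)

# Rows where the two columns disagree
mismatches <- toy[differs(toy$pmm_value, toy$source_value), ]
print(mismatches)
```

Unlike the bare comparison a != b, this helper never returns NA, so it can also be used where a strict TRUE/FALSE vector is required. Inside dplyr's filter() the longer expression used in this vignette behaves the same way, because filter() drops rows for which the condition evaluates to NA.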

The Mainwaring et al. dataset

PMM's replication data is missing r nrow(comparison %>% filter((!is.na(pmm_mainwaring) & is.na(mainwaring)) | (is.na(pmm_mainwaring) & !is.na(mainwaring)) | (pmm_mainwaring != mainwaring))) of the data points in the original data by Mainwaring et al. (Mainwaring, Brinks, and Perez Linan 2008). The original data is not missing any of the data points in the replication data, and the two agree wherever both have values.

kable(comparison %>% 
        filter((!is.na(pmm_mainwaring) & is.na(mainwaring)) | 
                 (is.na(pmm_mainwaring) & !is.na(mainwaring)) | 
                 (pmm_mainwaring != mainwaring)) %>%
        group_by(extended_country_name, pmm_mainwaring, mainwaring) %>%
        summarise(min = min(year), max= max(year), n = n()),
      caption = "Differences between PMM's Mainwaring et al values and original Mainwaring et al data",
      col.names = c("Country name","Mainwaring et al value in PMM",
                    "Mainwaring et al value in original dataset","min year", "max year","n"))

The DD/PACL/ACLP dataset

r nrow(comparison %>% filter((!is.na(pmm_pacl) & is.na(pacl)) | (is.na(pmm_pacl) & !is.na(pacl)) | (pmm_pacl != pacl))) country-years in the original PACL/DD dataset (Cheibub, Gandhi, and Vreeland 2010) are missing from PMM's replication dataset.

kable(comparison %>% 
        filter((!is.na(pmm_pacl) & is.na(pacl)) | 
                 (is.na(pmm_pacl) & !is.na(pacl)) | 
                 (pmm_pacl != pacl)) %>%
        group_by(extended_country_name, pmm_pacl, pacl) %>%
        summarise(min = min(year), max= max(year), n = n()),
      caption = "Differences between PMM's PACL values and original PACL data",
      col.names = c("Country name","PACL value in PMM",
                    "Original PACL data","min year", "max year","n"))

The Polity IV dataset

There are r nrow(comparison %>% filter((!is.na(pmm_polity) & is.na(polity2)) | (is.na(pmm_polity) & !is.na(polity2)) | (pmm_polity != polity2))) country-years where PMM's replication data differs from the latest version of Polity IV, either because one of the two has no data or because the values differ. These differences seem to be due to minor data revisions since 2010 in the Polity IV dataset.

kable(comparison %>% 
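        filter((!is.na(pmm_polity) & is.na(polity2)) | 
                 (is.na(pmm_polity) & !is.na(polity2)) | 
                 (pmm_polity != polity2)) %>%
        # Subtract 11 from every column whose name matches "polity" before display
        mutate_at(vars(matches("polity")), ~ . - 11) %>%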
        group_by(extended_country_name, pmm_polity, polity2) %>%
        summarise(min = min(year), max= max(year), n = n()),
      caption = "Differences between PMM's Polity2 values and latest Polity IV data",
      col.names = c("Country name","Polity2 value in PMM",
                    "Latest Polity IV data","min year", "max year","n"))

The Polyarchy dataset

r nrow(comparison %>% filter((!is.na(pmm_polyarchy) & is.na(polyarchy_original_polyarchy)) | (is.na(pmm_polyarchy) & !is.na(polyarchy_original_polyarchy)) | (pmm_polyarchy != polyarchy_original_polyarchy))) country-years differ between PMM's replication data and the original Polyarchy dataset (Coppedge and Reinicke 1991). These seem to be due to simple transcription errors.

kable(comparison %>% 
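        filter((!is.na(pmm_polyarchy) & is.na(polyarchy_original_polyarchy)) | 
                 (is.na(pmm_polyarchy) & !is.na(polyarchy_original_polyarchy)) | 
                 (pmm_polyarchy != polyarchy_original_polyarchy)) %>%
        group_by(extended_country_name, pmm_polyarchy, 
                 polyarchy_original_polyarchy) %>%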
        summarise(min = min(year), max= max(year), n = n()),
      caption = "Differences between PMM's Polyarchy values and original Polyarchy data",
      col.names = c("Country name","Polyarchy value in PMM",
                    "Original Polyarchy data","min year", "max year","n"))

The Political Regime Change (PRC) dataset

The PRC dataset (Gasiorowski 1996, revised in Reich 2002) has more than one value for some country-years because of transitions between regimes. These transitions are not treated consistently in PMM's replication dataset: sometimes the value for the beginning of the year is used, sometimes the value for the end of the year, and sometimes the value for the middle of the year. This results in r nrow(comparison %>% mutate(prc = ifelse(prc == 1, prc, prc + 1)) %>% filter((!is.na(pmm_prc) & is.na(prc)) | (is.na(pmm_prc) & !is.na(prc)) | (pmm_prc != prc))) differences between the datasets. In this package, I use the last value in a given year for the regime in the PRC dataset, and code "transitions" (prc = 2) as missing values (NA), which results in the following changes:

kable(comparison %>% 
        mutate(prc = ifelse(prc == 1, prc, prc + 1)) %>%
        filter((!is.na(pmm_prc) & is.na(prc)) | 
                 (is.na(pmm_prc) & !is.na(prc)) | 
                 (pmm_prc != prc)) %>%
        group_by(extended_country_name, pmm_prc, prc) %>%
        summarise(years = paste(year,collapse=", "), n = n()),
      caption = "Differences between PMM's PRC values and original PRC data",
      col.names = c("Country name","PRC value in PMM",
                    "Original PRC data","years affected","n"))
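
The rule described above (keep the last regime value recorded in each country-year, then treat transitions as missing) can be sketched on a toy example. This is only an illustration under stated assumptions, not the code this package uses: the data frame prc_toy, its columns, the within-year ordering column spell, and the values themselves are all hypothetical; only the use of 2 as the transition code comes from the text above.

```r
# Invented toy data: several regime values within one country-year,
# ordered within the year by the hypothetical column 'spell'
prc_toy <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  year    = c(1990, 1990, 1991, 1990, 1990),
  spell   = c(1, 2, 1, 1, 2),
  regime  = c(1, 4, 4, 3, 2)
)

# Sort so that the last row of each country-year is the latest spell
prc_toy <- prc_toy[order(prc_toy$country, prc_toy$year, prc_toy$spell), ]

# Keep only the last row within each country-year
is_last <- !duplicated(prc_toy[c("country", "year")], fromLast = TRUE)
last_values <- prc_toy[is_last, c("country", "year", "regime")]

# Code transitions (regime == 2) as missing
last_values$regime[last_values$regime == 2] <- NA
print(last_values)
```

In this toy example country A keeps the end-of-year value for 1990, while country B ends 1990 in a transition and therefore receives NA for that year.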

Vanhanen dataset

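There are r nrow(comparison %>% filter((!is.na(pmm_vanhanen) & is.na(vanhanen_democratization)) | (is.na(pmm_vanhanen) & !is.na(vanhanen_democratization)) | (pmm_vanhanen != vanhanen_democratization))) missing values in PMM's data compared to the original Vanhanen dataset (Vanhanen 2012):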

comparison2 <- democracy %>% 
  select(extended_country_name, GWn, year, vanhanen_democratization, pmm_vanhanen) %>% 
  filter(year <= 2008, year >=1946)

kable(comparison2 %>% 
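        filter((!is.na(pmm_vanhanen) & is.na(vanhanen_democratization)) | 
                 (is.na(pmm_vanhanen) & !is.na(vanhanen_democratization)) | 
                 (pmm_vanhanen != vanhanen_democratization)) %>%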
        group_by(extended_country_name,pmm_vanhanen,vanhanen_democratization) %>%
        summarise(years = paste(year,collapse=", "), n = n()),
      caption = "Differences between PMM's Vanhanen values and original Vanhanen data",
      col.names = c("Country name","Vanhanen value in PMM",
                    "Original Vanhanen data","years affected","n"))


