In svmiller/peacesciencer: Tools and Data for Quantitative Peace Science Research

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(tidyverse)
library(peacesciencer)

This is a running list of various parlor tricks that you can do with the data and functions in {peacesciencer}. Space and time considerations, along with some rigidity imposed by CRAN guidelines, preclude me from including these as outright functions or belaboring them in greater detail in the manuscript. Again, {peacesciencer} can do a lot, but it can't do everything. Yet, some of its functionality may not also be obvious from the manuscript or documentation files because they're not necessarily core functions. Thus "parlor trick" is an appropriate descriptor here.

Create a "New State" Variable

The manuscript includes a partial replication of a state-year civil conflict analysis analogous to Fearon and Laitin (2003) and Gibler and Miller (2014). Both of those analyses include a "new state" variable, arguing that states within the first two years of their existence are more likely to experience a civil war onset. The partial replication does not include this. This is because the easiest way to create this variable is through a group_by() mutate based on the row number of the group, but group_by() has the unfortunate side effect of erasing any other attributes in the data (i.e. the ps_system and ps_type attributes). This would break the {peacesciencer} pipe. If you want this variable, I recommend creating and merging this variable after creating the bulk of the data.

Here's how you'd do it.

# Hypothetical main data
create_stateyears(system = 'gw') %>%
  filter(between(year, 1946, 2019)) %>%
  add_ucdp_acd(type = "intrastate") %>%
  add_peace_years() -> Data

# Add in new state variable after the fact
create_stateyears(system = 'gw') %>%
  group_by(gwcode) %>%
  mutate(newstate = ifelse(row_number() <= 2, 1, 0)) %>%
  left_join(Data, .) %>%
  select(gwcode:ucdponset, newstate, everything()) -> Data

# Proof of concept: Here's India
Data %>% filter(gwcode == 750)

# And here's Belize
Data %>% filter(gwcode == 80)

Code Capabilities/Development/Militarization as Bremer (1992) Did

The manuscript includes a replication of Bremer's (1992) "dangerous dyads" design, albeit one that leverages newer/better data sources that were unavailable to Bremer at the time. For convenience's sake, the replication used other approaches to estimating Bremer's variables, including the "weak-link" mechanisms that Dixon (1994) introduced in his seminal work on democratic conflict resolution. If the user wanted to recreate some of the covariates as Bremer (1992) did it, here would be how to do it.

The covariates in question concern information grabbed from the Correlates of War national material capabilities data set.[^democracy] For example, the user guide recreates the "relative power" variable as a proportion of the lower composite index of national capabilities (CINC) variable over the higher one. Bremer opts for a different approach, defining a "relative power" variable as a three-part ordinal category where the more powerful side has a CINC score that is 1) 10 times higher than the less powerful side, 2) three times higher than the other side, or 3) less than three times higher than the other side. Here is the exact passage on p. 322.

[^democracy]: Bremer has a different way of coding democracy (i.e. using a value of 5 or greater on the democracy scale in Polity), but this is so far removed from current practice that it's inadvisable to replicate. If you want to use the Polity data (using add_democracy() in this package), use the polity2 variable that adds the autocracy and democracy indices together. Therein, use the weak-link specification and the distance between the more democratic and less democratic state.

Based on these CINC scores, I computed the larger-to-smaller capability ratios for all dyad-years and classified them into three groups. If the capability ratio was less than or equal to three, then the dyad was considered to constitute a case of small power difference. If the ratio was larger than 10, then the power difference was coded as large, whereas a ratio between 3 and 10 was coded as a medium power difference. If either of the CINC scores was missing (or equal to zero) for a ratio calculation, then the power difference score for that dyad was coded as missing also.

This is an easy case_when() function, but it also would've consumed space and words in a manuscript than the allocated journal space would allow. There's added difficulty in making sure to identify which side in a non-directed dyad-year is more powerful.

cow_ddy %>% # built-in data set for convenience
  filter(ccode2 > ccode1) %>% # make it non-directed
  # add CINC scores
  add_nmc() %>%
  # select just what we want
  select(ccode1:year, cinc1, cinc2) -> Bremer

Bremer %>% 
  # create a three-item ordinal relative power category with values 2, 1, and 0
  mutate(relpow = case_when(
    (cinc1 > cinc2) & (cinc1 > 10*cinc2) ~ 2,
    (cinc1 > cinc2) & ((cinc1 > 3*cinc2) & (cinc1 < 10*cinc2)) ~ 1,
    (cinc1 > cinc2) & (cinc1 <= 3*cinc2) ~ 0,
    # copy-paste, re-arrange
    (cinc2 > cinc1) & (cinc2 > 10*cinc1) ~ 2,
    (cinc2 > cinc1) & ((cinc2 > 3*cinc1) & (cinc2 < 10*cinc1))~ 1,
    (cinc2 > cinc1) & (cinc2 <= 3*cinc1) ~ 0,
    TRUE ~ NA_real_
  )) -> relpow_example

# Let's inspect the output.
relpow_example %>% na.omit %>% 
  mutate(whichside = ifelse(cinc1 > cinc2, "ccode1 > ccode2", 
                            "ccode2 >= ccode1")) %>%
  group_split(whichside, relpow)

Next, the manuscript codes Bremer's (1992) development/"advanced economies" measure using the weak-link of the lower GDP per capita in the dyad using the simulations from Anders et al. (2020). In my defense, this is exactly the kind of data Bremer wishes he had available to him. He says so himself on footnote 26 on page 324.

Under the most optimistic assumptions about data availability, I would estimate that the number of dyad-years for which the relevant data [GNP or GDP per capita] could be assembled would be less than 20% of the total dyad-years under consideration. A more realistic estimate might be as low as 10%. Clearly, our ability to test a generalization when 80% to 90% of the needed data are missing is very limited, and especially so in this case, because the missing data would be concentrated heavily in the pre-World War II era and less advanced states.

Given this limitation, Bremer uses this approach to coding the development/"advanced economies" measure.

A more economically advanced state should be characterized by possessing a share of system-wide economic capability that is greater than its share of system-wide demographic capability. Hence, in years when this was found to be true, I classified a state as more advanced; otherwise, less advanced. The next step involved examining each pair of states in each year and assigning it to one of three groups: both more advanced (7,160 dyad-years), one more advanced (61,823 dyad-years), and both less advanced (128,939 dyad-years).

Replicating this approach is going to require group-by summaries of the raw national material capabilities data, which is outside of {peacesciencer}'s core functionality. Bremer's wording here is a little vague; he doesn't explain what variable, or variables, comprise "economic capability" and "demographic capability." Let's assume that "demographic capability" is just the total population variable whereas the "economic capability" variables include iron and steel production and primary energy consumption. The variable would look something like this.

cow_nmc %>%
  group_by(year) %>%
  # calculate year proportions
  mutate(prop_tpop = tpop/sum(tpop, na.rm=T),
         prop_irst = irst/sum(irst, na.rm=T),
         prop_pec = pec/sum(pec, na.rm=T)) %>%
  ungroup() %>%
  # standardize an "economic capability" measure
  # then make an advanced dummy
  mutate(econcap = (prop_irst + prop_pec)/2,
         advanced = ifelse(econcap > prop_tpop, 1, 0)) %>%
  select(ccode, year, prop_tpop:ncol(.)) -> Advanced

Advanced

Now, let's merge this into the Bremer data frame we created. I'll make this an ordinal variable as well with the same 2, 1, 0 ordering scheme.

Bremer %>%
  left_join(., Advanced %>% select(ccode, year, advanced) %>%
              rename(ccode1 = ccode, advanced1 = advanced)) %>%
  left_join(., Advanced %>% select(ccode, year, advanced) %>% 
              rename(ccode2 = ccode, advanced2 = advanced)) %>%
  mutate(advancedcat = case_when(
    advanced1 == 1 & advanced2 == 1 ~ 2,
    (advanced1 == 1 & advanced2 == 0) | (advanced1 == 0 & advanced2 == 1) ~ 1,
    advanced1 == 0 & advanced2 == 0 ~ 0
  )) -> Bremer

# Let's inspect the output
Bremer %>% na.omit %>%
  group_split(advancedcat)

Finally, the manuscript creates a militarization measure that is a weak-link that uses the data on military personnel and total population. Bremer opts for an approach similar to the development indicator he uses.

Instead, I relied on the material capabilities data set discussed above, and classified a state as more militarized if its share of system-wide military capabilities was greater than its share of system-wide demographic capabilities. I classified it less militarized if this was not true. The classification of each dyad-year was then based on whether both, one, or neither of the two states making up the dyad were more militarized in that year.

It reads like this is what he's doing, while again reiterating that I'm assuming he's using just the total population variable to measure "demographic capability."

cow_nmc %>%
  group_by(year) %>%
  # calculate year proportions
  mutate(prop_tpop = tpop/sum(tpop, na.rm=T),
         prop_milex = milex/sum(milex, na.rm=T),
         prop_milper = milper/sum(milper, na.rm=T)) %>%
  ungroup() %>%
  # standardize a "military capability" measure
  # then make an advanced dummy
  mutate(militcap = (prop_milper + prop_milex)/2,
         militarized = ifelse(militcap > prop_tpop, 1, 0)) %>%
  select(ccode, year, prop_tpop:ncol(.)) -> Militarized

Militarized

Let's merge this into the Bremer data we created and inspect the output.

Bremer %>%
  left_join(., Militarized %>% select(ccode, year, militarized) %>% 
              rename(ccode1 = ccode, militarized1 = militarized)) %>%
  left_join(., Militarized %>% select(ccode, year, militarized) %>% 
              rename(ccode2 = ccode, militarized2 = militarized)) %>%
  mutate(militcat = case_when(
    militarized1 == 1 & militarized2 == 1 ~ 2,
    (militarized1 == 1 & militarized2 == 0) | 
      (advanced1 == 0 & militarized2 == 1) ~ 1,
    militarized1 == 0 & militarized2 == 0 ~ 0
  )) -> Bremer

Bremer %>% select(ccode1:year, militarized1:ncol(.)) %>% 
  na.omit %>%
  group_split(militcat)

If we wanted to perfectly recreate the data as Bremer (1992) did it almost 30 years ago, here's how you'd do it in {peacesciencer} (albeit with newer data). Still, I think the data innovations that have followed Bremer (1992) merit the approach employed in the manuscript.

Get Multiple Peace Years in One Fell Swoop

add_peace_years() is designed to work generally, based on the other data/functions included in the package. For example, assume you wanted to a dyad-year analysis comparing the Correlates of War (CoW) Militarized Interstate Dispute (MID) with the Gibler-Miller-Little conflict data. Just add both in the pipe and ask for peace-years.

cow_ddy %>%
  # non-directed, politically relevant, for convenience
  filter(ccode2 > ccode1) %>%
  filter_prd() %>%
  add_cow_mids(keep = NULL) %>%
  add_gml_mids(keep = NULL) %>%
  add_peace_years() -> NDY

# Here's a snapshot of U.S-Cuba from 1980-89 for illustration sake.
NDY %>%
  filter(ccode1 == 2 & ccode2 == 40) %>%
  select(ccode1:year, cowmidongoing, gmlmidongoing, cowmidspell:gmlmidspell) %>%
  filter(year >= 1980)

You can do this with state-year data as well. For example, you can compare how CoW and UCDP code civil wars differently since 1946. Do note, however, that the nature of different state systems used in these data sets means we'll treat one as a master and merge other codes into it.

create_stateyears(system = 'gw') %>%
  filter(between(year, 1946, 2019)) %>%
  add_ccode_to_gw() %>%
  add_ucdp_acd(type = "intrastate", only_wars = TRUE) %>%
  add_cow_wars(type = "intra") %>%
  # select just a few things
  select(gwcode, ccode, year, statename, ucdpongoing, ucdponset,
         cowintraongoing, cowintraonset) %>%
  add_peace_years() %>%
  select(gwcode:statename, ucdpspell, cowintraspell, everything()) %>%
  # India is illustrative of how the two differ.
  # UCDP has an intra-state conflict to the level of war early 
  #  into its existence. CoW does not.
  filter(gwcode == 750)

Measure Leader Tenure in Days

create_leaderyears(), by default, returns an estimate of leader-tenure as the unique calendar year for the leader. I think of this is a reasonable thing to include, and benchmarking to years is doing some internal lifting elsewhere in the function that generates leader-year data from leader-day data in Archigos. However, it can lead some peculiar observations that may not square with how we knee-jerk think about leader tenure.

I will illustrate what I mean by this with the case of Jimmy Carter from leader-year data standardized to Correlates of War state system membership.

leader_years <- create_leaderyears(standardize = 'cow')

leader_years %>% filter(obsid == "USA-1977")

Jimmy Carter took office in January 1977 (year 1) and had a tenure through 1978 (year 2), 1979 (year 3), 1980 (year 4), and exited office in January 1981 (year 5). We know presidents in the American context have four-year terms. This output suggests five years.

If this is that problematic for the research design, especially one that may be interested in what happens to leader behavior after a certain amount of time in office, a user can do something like generate estimates of leader tenure in a given year to the day. Basically, once the core leader-year are generated, the user can use the create_leaderdays() function and summarize leader tenure in the year as the minimum number of days the leader was in office in the year and the maximum number of days the leader was in office in the year.

# don't standardize the leader-days for this use, just to be safe.
create_leaderdays(standardize = 'none') %>% 
  # extract year from date
  mutate(year = lubridate::year(date)) %>%
  # group by leader
  group_by(obsid) %>%
  # count days in office, for leader tenure
  mutate(daysinoffice = seq(1:n())) %>%
  # group-by leader and year
  group_by(obsid, year) %>%
  # how long was the minimum (maximum) days in office for the leader in the year?
  summarize(min_daysoffice = min(daysinoffice),
            max_dayoffice = max(daysinoffice)) %>%
  #practice safe group-by, and assign to object
  ungroup() -> leader_tenures

# add this information to our data
leader_years %>%
  left_join(., leader_tenures) -> leader_years

Here's what this would look like in the case of Jimmy Carter.

leader_years %>% filter(obsid == "USA-1977")

This measure might be more useful. Basically, Jimmy Carter was a new leader in 1977 (min_daysoffice = 1). By 1978, he had almost a year under his belt (i.e. Jan. 1, 1978 was his 347th day in office). By time he left office in 1981, he had completed 1,462 days on the job.

create_leaderyears() elects to not create this information for the user. No matter, it does not take much effort for the user to create it if this is the kind of information they wanted.

svmiller/peacesciencer documentation built on April 3, 2025, 1:15 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com