knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(tidyverse) library(peacesciencer)
This is a running list of various parlor tricks that you can do with the data and functions in {peacesciencer}
. Space and time considerations, along with some rigidity imposed by CRAN guidelines, preclude me from including these as outright functions or belaboring them in greater detail in the manuscript. Again, {peacesciencer}
can do a lot, but it can't do everything. Yet, some of its functionality may not also be obvious from the manuscript or documentation files because they're not necessarily core functions. Thus "parlor trick" is an appropriate descriptor here.
The manuscript includes a partial replication of a state-year civil conflict analysis analogous to Fearon and Laitin (2003) and Gibler and Miller (2014). Both of those analyses include a "new state" variable, arguing that states within the first two years of their existence are more likely to experience a civil war onset. The partial replication does not include this. This is because the easiest way to create this variable is through a group_by()
mutate based on the row number of the group, but group_by()
has the unfortunate side effect of erasing any other attributes in the data (i.e. the ps_system
and ps_type
attributes). This would break the {peacesciencer}
pipe. If you want this variable, I recommend creating and merging this variable after creating the bulk of the data.
Here's how you'd do it.
# Hypothetical main data create_stateyears(system = 'gw') %>% filter(between(year, 1946, 2019)) %>% add_ucdp_acd(type = "intrastate") %>% add_peace_years() -> Data # Add in new state variable after the fact create_stateyears(system = 'gw') %>% group_by(gwcode) %>% mutate(newstate = ifelse(row_number() <= 2, 1, 0)) %>% left_join(Data, .) %>% select(gwcode:ucdponset, newstate, everything()) -> Data # Proof of concept: Here's India Data %>% filter(gwcode == 750) # And here's Belize Data %>% filter(gwcode == 80)
The manuscript includes a replication of Bremer's (1992) "dangerous dyads" design, albeit one that leverages newer/better data sources that were unavailable to Bremer at the time. For convenience's sake, the replication used other approaches to estimating Bremer's variables, including the "weak-link" mechanisms that Dixon (1994) introduced in his seminal work on democratic conflict resolution. If the user wanted to recreate some of the covariates as Bremer (1992) did it, here would be how to do it.
The covariates in question concern information grabbed from the Correlates of War national material capabilities data set.[^democracy] For example, the user guide recreates the "relative power" variable as a proportion of the lower composite index of national capabilities (CINC) variable over the higher one. Bremer opts for a different approach, defining a "relative power" variable as a three-part ordinal category where the more powerful side has a CINC score that is 1) 10 times higher than the less powerful side, 2) three times higher than the other side, or 3) less than three times higher than the other side. Here is the exact passage on p. 322.
[^democracy]: Bremer has a different way of coding democracy (i.e. using a value of 5 or greater on the democracy scale in Polity), but this is so far removed from current practice that it's inadvisable to replicate. If you want to use the Polity data (using add_democracy()
in this package), use the polity2
variable that adds the autocracy and democracy indices together. Therein, use the weak-link specification and the distance between the more democratic and less democratic state.
Based on these CINC scores, I computed the larger-to-smaller capability ratios for all dyad-years and classified them into three groups. If the capability ratio was less than or equal to three, then the dyad was considered to constitute a case of small power difference. If the ratio was larger than 10, then the power difference was coded as large, whereas a ratio between 3 and 10 was coded as a medium power difference. If either of the CINC scores was missing (or equal to zero) for a ratio calculation, then the power difference score for that dyad was coded as missing also.
This is an easy case_when()
function, but it also would've consumed space and words in a manuscript than the allocated journal space would allow. There's added difficulty in making sure to identify which side in a non-directed dyad-year is more powerful.
cow_ddy %>% # built-in data set for convenience filter(ccode2 > ccode1) %>% # make it non-directed # add CINC scores add_nmc() %>% # select just what we want select(ccode1:year, cinc1, cinc2) -> Bremer Bremer %>% # create a three-item ordinal relative power category with values 2, 1, and 0 mutate(relpow = case_when( (cinc1 > cinc2) & (cinc1 > 10*cinc2) ~ 2, (cinc1 > cinc2) & ((cinc1 > 3*cinc2) & (cinc1 < 10*cinc2)) ~ 1, (cinc1 > cinc2) & (cinc1 <= 3*cinc2) ~ 0, # copy-paste, re-arrange (cinc2 > cinc1) & (cinc2 > 10*cinc1) ~ 2, (cinc2 > cinc1) & ((cinc2 > 3*cinc1) & (cinc2 < 10*cinc1))~ 1, (cinc2 > cinc1) & (cinc2 <= 3*cinc1) ~ 0, TRUE ~ NA_real_ )) -> relpow_example # Let's inspect the output. relpow_example %>% na.omit %>% mutate(whichside = ifelse(cinc1 > cinc2, "ccode1 > ccode2", "ccode2 >= ccode1")) %>% group_split(whichside, relpow)
Next, the manuscript codes Bremer's (1992) development/"advanced economies" measure using the weak-link of the lower GDP per capita in the dyad using the simulations from Anders et al. (2020). In my defense, this is exactly the kind of data Bremer wishes he had available to him. He says so himself on footnote 26 on page 324.
Under the most optimistic assumptions about data availability, I would estimate that the number of dyad-years for which the relevant data [GNP or GDP per capita] could be assembled would be less than 20% of the total dyad-years under consideration. A more realistic estimate might be as low as 10%. Clearly, our ability to test a generalization when 80% to 90% of the needed data are missing is very limited, and especially so in this case, because the missing data would be concentrated heavily in the pre-World War II era and less advanced states.
Given this limitation, Bremer uses this approach to coding the development/"advanced economies" measure.
A more economically advanced state should be characterized by possessing a share of system-wide economic capability that is greater than its share of system-wide demographic capability. Hence, in years when this was found to be true, I classified a state as more advanced; otherwise, less advanced. The next step involved examining each pair of states in each year and assigning it to one of three groups: both more advanced (7,160 dyad-years), one more advanced (61,823 dyad-years), and both less advanced (128,939 dyad-years).
Replicating this approach is going to require group-by summaries of the raw national material capabilities data, which is outside of {peacesciencer}
's core functionality. Bremer's wording here is a little vague; he doesn't explain what variable, or variables, comprise "economic capability" and "demographic capability." Let's assume that "demographic capability" is just the total population variable whereas the "economic capability" variables include iron and steel production and primary energy consumption. The variable would look something like this.
cow_nmc %>% group_by(year) %>% # calculate year proportions mutate(prop_tpop = tpop/sum(tpop, na.rm=T), prop_irst = irst/sum(irst, na.rm=T), prop_pec = pec/sum(pec, na.rm=T)) %>% ungroup() %>% # standardize an "economic capability" measure # then make an advanced dummy mutate(econcap = (prop_irst + prop_pec)/2, advanced = ifelse(econcap > prop_tpop, 1, 0)) %>% select(ccode, year, prop_tpop:ncol(.)) -> Advanced Advanced
Now, let's merge this into the Bremer
data frame we created. I'll make this an ordinal variable as well with the same 2, 1, 0 ordering scheme.
Bremer %>% left_join(., Advanced %>% select(ccode, year, advanced) %>% rename(ccode1 = ccode, advanced1 = advanced)) %>% left_join(., Advanced %>% select(ccode, year, advanced) %>% rename(ccode2 = ccode, advanced2 = advanced)) %>% mutate(advancedcat = case_when( advanced1 == 1 & advanced2 == 1 ~ 2, (advanced1 == 1 & advanced2 == 0) | (advanced1 == 0 & advanced2 == 1) ~ 1, advanced1 == 0 & advanced2 == 0 ~ 0 )) -> Bremer # Let's inspect the output Bremer %>% na.omit %>% group_split(advancedcat)
Finally, the manuscript creates a militarization measure that is a weak-link that uses the data on military personnel and total population. Bremer opts for an approach similar to the development indicator he uses.
Instead, I relied on the material capabilities data set discussed above, and classified a state as more militarized if its share of system-wide military capabilities was greater than its share of system-wide demographic capabilities. I classified it less militarized if this was not true. The classification of each dyad-year was then based on whether both, one, or neither of the two states making up the dyad were more militarized in that year.
It reads like this is what he's doing, while again reiterating that I'm assuming he's using just the total population variable to measure "demographic capability."
cow_nmc %>% group_by(year) %>% # calculate year proportions mutate(prop_tpop = tpop/sum(tpop, na.rm=T), prop_milex = milex/sum(milex, na.rm=T), prop_milper = milper/sum(milper, na.rm=T)) %>% ungroup() %>% # standardize a "military capability" measure # then make an advanced dummy mutate(militcap = (prop_milper + prop_milex)/2, militarized = ifelse(militcap > prop_tpop, 1, 0)) %>% select(ccode, year, prop_tpop:ncol(.)) -> Militarized Militarized
Let's merge this into the Bremer
data we created and inspect the output.
Bremer %>% left_join(., Militarized %>% select(ccode, year, militarized) %>% rename(ccode1 = ccode, militarized1 = militarized)) %>% left_join(., Militarized %>% select(ccode, year, militarized) %>% rename(ccode2 = ccode, militarized2 = militarized)) %>% mutate(militcat = case_when( militarized1 == 1 & militarized2 == 1 ~ 2, (militarized1 == 1 & militarized2 == 0) | (advanced1 == 0 & militarized2 == 1) ~ 1, militarized1 == 0 & militarized2 == 0 ~ 0 )) -> Bremer Bremer %>% select(ccode1:year, militarized1:ncol(.)) %>% na.omit %>% group_split(militcat)
If we wanted to perfectly recreate the data as Bremer (1992) did it almost 30 years ago, here's how you'd do it in {peacesciencer}
(albeit with newer data). Still, I think the data innovations that have followed Bremer (1992) merit the approach employed in the manuscript.
add_peace_years()
is designed to work generally, based on the other data/functions included in the package. For example, assume you wanted to a dyad-year analysis comparing the Correlates of War (CoW) Militarized Interstate Dispute (MID) with the Gibler-Miller-Little conflict data. Just add both in the pipe and ask for peace-years.
cow_ddy %>% # non-directed, politically relevant, for convenience filter(ccode2 > ccode1) %>% filter_prd() %>% add_cow_mids(keep = NULL) %>% add_gml_mids(keep = NULL) %>% add_peace_years() -> NDY # Here's a snapshot of U.S-Cuba from 1980-89 for illustration sake. NDY %>% filter(ccode1 == 2 & ccode2 == 40) %>% select(ccode1:year, cowmidongoing, gmlmidongoing, cowmidspell:gmlmidspell) %>% filter(year >= 1980)
You can do this with state-year data as well. For example, you can compare how CoW and UCDP code civil wars differently since 1946. Do note, however, that the nature of different state systems used in these data sets means we'll treat one as a master and merge other codes into it.
create_stateyears(system = 'gw') %>% filter(between(year, 1946, 2019)) %>% add_ccode_to_gw() %>% add_ucdp_acd(type = "intrastate", only_wars = TRUE) %>% add_cow_wars(type = "intra") %>% # select just a few things select(gwcode, ccode, year, statename, ucdpongoing, ucdponset, cowintraongoing, cowintraonset) %>% add_peace_years() %>% select(gwcode:statename, ucdpspell, cowintraspell, everything()) %>% # India is illustrative of how the two differ. # UCDP has an intra-state conflict to the level of war early # into its existence. CoW does not. filter(gwcode == 750)
create_leaderyears()
, by default, returns an estimate of leader-tenure as the unique calendar year for the leader. I think of this is a reasonable thing to include, and benchmarking to years is doing some internal lifting elsewhere in the function that generates leader-year data from leader-day data in Archigos. However, it can lead some peculiar observations that may not square with how we knee-jerk think about leader tenure.
I will illustrate what I mean by this with the case of Jimmy Carter from leader-year data standardized to Correlates of War state system membership.
leader_years <- create_leaderyears(standardize = 'cow') leader_years %>% filter(obsid == "USA-1977")
Jimmy Carter took office in January 1977 (year 1) and had a tenure through 1978 (year 2), 1979 (year 3), 1980 (year 4), and exited office in January 1981 (year 5). We know presidents in the American context have four-year terms. This output suggests five years.
If this is that problematic for the research design, especially one that may be interested in what happens to leader behavior after a certain amount of time in office, a user can do something like generate estimates of leader tenure in a given year to the day. Basically, once the core leader-year are generated, the user can use the create_leaderdays()
function and summarize leader tenure in the year as the minimum number of days the leader was in office in the year and the maximum number of days the leader was in office in the year.
# don't standardize the leader-days for this use, just to be safe. create_leaderdays(standardize = 'none') %>% # extract year from date mutate(year = lubridate::year(date)) %>% # group by leader group_by(obsid) %>% # count days in office, for leader tenure mutate(daysinoffice = seq(1:n())) %>% # group-by leader and year group_by(obsid, year) %>% # how long was the minimum (maximum) days in office for the leader in the year? summarize(min_daysoffice = min(daysinoffice), max_dayoffice = max(daysinoffice)) %>% #practice safe group-by, and assign to object ungroup() -> leader_tenures # add this information to our data leader_years %>% left_join(., leader_tenures) -> leader_years
Here's what this would look like in the case of Jimmy Carter.
leader_years %>% filter(obsid == "USA-1977")
This measure might be more useful. Basically, Jimmy Carter was a new leader in 1977 (min_daysoffice = 1
). By 1978, he had almost a year under his belt (i.e. Jan. 1, 1978 was his 347th day in office). By time he left office in 1981, he had completed 1,462 days on the job.
create_leaderyears()
elects to not create this information for the user. No matter, it does not take much effort for the user to create it if this is the kind of information they wanted.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.