knitr::opts_chunk$set(collapse = TRUE, comment = "#>") library(dplyr) library(readr)
Some of the 'many' packages include textual data on international agreements
governing certain issue-domain (e.g. {manytrade}
and {manyenviron}
).
While {manydata}
facilitates accessing 'many' data,
{manypks}
helps developers to standardise and work with treaty data.
Using {manydata}
, we can download and check which databases and datasets are available in the packages.
library(manydata) manydata::get_packages("manytrade") manydata::data_contrast("manytrade") manydata::data_contrast("manytrade", "agreements")
One of the central problems in assembled data packages is identifying equivalences between observations across datasets.
To solve this issue we created the code_agreements()
function
which extracts key identifying information from the titles and dates,
rendering the same treatyID for observations that have the same key identifying information.
Please see the Agreements vignette for more information on how code_agreements()
works
and its performance vis-a-vis other matching options.
trade_agreements <- lapply(manytrade::agreements, function(x) x[x$Beg > "1995-01-01" & x$Beg < "2000-01-01", ]) trade_agreements <- lapply(trade_agreements, function(x) dplyr::select(x, Title, Beg)) trade_agreements <- purrr::map_df(trade_agreements, ~as.data.frame(.x), .id = "Dataset") trade_agreements <- trade_agreements[complete.cases(trade_agreements), ] trade_agreements$treatyID <- manypkgs::code_agreements( title = trade_agreements$Title, date = trade_agreements$Beg)
trade_agreements <- read_csv("trade_agreements.csv")
Different treatyIDs generated with code_agreements()
for different datasets in a database
might contain minor differences that could mean these IDs actually refer to the same agreement.
The condense_agreements()
function to works to match these inconsistencies
within databases or variables.
manyID <- manypkgs::condense_agreements(idvar = trade_agreements$treatyID) trade_agreements <- dplyr::left_join(trade_agreements, manyID, by = "treatyID") %>% dplyr::mutate(compareID = ifelse(treatyID == manyID, "Equal", "Different")) summary(as.factor(trade_agreements$compareID))
Before analysing the text data contained in the datasets,
standardising observations to ensure that they have the same format (e.g. in terms of spacing)
can help make automated text analysis more convenient and straightforward.
The data in {manytrade}
and {manyenviron}
were standardised
using functions in {manypkgs}
before being exported.
Treaty titles in the 'many' datasets were first standardised using standardise_titles()
.
Then, standardise_treaty_texts()
was used to annotate the different segments of treaties
(e.g. preamble, articles, and annexes) and removes redundant paragraph, line,
and website markers as much as possible.
The 'many' packages contain databases that store the original texts of treaties,
where available digitally,
listed in the agreements
, memberships
, and references
databases.
{manypkgs}
has several functions to standardize and extract information from these texts.
At its most basic, read_clauses()
extracts specified portions of treaties,
such as the 'preamble' or the 'accession' clause.
To reduce the size of the output produced,
read_clauses()
matches a limited number of words only.
This is currently set to a maximum of 20 words for accession articles,
since these clauses are usually shorter than this length.
trade_treaties <- manytrade::agreements$HUGGO %>% dplyr::filter(Beg > "2000-01-01" & Beg < "2006-01-01", !is.na(TreatyText), !grepl("NULL", TreatyText)) %>% dplyr::select(manyID, Title, Beg, TreatyText) manypkgs::read_clauses(trade_treaties$TreatyText, article = "preamble") # needs to improve... manypkgs::read_clauses(trade_treaties$TreatyText, article = "accession") manypkgs::read_clauses(trade_treaties$TreatyText, treaty_type = "agreements") manypkgs::read_clauses(trade_treaties$TreatyText, match = "human rights")
With the standardized treaty text in hand, we can leverage functions in
{manypkgs}
to code accession and termination rules for treaties.
code_accession_terms()
locates the conditions and processes for accession to a treaty
(depending on which is specified in the argument).
Accession conditions are categorised either as 'open',
where the treaty is open to any government,
or 'semi-open', where nomination by an existing party to the treaty is required.
The function identifies the category by matching keywords located in the
clauses concerning accession (identified through read_clauses()
:
'any government', 'all governments', etc for 'open' and
'nomination' for 'semi-open'.
manypkgs::code_accession_terms(trade_treaties$TreatyText) # All NAs? manypkgs::code_term(trade_treaties$TreatyText)
{manydata}
contains several functions to help easily retrieve additional
from 'many' treaties.
We can, for example, retrieve certain types of treaties with
retrieve_bilaterals()
and retrieve_multilaterals()
.
manydata::retrieve_bilaterals(manytrade::memberships$DESTA_MEM, actor = "stateID") manydata::retrieve_multilaterals(manytrade::memberships$GPTAD_MEM)
We can also retrieve treaty linkages with retrieve_links()
.
manydata::retrieve_links(manytrade::agreements)
Or, even, retrieve treaty membership lists with retrieve_memberships()
.
manydata::retrieve_membership_list(manytrade::memberships$DESTA_MEM, "stateID")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.