rodham aims to ease access to, and analysis of, Hillary Rodham Clinton's personal emails, which the author deems important in light of recent events.
The function search_emails fetches the list of emails that were released. The list is available either by calling the Wall Street Journal's API or via the built-in dataset (recommended).
knitr::opts_chunk$set( fig.width = 7, fig.height = 5 )
library(rodham)

# get list of emails
data("emails")

# equivalent to:
em <- search_emails()
identical(emails, em)
Using the list of emails (data("emails")) we can plot the network of emails using edges_emails, which returns a list of edges meant for a directed network.
edges <- edges_emails(emails)
knitr::kable(head(edges))
The freq column gives the number of occurrences of each edge (that is, the number of emails). The list of edges alone allows building a simple network.
g <- igraph::graph.data.frame(edges)

# plot network
plot(g,
     layout = igraph::layout.fruchterman.reingold(g),
     vertex.label.color = hsv(h = 0, s = 0, v = 0, alpha = 0.0),
     vertex.size = log1p(igraph::degree(g)) * 2,
     edge.arrow.size = 0.1,
     edge.arrow.width = 0.1,
     edge.width = log1p(igraph::E(g)$freq) / 4,
     vertex.frame.color = "#FFFFFF")
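To make the freq column concrete, here is a minimal base-R sketch of how a frequency-weighted edge list can be derived from sender/recipient pairs. The toy data and the column names from and to are assumptions for illustration; this is not necessarily how edges_emails computes its result.

```r
# Toy sender/recipient pairs (hypothetical data, not taken from the package)
msgs <- data.frame(
  from = c("A", "A", "B", "A", "C"),
  to   = c("B", "B", "C", "C", "A"),
  stringsAsFactors = FALSE
)

# Give every message a weight of 1, then sum the weights per directed pair
toy_edges <- aggregate(freq ~ from + to,
                       data = transform(msgs, freq = 1L),
                       FUN = sum)

print(toy_edges)
```

Each row of toy_edges is one directed edge, and freq is the number of messages sent along it, which is the quantity used above to scale edge.width.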
In the above we gather a reasonable amount of metadata on the emails but we do not get their actual content. To get it we need to download the emails, as released, in PDF format and extract the text. First we need xpdf to extract the content; you can either download it manually from the download section or attempt to use get_xpdf (only tested on Windows).
get_xpdf downloads and unzips the extractor, then returns the full path to the pdftotext.exe file required for the next step.
xpdf <- get_xpdf(dest = "C:/") # get extractor

# or if you downloaded manually point to pdftotext
xpdf <- "your/path/xpdfbin-win-3.04/bin64/pdftotext"
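Before moving on, it can help to confirm that the extractor path points at a file that actually exists. The helper below is a small sketch in base R; check_extractor is a hypothetical name and is not part of rodham. It also tries the path with an .exe suffix, on the assumption that a manually entered Windows path may omit it.

```r
# Hypothetical helper (not part of rodham): TRUE if the extractor file exists
check_extractor <- function(path) {
  # Try the path as given and with a Windows ".exe" suffix appended
  candidates <- c(path, paste0(path, ".exe"))
  any(nzchar(path) & file.exists(candidates))
}

# Demonstration with a temporary file standing in for pdftotext
fake_extractor <- tempfile("pdftotext")
file.create(fake_extractor)

check_extractor(fake_extractor)
check_extractor(file.path(tempdir(), "no_such_pdftotext"))
```

In practice you would call check_extractor(xpdf) and stop early if it returns FALSE.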
Once we have the extractor we can fetch some emails using get_emails. The function requires you to select a specific release; here are the valid ones:
dir <- "./rodham"
dir.create(dir) # directory must exist
emails_bengh <- get_emails(release = "Benghazi", save.dir = dir, extractor = xpdf)
get_emails downloads, unzips and extracts the content of all emails in the release; note that this may take some time. The files are extracted to a folder named after the requested release, and the full path to that folder is returned (for future use).
Alternatively you may want to proceed step by step. This is particularly useful if your temp folder requires superuser privileges or if you want to keep the PDF files.
# download specific release
dl <- download_emails("August") # returns full path to zip

pdf <- "emails_pdf" # directory where pdf will be extracted to
txt <- "emails.text" # directory where txt will be extracted to

# create directories
dir.create(pdf)
dir.create(txt)

unzip(dl, exdir = pdf)

# get emails released in august
extract_emails(pdf, save.dir = txt, extractor = xpdf)
Now we can read the .txt files into R as a named list in which each email is named after its corresponding file.
contents <- load_emails(emails_bengh)
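For readers unfamiliar with this structure, the sketch below builds a comparable named list by hand in base R from two throwaway text files. It only illustrates the idea of "one list element per file, named after the file"; it is not load_emails's actual implementation, and whether the real names keep the file extension is an assumption left open here.

```r
# Create a throwaway directory with two small text files (hypothetical content)
demo_dir <- file.path(tempdir(), "rodham_demo_txt")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("From: A", "To: B", "Hello"), file.path(demo_dir, "doc_1.txt"))
writeLines(c("From: B", "To: A", "Reply"), file.path(demo_dir, "doc_2.txt"))

# List the .txt files and read each one into a character vector of lines
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
demo_contents <- lapply(files, readLines)

# Name each element after its file (extension dropped in this sketch)
names(demo_contents) <- tools::file_path_sans_ext(basename(files))

str(demo_contents)
```

Individual emails can then be accessed by name, for example demo_contents$doc_2 or demo_contents[["doc_1"]].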
You can clean the emails with clean_content, which removes some comments and other unwanted lines.
cont <- get_content(contents)
cont <- clean_content(cont)
get_content