Get emails and their contents

Description

Get the content of Hillary Rodham Clinton's emails by release.

Usage

get_emails(release, save.dir = getwd(), extractor)

Arguments

release

Name of the email release batch; see Details.

save.dir

Directory in which to save the extracted text; defaults to getwd().

extractor

Full path to the PDF extractor (PDF to text); see Details.

Details

The valid values for release are listed below; they follow the WSJ naming convention.

  • Benghazi

  • June

  • July

  • August

  • September

  • October

  • November

  • January 7

  • January 29

  • February 19

  • february 29

  • December

  • Non-disclosure

The extractor argument is the full path to your pdftotext.exe extractor. Download it from the xpdf website, or try get_xpdf, which attempts to download and unzip the PDF-to-text extractor. See Examples.
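Because a mistyped release name or a wrong extractor path only surfaces once the download has started, it can help to check both arguments first. The sketch below is one way to do that. The helper check_get_emails_args and the vector valid_releases are hypothetical names that are not part of the package, and the exact, case-sensitive comparison is an assumption about how release is matched.

```r
# Hypothetical helper, not part of the package: checks the arguments
# before they are handed to get_emails().

# Release names copied from the list in Details.
valid_releases <- c("Benghazi", "June", "July", "August", "September",
                    "October", "November", "January 7", "January 29",
                    "February 19", "february 29", "December",
                    "Non-disclosure")

check_get_emails_args <- function(release, extractor) {
  # Assumption: release must match one of the documented names exactly.
  if (!release %in% valid_releases) {
    stop("Unknown release: ", release)
  }
  # The extractor must be an existing file, e.g. pdftotext.exe.
  if (!file.exists(extractor)) {
    stop("Extractor not found at: ", extractor)
  }
  invisible(TRUE)
}
```

Calling check_get_emails_args("August", ext) before get_emails() stops with an error if either argument looks wrong and otherwise returns TRUE invisibly.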

Value

Fetches the email zip file from the WSJ and extracts the text files into save.dir. Returns the full path to the directory that contains the parsed txt files.

Author(s)

John Coene jcoenep@gmail.com

See Also

get_xpdf

Examples

## Not run: 
# get xpdf extractor
ext <- get_xpdf()

# create a directory for the extracted emails
dir.create("./emails")

# get emails released in august
emails_aug <- get_emails(release = "August", save.dir = "./emails",
                         extractor = ext)

# use manually downloaded extractor
ext <- "C:/xpdfbin-win-3.04/bin64/pdftotext.exe"

# get emails related to Benghazi released in December
emails_bengh <- get_emails(release = "Benghazi", extractor = ext,
                           save.dir = "./emails")

# read each extracted text file into a list of character vectors
files <- list.files(emails_bengh, full.names = TRUE)
content <- lapply(files, readLines)

## End(Not run)