get_emails: Get emails and their contents

Description

Get the content of Hillary Rodham Clinton's emails by release.

Usage

get_emails(release, save.dir = getwd(), extractor)

Arguments

release

Name of the email release batch to fetch; see Details.

save.dir

Directory in which to save the extracted text; defaults to getwd().

extractor

Full path to the PDF extractor (PDF to text); see Details.

Details

The valid values for release are listed below; they follow the WSJ naming convention.

  • Benghazi

  • June

  • July

  • August

  • September

  • October

  • November

  • January 7

  • January 29

  • February 19

  • february 29

  • December

  • Non-disclosure
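As a minimal sketch, a release name can be checked against this list before calling get_emails. The helper check_release below is hypothetical and not part of the rodham package; it also assumes that names must match exactly as written above, including case, which the documentation does not state.

```r
# Valid release names, copied verbatim from the list above
valid_releases <- c("Benghazi", "June", "July", "August", "September",
                    "October", "November", "January 7", "January 29",
                    "February 19", "february 29", "December",
                    "Non-disclosure")

# Hypothetical helper (not part of rodham): stop early on an unknown name.
# Assumes exact, case-sensitive matching.
check_release <- function(release) {
  if (!is.character(release) || length(release) != 1L ||
      !release %in% valid_releases) {
    stop("Unknown release: ", release, call. = FALSE)
  }
  invisible(release)
}

check_release("August")
```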

The extractor argument is the full path to your pdftotext.exe extractor. Visit xpdf to download it, or try get_xpdf, which attempts to download and unzip the PDF-to-text extractor. See Examples.
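Because the extractor must be given as a full path, it may help to confirm that the file exists before fetching any emails. The helper check_extractor below is a hypothetical sketch, not part of rodham; it only checks that the path points at an existing file and does not verify that the file is actually pdftotext.

```r
# Hypothetical helper (not part of rodham): confirm the extractor path
# points at an existing file (not a directory) before using it
check_extractor <- function(path) {
  if (!is.character(path) || length(path) != 1L ||
      !file.exists(path) || dir.exists(path)) {
    stop("Extractor not found: ", path, call. = FALSE)
  }
  # return the absolute path with forward slashes
  normalizePath(path, winslash = "/")
}
```

The returned path could then be passed as the extractor argument, for example extractor = check_extractor("C:/xpdfbin-win-3.04/bin64/pdftotext.exe").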

Value

Fetches the email zip file from the WSJ and extracts the text files into save.dir. Returns the full path to the directory that contains the parsed txt files.

Author(s)

John Coene jcoenep@gmail.com

See Also

get_xpdf

Examples

## Not run: 
# get xpdf extractor
ext <- get_xpdf()

# create a directory to hold the extracted text
dir.create("./emails")

# get emails released in August
emails_aug <- get_emails(release = "August", save.dir = "./emails",
                         extractor = ext)

# use manually downloaded extractor
ext <- "C:/xpdfbin-win-3.04/bin64/pdftotext.exe"

# get emails related to Benghazi released in December
emails_bengh <- get_emails(release = "Benghazi", extractor = ext,
                           save.dir = "./emails")

# read each extracted text file into a list, one element per file
files <- list.files(emails_bengh)
content <- lapply(files, function(f) {
  readLines(file.path(emails_bengh, f))
})

## End(Not run)

