download_pdf: Download a PDF from a URL


View source: R/download_pdf.R

Description

A simple function to download a PDF robustly.

Usage

download_pdf(url, file, quiet = FALSE, overwrite = FALSE, pause = TRUE)

Arguments

url

The URL for a PDF

file

File to which the PDF will be downloaded

quiet

Suppress a message about which URL is being processed [default=FALSE]

overwrite

Overwrite an existing file of the same name [default=FALSE]

pause

Whether to pause for 0.5-3 seconds during scraping [default=TRUE]

Details

Scraping PDFs from the web can run into small hitches that make writing a scraper annoying. This function simplifies PDF scraping by bundling the download with support functions that, for example, test whether the downloaded file is actually a PDF. It ensures the URL is encoded and handles missing URLs gracefully. The filename is the basename of the URL with " " replaced by "_". The pause parameter limits the rate at which requests hit the hosting servers.
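
The sketch below is not the package's implementation; it only illustrates the kinds of steps described above (URL encoding, deriving a filename, pausing between requests, and checking for "PDFness"). The URL and the is_pdf helper are hypothetical.

  ## Illustration only -- not pdfdown's actual code
  url <- "https://example.org/reports/annual report.pdf"    # hypothetical URL
  enc_url  <- utils::URLencode(url)                         # ensure the URL is encoded
  filename <- gsub(" ", "_", basename(url))                 # basename with " " -> "_"
  Sys.sleep(stats::runif(1, min = 0.5, max = 3))            # pause 0.5-3 seconds between requests

  ## Hypothetical "PDFness" test: a real PDF begins with the magic bytes "%PDF"
  is_pdf <- function(path) {
    identical(readBin(path, what = "raw", n = 4L), charToRaw("%PDF"))
  }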

TODO: Have the overwrite check work on the MD5 hash of files in the download subdirectory rather than relying on file names.

Value

A data.frame with columns url, destination, success, and pdfCheck
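
An illustration of the returned structure (values are made up; only the column names come from this page, and the column types shown are assumptions):

  ## Illustrative only -- made-up values, assumed types
  data.frame(
    url = "https://example.org/report.pdf",
    destination = "~/Downloads/report.pdf",
    success = TRUE,
    pdfCheck = TRUE,
    stringsAsFactors = FALSE
  )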

Examples

## Not run: 
  result <- download_pdf(url = "https://goo.gl/I3P3A3",
                         file = "~/Downloads/test.pdf")

## End(Not run)
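
A hedged sketch of batch use, not taken from the package documentation: the URLs are placeholders, and the success and pdfCheck columns are assumed to be logical.

## Not run: 
  urls <- c("https://example.org/a.pdf", "https://example.org/b.pdf")  # placeholder URLs
  results <- do.call(rbind, lapply(urls, function(u) {
    download_pdf(url = u, file = file.path("~/Downloads", basename(u)))
  }))
  subset(results, !success | !pdfCheck)   # downloads that failed or are not PDFs

## End(Not run)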
