Simple function to download a PDF, robustly.
url | The URL for a PDF
file | File to which the PDF will be downloaded
quiet | Suppress a message about which URL is being processed [default=FALSE]
overwrite | Overwrite an existing file of the same name [default=FALSE]
pause | Whether to pause for 0.5-3 seconds during scraping [default=TRUE]
Scraping PDFs from the web can run into small hitches that make
writing a scraper annoying. This function simplifies PDF scraping by providing
a dedicated download function plus support functions that, e.g., test for
PDFness. It ensures URL encoding and handles missing URLs gracefully. The
filename is the basename of the URL with " " replaced with "_". The pause
parameter limits the rate at which requests hit the hosting servers.
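The steps described above can be sketched in base R. This is a minimal illustration, not the package's actual code: the helper names `pdf_filename`, `encode_url`, `is_pdf`, and `polite_pause` are hypothetical, and the PDF check assumes that a valid PDF begins with the bytes `%PDF-`.

```r
# Hypothetical helpers; names are illustrative, not the package's own.

# Derive a local file name: basename of the URL with spaces replaced by "_".
pdf_filename <- function(url) {
  gsub(" ", "_", basename(url), fixed = TRUE)
}

# Percent-encode a URL, leaving reserved characters such as ":" and "/" intact.
encode_url <- function(url) {
  utils::URLencode(url)
}

# Test for "PDFness": assume a PDF file starts with the bytes "%PDF-".
is_pdf <- function(path) {
  if (!file.exists(path)) return(FALSE)
  header <- readBin(path, what = "raw", n = 5L)
  identical(header, charToRaw("%PDF-"))
}

# Sleep for a random interval (0.5 to 3 seconds by default) between requests.
polite_pause <- function(min_s = 0.5, max_s = 3) {
  Sys.sleep(stats::runif(1, min_s, max_s))
}
```

Checking the file header rather than the extension guards against servers that return an HTML error page under a `.pdf` URL.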
TODO: Have the overwrite check work on the MD5 hash of files in the download sudb rather than relying on file names.
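One way the content-based check in this TODO might look is sketched below. It is an assumption about a possible approach, not the planned implementation: the helper name `has_same_content` is hypothetical, and it assumes the candidate file has already been fetched to a temporary location outside the target directory. It relies on `tools::md5sum()`, which returns one MD5 hash per file.

```r
# Hypothetical helper: TRUE if some file in `dir` has the same MD5 hash
# as `candidate`, regardless of what the files are named.
has_same_content <- function(candidate, dir) {
  existing <- list.files(dir, full.names = TRUE)
  if (length(existing) == 0L) return(FALSE)
  candidate_hash <- unname(tools::md5sum(candidate))
  candidate_hash %in% unname(tools::md5sum(existing))
}
```

A caller could skip copying the candidate into place when this returns TRUE and `overwrite` is FALSE.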
A data.frame with url, destination, success, pdfCheck
## Not run:
result <- download_pdf(url = "https://goo.gl/I3P3A3",
                       file = "~/Downloads/test.pdf")
## End(Not run)