An R package to simplify Portable Document Format (PDF) file downloads.
Scraping PDFs from the web can run into small hitches that make writing a
scraper annoying. pdfdown simplifies PDF scraping by providing a dedicated
download function and support functions to, e.g., test for PDF-ness. It ensures
URLs are properly encoded and handles missing URLs gracefully. The main
function, download_pdf, includes the pause parameter (a random 0-3 second
delay) to limit the rate at which requests hit the host server. We mostly use
the package to facilitate scraping U.S. Government documents that are only
available as PDFs.
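To make the ideas above concrete, here is a minimal base-R sketch of the kinds of checks described: a PDF test, URL encoding with graceful handling of missing URLs, and a random pause. The helper names looks_like_pdf, safe_url, and polite_pause are hypothetical and chosen only for illustration; they are not pdfdown's actual functions, and the package may implement these steps differently.

```r
# Illustrative helpers only; these names are assumptions, not pdfdown's API.

# Return TRUE if the file begins with the PDF magic bytes "%PDF".
looks_like_pdf <- function(path) {
  if (!file.exists(path)) return(FALSE)
  header <- readBin(path, what = "raw", n = 4L)
  identical(header, charToRaw("%PDF"))
}

# Percent-encode unsafe characters (such as spaces) in a URL.
# Missing or empty URLs are returned as NA instead of raising an error.
safe_url <- function(url) {
  if (is.na(url) || !nzchar(url)) return(NA_character_)
  utils::URLencode(url)
}

# Sleep for a random interval between 0 and max_seconds to spread out requests.
polite_pause <- function(max_seconds = 3) {
  Sys.sleep(stats::runif(1, min = 0, max = max_seconds))
}
```

A scraper could call safe_url on each link, skip any NA results, call polite_pause between requests, and run looks_like_pdf on each saved file to catch HTML error pages that were saved with a .pdf extension.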
Use devtools to install pdfdown:
devtools::install_github("Defenders-ESC/pdfdown")
Get the five-year review for the Pecos puzzle sunflower:
url <- "https://ecos.fws.gov/docs/five_year_review/doc4599.pdf"
helpar5y <- download_pdf(url, "~/Downloads/doc4599.pdf")
Find a bug or have a question? Submit an issue on GitHub! Alternatively, get in touch.
Want to add features or fix a bug? Fork the repo and submit a pull request! Thanks!