nmfsscraper: Scrape ESA documents from the NMFS website





The National Marine Fisheries Service (NMFS), along with the U.S. Fish and Wildlife Service (FWS), is responsible for implementing the Endangered Species Act (ESA). The tens of thousands of PDFs that the Services host contain a great deal of information about ESA-listed species and about how the ESA is implemented, but the Services do not make those documents available from a single, central download location. This package is a set of scrapers that search for PDFs on the NMFS website, http://www.nmfs.noaa.gov, and download them locally to facilitate analysis of the embedded information, e.g., using Natural Language Processing (NLP).
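The general idea of searching a page for PDF links and then downloading them can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the package's own R functions; the names PdfLinkParser, find_pdf_links, and download_pdf, and the /example/ paths in the sample page, are hypothetical and chosen only for this sketch.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlretrieve


class PdfLinkParser(HTMLParser):
    """Collect the targets of <a> tags that point at PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        absolute = urljoin(self.base_url, href)
        # Test the path only, so query strings and fragments are ignored.
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.links:
            self.links.append(absolute)


def find_pdf_links(html, base_url):
    """Return the unique absolute PDF URLs in html, in page order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


def download_pdf(url, dest_dir):
    """Save one PDF into dest_dir and return the local file path."""
    os.makedirs(dest_dir, exist_ok=True)
    local_path = os.path.join(dest_dir, os.path.basename(urlparse(url).path))
    urlretrieve(url, local_path)
    return local_path


# A made-up page fragment; the /example/ paths are not real NMFS URLs.
sample_page = """
<a href="/example/plan.pdf">Recovery plan</a>
<a href="/example/species/">Species list</a>
"""
print(find_pdf_links(sample_page, "http://www.nmfs.noaa.gov"))
```

In practice the HTML would come from fetching a page on the site, and each URL returned by find_pdf_links would be passed to download_pdf. A real scraper would also want to pause between requests and skip files that have already been downloaded.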


nmfsscraper also includes a glossary of terms regularly used by NMFS, taken from a PDF that can be scraped from the NMFS website. The source document, http://www.st.nmfs.noaa.gov/st4/documents/FishGlossary.pdf, required some manual adjustments after extraction, so the cleaned glossary is bundled with the package: users can look up the definitions without having to extract the glossary themselves.

jacob-ogre/nmfsscraper documentation built on May 17, 2017, 6:43 a.m.