README.md
In jacob-ogre/esadocs: A Search Engine for Endangered Species Act Documents

There are thousands of administrative documents about the U.S. Endangered Species Act (ESA) available on the internet, thousands that are not yet publicly available, and thousands more produced each year. We have gathered PDFs of what is publicly available and added our copies of documents acquired through other means (e.g., Freedom of Information Act [FOIA] requests) to form a base collection. The plain text of each document was either extracted directly or identified with Optical Character Recognition (OCR). All of this is loaded into an Elasticsearch database, and we developed a web app to facilitate searching these documents.

Each document in the Elasticsearch database includes the following fields:

- index: esadocs
- type: the type of document, one of
  - five_year_review
  - federal_register
  - recovery_plan
  - section_7a1
  - section_7a2
  - section_10a1A
  - CCA
  - CCAA
  - HCP
  - SHA
  - misc
- raw_txt: the raw text of the document, for indexing and search
- txt: the path to the text file
- pdf: the path to the PDF
- basename: the base name of the PDF and text files, for joining
- file_name: the link text (the content of the a tag) from the ECOS link to the document, or another name that identifies the document
- orig_link: the original URL (href) from ECOS
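
As a rough illustration of the fields listed above, the following Python sketch checks a record against that list and builds a full-text query on `raw_txt` using the standard Elasticsearch Query DSL `match` clause. It is not taken from the esadocs code: the helper names `check_record` and `full_text_query`, and every value in the example record, are hypothetical, and the sketch assumes `raw_txt` is indexed as an analyzed text field.

```python
import json

# Document types from the list above.
DOC_TYPES = {
    "five_year_review", "federal_register", "recovery_plan",
    "section_7a1", "section_7a2", "section_10a1A",
    "CCA", "CCAA", "HCP", "SHA", "misc",
}

# Fields from the list above that each record carries alongside its type.
REQUIRED_FIELDS = ("raw_txt", "txt", "pdf", "basename", "file_name", "orig_link")


def check_record(doc_type, record):
    """Return a list of problems found in a record (empty if none)."""
    problems = []
    if doc_type not in DOC_TYPES:
        problems.append(f"unknown type: {doc_type}")
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
    return problems


def full_text_query(phrase, size=10):
    """Build a Query DSL body that matches `phrase` against raw_txt."""
    return {"size": size, "query": {"match": {"raw_txt": phrase}}}


# Hypothetical record; every value here is a placeholder, not real data.
example = {
    "raw_txt": "Recovery plan text ...",
    "txt": "path/to/example.txt",
    "pdf": "path/to/example.pdf",
    "basename": "example",
    "file_name": "Example Recovery Plan",
    "orig_link": "https://example.org/example.pdf",
}

print(check_record("recovery_plan", example))
print(json.dumps(full_text_query("critical habitat"), indent=2))
```

The query body is plain JSON, so it could be sent to the `esadocs` index from any Elasticsearch client; how the type restriction is expressed depends on the Elasticsearch version and mapping in use, which this sketch does not assume.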

Did you find an error while searching for a document? That's entirely possible, especially for documents whose text was extracted by OCR from a PDF with low-resolution pages. If you have a correct version, whether because you have the original, entered the text manually, or obtained it by other means, please get in touch. In the future, we plan to offer a more automated route to error correction, e.g., texts in a git repo that can be forked and corrected through pull requests. For now, we will make corrections manually.

Do you have or know of ESA-related documents that could be added to our database? Please get in touch to discuss how we can work together to make as much information as possible publicly available.

jacob-ogre/esadocs documentation built on May 18, 2019, 8 a.m.
