README.md

Elasticsearch and an app to search ESA documents

There are thousands of administrative documents about the U.S. Endangered Species Act (ESA) available on the internet, thousands more that are not yet publicly available, and thousands more produced each year. We have gathered PDFs of the publicly available documents and added copies of documents we acquired through other means (e.g., Freedom of Information Act [FOIA] requests) to form a base collection. The plain text of each document was either extracted directly or identified using Optical Character Recognition (OCR). All of this text is loaded into an Elasticsearch database, and we developed a web app to make searching these documents easier.
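As a rough illustration of the workflow described above, the following Python sketch shows how per-document text might be chosen and packaged for Elasticsearch. It is not the project's actual code: the index name `esadocs`, the field names (`title`, `raw_text`, `text_source`), the sample documents, and the rule of preferring extracted text over OCR output are all assumptions made for the example. The only parts taken from Elasticsearch itself are the newline-delimited format of the bulk API and the shape of a basic `match` query.

```python
import json

# Hypothetical index name; the real database may use a different one.
INDEX_NAME = "esadocs"


def choose_text(extracted_text, ocr_text):
    """Prefer text extracted directly from the PDF; fall back to OCR output.

    This preference order is an assumption made for the sketch.
    """
    if extracted_text and extracted_text.strip():
        return extracted_text, "extracted"
    return ocr_text or "", "ocr"


def to_bulk_body(docs):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in docs:
        text, source = choose_text(doc.get("extracted_text"), doc.get("ocr_text"))
        # Each document is an action line followed by a source line.
        action = {"index": {"_index": INDEX_NAME, "_id": doc["id"]}}
        # Field names here are hypothetical, not the project's schema.
        record = {"title": doc["title"], "raw_text": text, "text_source": source}
        lines.append(json.dumps(action))
        lines.append(json.dumps(record))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def match_query(terms):
    """Build a basic full-text match query on the hypothetical raw_text field."""
    return {"query": {"match": {"raw_text": terms}}}


# Made-up sample documents, for illustration only.
docs = [
    {"id": "doc-1", "title": "Example recovery plan",
     "extracted_text": "Plain text from the PDF."},
    {"id": "doc-2", "title": "Example scanned letter",
     "extracted_text": "  ", "ocr_text": "Text recovered by OCR."},
]

bulk_body = to_bulk_body(docs)
print(bulk_body)
print(json.dumps(match_query("critical habitat")))
```

The bulk body would be sent to the `_bulk` endpoint of an Elasticsearch server, and the query dictionary to the `_search` endpoint of the index. Recording whether a document's text came from extraction or OCR is a design choice in this sketch that would make it easier to find the documents most likely to contain recognition errors.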

Document structure

Each document in the Elasticsearch database includes the following fields:

Errors?

Did you find an error while searching for a document? That's entirely possible, especially for documents where the text was extracted by OCR from a PDF with low-resolution pages. If you have a correct version (because you have the original, manually entered the text, or obtained it by other means), please get in touch. In the future, we plan to offer a more automated approach to error correction, e.g., texts in a git repo that can be forked and corrected through pull requests. For now, we will make corrections manually.

Additions welcome

Do you have, or know of, ESA-related documents that could be added to our database? Please get in touch to discuss how we can work together to make as much information as possible publicly available.



jacob-ogre/esadocs documentation built on May 18, 2019, 8 a.m.