arete: Automated REtrieval from TExt

A Python-based pipeline for extracting species occurrence data using large language models (LLMs). It includes validation tools designed to handle model hallucinations, supporting rigorous scientific use of LLMs. It currently supports GPT, with more models planned, including local and non-proprietary ones. For details on the methodology, consult the references listed under each function, such as Kent, A. et al. (1955) <doi:10.1002/asi.5090060209>, van Rijsbergen, C.J. (1979, ISBN:978-0408709293), Levenshtein, V.I. (1966) <https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf> and Krippendorff, K. (2011) <https://repository.upenn.edu/handle/20.500.14332/2089>.
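
As a rough illustration of the kind of validation the cited references describe (precision and recall, the F-measure, and edit distance), the following R sketch scores a set of extracted records against a hand-annotated reference. It does not use arete's own functions; the records, localities, and variable names are hypothetical, and only base R is assumed.

```r
# Hypothetical example data, not produced by arete itself.
# Species|locality records a human annotator extracted from a paper.
truth <- c("Araneus diadematus|Lisbon",
           "Araneus diadematus|Porto",
           "Dysdera crocata|Faro")

# Records a language model returned for the same paper.
predicted <- c("Araneus diadematus|Lisbon",
               "Dysdera crocata|Faro",
               "Dysdera crocata|Madrid")  # absent from the paper: a hallucination

# Records present in both sets count as true positives.
true_pos  <- length(intersect(predicted, truth))
precision <- true_pos / length(predicted)  # share of returned records that are correct
recall    <- true_pos / length(truth)      # share of reference records that were found
f1        <- 2 * precision * recall / (precision + recall)

# Levenshtein distance between a reference name and a misspelled one,
# computed with base R's utils::adist().
name_dist <- adist("Araneus diadematus", "Araneus diadematis")[1, 1]
```

A low precision flags records the model invented, a low recall flags records it missed, and a small but non-zero edit distance suggests a near-match that may only be a spelling variant rather than a different taxon.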

Package details

Author: Vasco V. Branco [cre, aut] (ORCID: <https://orcid.org/0000-0001-7797-3183>), Vaughn Shirey [ctb] (ORCID: <https://orcid.org/0000-0002-3589-9699>), Thomas Merrien [ctb] (ORCID: <https://orcid.org/0000-0002-0339-5656>), Pedro Cardoso [aut] (ORCID: <https://orcid.org/0000-0001-8119-9960>)
Maintainer: Vasco V. Branco <vasco.branco@helsinki.fi>
License: GPL-3
Version: 0.1
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("arete")

arete documentation built on Nov. 5, 2025, 6:31 p.m.