View source: R/05_get_geodata.R

get_geodata {arete}    R Documentation

Call a Large Language Model (LLM) to extract species geographic data

Description

Sends an API request to extract species geographic data from a document. Currently only service = "GPT" is supported; support for more services, including both proprietary and open-source models, is planned. Uses the API...

Usage

get_geodata(
  path,
  user_key,
  service = "GPT",
  model = "gpt-3.5",
  tax = NULL,
  outpath = NULL,
  outliers = FALSE,
  verbose = TRUE
)

Arguments

path

character. Path to a file containing species data, in either PDF or TXT format, e.g. "./folder/file.pdf".

user_key

list. Two elements: the first is a character string with the user's API key; the second is a logical indicating whether the user's account has access to premium features, e.g. list(key = "your key here", premium = TRUE). Both free and premium keys are accepted.
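A minimal sketch of building the user_key list and checking that it has the documented shape before calling get_geodata(). The helper check_user_key() is hypothetical and not part of arete; the element names key and premium follow the package's own example, and reading the key from the OPENAI_API_KEY environment variable is an assumption rather than a package convention.

```r
# Hypothetical helper (not part of arete): verifies that user_key is a
# two-element list holding a single character API key and a single
# non-missing logical premium flag.
check_user_key <- function(user_key) {
  stopifnot(
    is.list(user_key),
    length(user_key) == 2,
    is.character(user_key[[1]]), length(user_key[[1]]) == 1,
    is.logical(user_key[[2]]), length(user_key[[2]]) == 1,
    !is.na(user_key[[2]])
  )
  invisible(user_key)
}

# Assumption: the key lives in the OPENAI_API_KEY environment variable
# instead of being hard-coded in the script.
user_key <- list(
  key = Sys.getenv("OPENAI_API_KEY", unset = "your key here"),
  premium = FALSE
)
check_user_key(user_key)
```

Keeping the key out of the script avoids accidentally sharing it when the analysis code is published.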

service

character. Service (LLM provider) to be used. Currently only "GPT" (OpenAI's ChatGPT) is supported.

model

character. Name of the model to use from the chosen service. Any model listed on OpenAI's developer platform may be used. If unsure, "gpt-3.5" (the default) or "gpt-4o" are recommended; these resolve to the package's recommended model within that version.

tax

character. Binomial name of the species to restrict extraction to. Specifying it usually improves the model's performance.

outpath

character. Path prefix under which output is saved, in the format "path/to/file/file_prefix".

outliers

logical. Whether results should be post-processed with the outlier-detection methods described in gecko::outliers.detect().

verbose

logical. Whether progress output should be printed.

Value

matrix. Contains the extracted information.

See Also

arete_setup

Examples

## Not run: 
file_path <- arete_data("holzapfelae")

get_geodata(
  path = file_path,
  user_key = list(key = "your key here", premium = TRUE),
  model = "gpt-4o",
  outpath = "./out"
)
## End(Not run)
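A further sketch combining the tax, outliers and verbose arguments. It only calls get_geodata() when arete is installed and an API key is available, so it can be sourced safely otherwise. Storing the key in the OPENAI_API_KEY environment variable is an assumption, and "Genus species" is a placeholder to replace with the binomial of the species described in the document; outliers = TRUE may additionally require the gecko package.

```r
# Arguments for get_geodata(); "Genus species" is a placeholder binomial.
args <- list(
  user_key = list(key = Sys.getenv("OPENAI_API_KEY"), premium = FALSE),
  model = "gpt-3.5",
  tax = "Genus species",                      # restrict extraction to one species
  outpath = file.path(tempdir(), "geodata"),  # output file prefix
  outliers = TRUE,                            # gecko::outliers.detect()-style post-processing
  verbose = FALSE
)

# Assumption: the API key is read from the OPENAI_API_KEY environment variable.
# The request is only sent when arete is installed and a key is set.
if (requireNamespace("arete", quietly = TRUE) && nzchar(args$user_key$key)) {
  args$path <- arete::arete_data("holzapfelae")
  result <- do.call(arete::get_geodata, args)
  print(dim(result))  # get_geodata() returns a matrix
}
```

Building the arguments as a list and passing them with do.call() makes it easy to reuse the same settings across several documents by changing only path and tax.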

arete documentation built on Nov. 5, 2025, 6:31 p.m.