call_llm    R Documentation
call_llm() dispatches to the correct provider implementation based on
config$provider. It supports both generative chat/completions and
embeddings, plus a simple multimodal shortcut for local files.
call_llm(config, messages, verbose = FALSE)
## S3 method for class 'ollama'
call_llm(config, messages, verbose = FALSE)
config
An llm_config object created by llm_config().
messages
One of: a character vector of prompts; a named character vector whose names give roles (e.g. system, user), with an optional file entry naming a local image (the multimodal shortcut); or a list of message objects in role/content form.
verbose
Logical. If TRUE, prints additional request and response detail.
Generative mode: an llmr_response object. Use as.character(x) to get just the text; print(x) shows the text plus a status line; the helpers finish_reason(x) and tokens(x) expose the finish reason and token counts.
Embedding mode: a provider-native list with a data element; convert with parse_embeddings().
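A minimal sketch of consuming each return type (assuming cfg is a generative config and e_cfg an embedding config, as in the Examples section below):

r <- call_llm(cfg, "Say hello in Greek.")         # llmr_response
as.character(r)                                   # text only
finish_reason(r); tokens(r)                       # status helpers
emb_raw <- call_llm(e_cfg, c("first", "second"))  # provider-native list
emb_mat <- parse_embeddings(emb_raw)              # embeddings for downstream use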
OpenAI-compatible: on a server 400 error that identifies the offending parameter as max_tokens, LLMR retries once, replacing max_tokens with max_completion_tokens (and informs via cli_alert_info), unless no_change = TRUE. The formerly experimental "uncapped retry on empty content" is disabled by default to avoid unexpected costs.
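A minimal sketch of opting out of that retry, assuming no_change is accepted as a flag through llm_config():

cfg_strict <- llm_config("openai", "gpt-4o-mini", no_change = TRUE)
call_llm(cfg_strict, "Say hello in Greek.")  # a 400 naming max_tokens is not rewritten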
Anthropic: max_tokens is required; if omitted LLMR uses
2048 and warns. Multimodal images are inlined as base64. Extended
thinking is supported: provide thinking_budget and
include_thoughts = TRUE to include a content block of type
"thinking" in the response; LLMR sets the beta header automatically.
Gemini (REST): systemInstruction is supported; user
parts use text/inlineData(mimeType,data); responses are set to
responseMimeType = "text/plain".
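A minimal sketch of a Gemini call, assuming the provider key "gemini", a model name such as "gemini-2.0-flash", and an API key visible to llm_config():

g_cfg <- llm_config("gemini", "gemini-2.0-flash")
call_llm(g_cfg, c(system = "Answer briefly.",
                  user   = "Name the capital of France."))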
Ollama (local): OpenAI-compatible endpoints on http://localhost:11434/v1/*;
no Authorization header is required. Override with api_url as needed.
Error handling: HTTP errors raise structured conditions with
classes like llmr_api_param_error, llmr_api_rate_limit_error,
llmr_api_server_error; see the condition fields for status, code,
request id, and (where supplied) the offending parameter.
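A minimal sketch of catching these classed conditions with tryCatch(); the class names come from the list above, and the condition object also carries the status, code, and request id fields described there:

res <- tryCatch(
  call_llm(cfg, "Say hello in Greek."),
  llmr_api_rate_limit_error = function(e) {
    message("Rate limited: ", conditionMessage(e))
    NULL
  },
  llmr_api_server_error = function(e) {
    message("Server error: ", conditionMessage(e))
    NULL
  }
)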
See the "multimodal shortcut" described under messages. Internally,
LLMR expands these into the provider's native request shape and tilde-expands
local file paths.
Ollama provides an OpenAI-compatible HTTP API on localhost by default. Start the
daemon and pull a model first (terminal): ollama serve (in background) and
ollama pull llama3. Then configure LLMR with
llm_config("ollama", "llama3", embedding = FALSE) for chat or
llm_config("ollama", "nomic-embed-text", embedding = TRUE) for embeddings.
Override the endpoint with api_url if not using the default
http://localhost:11434/v1/*.
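Putting that together, a minimal local sketch (assuming ollama serve is running and both models have been pulled):

o_cfg <- llm_config("ollama", "llama3", embedding = FALSE)
call_llm(o_cfg, "Say hello in Greek.")

oe_cfg <- llm_config("ollama", "nomic-embed-text", embedding = TRUE)
parse_embeddings(call_llm(oe_cfg, c("first", "second")))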
llm_config, call_llm_robust, llm_chat_session, parse_embeddings, finish_reason, tokens
## Not run:
## 1) Basic generative call
cfg <- llm_config("openai", "gpt-4o-mini")
call_llm(cfg, "Say hello in Greek.")
## 2) Generative with rich return
r <- call_llm(cfg, "Say hello in Greek.")
r
as.character(r)
finish_reason(r); tokens(r)
## 3) Anthropic extended thinking (single example)
a_cfg <- llm_config("anthropic", "claude-sonnet-4-20250514",
max_tokens = 5000,
thinking_budget = 16000,
include_thoughts = TRUE)
r2 <- call_llm(a_cfg, "Compute 87*93 in your head. Give only the final number.")
# thinking (if present): r2$raw$content[[1]]$thinking
# final text: r2$raw$content[[2]]$text
## 4) Multimodal (named-vector shortcut)
msg <- c(
system = "Answer briefly.",
user = "Describe this image in one sentence.",
file = "~/Pictures/example.png"
)
call_llm(cfg, msg)
## 5) Embeddings
e_cfg <- llm_config("voyage", "voyage-large-2",
embedding = TRUE)
emb_raw <- call_llm(e_cfg, c("first", "second"))
emb_mat <- parse_embeddings(emb_raw)
## 6) With a chat session
ch <- chat_session(cfg)
ch$send("Say hello in Greek.") # prints the same status line as `print.llmr_response`
ch$history()
## End(Not run)