| llm_mutate | R Documentation |
Adds one or more columns generated by a large language model to .data.
llm_mutate(
.data,
output,
prompt = NULL,
.messages = NULL,
.config,
.system_prompt = NULL,
.before = NULL,
.after = NULL,
.return = c("columns", "text", "object"),
.structured = FALSE,
.schema = NULL,
.fields = NULL,
.tags = NULL,
...
)
.data |
A data.frame / tibble. |
output |
Unquoted name that becomes the new column (generative) or
the prefix for embedding columns. In shorthand form, omit this argument
and pass |
prompt |
Optional glue template string for a single user turn; reference
any columns in |
.messages |
Optional named character vector of glue templates to build
a multi-turn message, using roles in |
.config |
An llm_config object (generative or embedding). |
.system_prompt |
Optional system message sent with every request when
|
.before, .after |
Standard dplyr::relocate helpers controlling where the generated column(s) are placed. |
.return |
One of |
.structured |
Logical. If |
.schema |
Optional JSON Schema (R list). When |
.fields |
Optional character vector of fields to extract from parsed JSON
or tag output. In JSON mode, supports nested paths (e.g., |
.tags |
Optional character vector of XML-like tag names to request and parse,
such as |
... |
Passed to the underlying calls: |
Multi-column injection: templating is NA-safe (NA -> empty string).
Multi-turn templating: supply .messages = c(system=..., user=..., file=...).
Duplicate role names are allowed (e.g., two user turns).
Generative mode: one request per row via call_llm_broadcast().
Parallelism: calls call_llm_broadcast(), which uses
call_llm_robust() under the hood. If no future plan is active,
workers are auto-configured; call setup_llm_parallel() to set worker
count explicitly.
Embedding mode: the per-row text is embedded via get_batched_embeddings().
Result expands to numeric columns named paste0(<output>, 1:N). If all rows
fail to embed, a single <output>1 column of NA is returned.
Diagnostic columns use suffixes: _finish, _sent, _rec, _tot, _reason, _ok, _err, _id, _status, _ecode, _param, _t.
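Two of the behaviours described above, NA-safe templating and the naming of embedding columns, can be illustrated in base R. This is only a sketch under stated assumptions: `render_template()` is a hypothetical helper written for this illustration (it is not part of the package, which uses glue templates), and the `emb` matrix is a made-up stand-in for an embedding result.

```r
# Hypothetical helper: fill "{col}" placeholders from one row, mapping NA to "".
render_template <- function(template, row) {
  out <- template
  for (nm in names(row)) {
    value <- row[[nm]]
    value <- if (is.na(value)) "" else as.character(value)
    out <- gsub(paste0("{", nm, "}"), value, out, fixed = TRUE)
  }
  out
}

df <- data.frame(
  question = c("Capital of France?", "Author of 1984?"),
  hint = c("European city", NA),
  stringsAsFactors = FALSE
)

# One rendered prompt per row; the NA hint becomes an empty string.
prompts <- vapply(
  seq_len(nrow(df)),
  function(i) render_template("{question} (hint: {hint})", df[i, , drop = FALSE]),
  character(1)
)

# Made-up embedding result: one row per input, N = 3 dimensions.
emb <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 2, byrow = TRUE)

# Columns are named paste0(<output>, 1:N); here the output name is "vec".
colnames(emb) <- paste0("vec", seq_len(ncol(emb)))
out <- cbind(df, as.data.frame(emb))
```

With the output name `vec`, the data frame gains the columns `vec1`, `vec2` and `vec3`, and the second prompt is rendered with an empty hint rather than the literal text "NA".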
.data with the new column(s) appended.
You can supply the output column and prompt in one argument:
df |> llm_mutate(answer = "{question} (hint: {hint})", .config = cfg)
df |> llm_mutate(answer = c(system = "One word.", user = "{question}"), .config = cfg)
df |> llm_mutate(country = "Where is {city}? Answer with only the country.", .config = cfg)
The first two calls are equivalent to:
df |> llm_mutate(answer, prompt = "{question} (hint: {hint})", .config = cfg)
df |> llm_mutate(answer, .messages = c(system = "One word.", user = "{question}"), .config = cfg)
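One way to picture the shorthand is as a rule on a single named argument: an unnamed string is treated as `prompt`, while a character vector with role names is treated as `.messages`. The sketch below assumes exactly that rule; `parse_shorthand()` is a hypothetical function written for illustration and may not match how the package resolves the argument internally.

```r
# Hypothetical illustration of the shorthand rule; not the package's own code.
parse_shorthand <- function(...) {
  dots <- list(...)
  stopifnot(length(dots) == 1, !is.null(names(dots)), nzchar(names(dots)))
  value <- dots[[1]]
  stopifnot(is.character(value))
  if (is.null(names(value))) {
    # Plain string: a single user-turn template.
    list(output = names(dots), prompt = value, messages = NULL)
  } else {
    # Role-named vector: a multi-turn message template.
    list(output = names(dots), prompt = NULL, messages = value)
  }
}

single <- parse_shorthand(answer = "{question} (hint: {hint})")
multi <- parse_shorthand(answer = c(system = "One word.", user = "{question}"))
```

Here `single$prompt` holds the template and `single$messages` is `NULL`, while `multi` carries the role-named vector in `messages`; both report `"answer"` as the output column name.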
.structured = TRUE delegates to llm_mutate_structured() for JSON.
.tags delegates to llm_mutate_tags() for XML-like tags.
If both are supplied, .structured takes precedence.
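The precedence rule can be written as a small decision function. This is a sketch only: `choose_backend()` is a hypothetical name, and it returns the name of the function that would handle the call rather than calling anything.

```r
# Hypothetical dispatcher mirroring the stated precedence:
# .structured wins over .tags; with neither, the plain path is used.
choose_backend <- function(.structured = FALSE, .tags = NULL) {
  if (isTRUE(.structured)) {
    "llm_mutate_structured"
  } else if (!is.null(.tags)) {
    "llm_mutate_tags"
  } else {
    "llm_mutate"
  }
}

both <- choose_backend(.structured = TRUE, .tags = c("age", "job"))
tags_only <- choose_backend(.tags = c("age", "job"))
neither <- choose_backend()
```

When both arguments are supplied, `both` resolves to the structured (JSON) path, which matches the rule stated above.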
llm_fn(), llm_mutate_structured(), llm_mutate_tags(),
llm_parse_structured_col(), llm_parse_tags_col(),
call_llm_broadcast(), setup_llm_parallel()
## Not run:
library(dplyr)
df <- tibble::tibble(
  id = 1:2,
  question = c("Capital of France?", "Author of 1984?"),
  hint = c("European city", "English novelist")
)
cfg <- llm_config("openai", "gpt-4.1-nano", temperature = 0)
# Generative: single-turn with multi-column injection
df |>
  llm_mutate(
    answer,
    prompt = "{question} (hint: {hint})",
    .config = cfg,
    .system_prompt = "Respond in one word."
  )
# Generative: multi-turn via .messages (system + user)
df |>
  llm_mutate(
    advice,
    .messages = c(
      system = "You are a helpful zoologist. Keep answers short.",
      user = "What is a key fact about this? {question} (hint: {hint})"
    ),
    .config = cfg
  )
# Multimodal: include an image path with role 'file'
pics <- tibble::tibble(
  img = c("inst/extdata/cat.png", "inst/extdata/dog.jpg"),
  prompt = c("Describe the image.", "Describe the image.")
)
pics |>
  llm_mutate(
    vision_desc,
    .messages = c(user = "{prompt}", file = "{img}"),
    .config = llm_config("openai", "gpt-4.1-mini")
  )
# Embeddings: output name becomes the prefix of embedding columns
emb_cfg <- llm_config("voyage", "voyage-3.5-lite", embedding = TRUE)
df |>
  llm_mutate(
    vec,
    prompt = "{question}",
    .config = emb_cfg,
    .after = id
  )
# Structured output: using .structured = TRUE (equivalent to llm_mutate_structured)
schema <- list(
  type = "object",
  properties = list(
    answer = list(type = "string"),
    confidence = list(type = "number")
  ),
  required = list("answer", "confidence")
)
df |>
  llm_mutate(
    result,
    prompt = "{question}",
    .config = cfg,
    .structured = TRUE,
    .schema = schema
  )
# Structured with shorthand
df |>
  llm_mutate(
    result = "{question}",
    .config = cfg,
    .structured = TRUE,
    .schema = schema
  )
# Soft structured output with XML-like tags
df |>
  llm_mutate(
    result = "Extract the person's age and job from: {question}",
    .config = cfg,
    .tags = c("age", "job")
  )
# Tags with an explicit system prompt describing the expected format
cities <- tibble::tibble(city = c("Cairo", "Lima"))
cities |>
  llm_mutate(
    geo = "Where is {city}? Give country and continent in their own tags.",
    .config = cfg,
    .system_prompt = paste(
      "Use XML tags for different parts of the answer, but do not nest tags.",
      "Return <country>...</country> and <continent>...</continent>."
    ),
    .tags = c("country", "continent")
  )
## End(Not run)