llm_mutate: Mutate a data frame with LLM output
In LLMR: Interface for Large Language Model APIs in R

Description

Adds one or more columns to .data, generated by a large language model (LLM).

Usage

llm_mutate(
  .data,
  output,
  prompt = NULL,
  .messages = NULL,
  .config,
  .system_prompt = NULL,
  .before = NULL,
  .after = NULL,
  .return = c("columns", "text", "object"),
  ...
)

Arguments

`.data`	A data.frame / tibble.
`output`	Unquoted name that becomes the new column (generative) or the prefix for embedding columns.
`prompt`	Optional glue template string for a single user turn; reference any columns in `.data` (e.g. `"{id}. {question}\nContext: {context}"`). Ignored if `.messages` is supplied.
`.messages`	Optional named character vector of glue templates to build a multi-turn message, using roles in `c("system","user","assistant","file")`. Values are glue templates evaluated per-row; all can reference multiple columns. For multimodal input, use role `"file"` with a template that resolves to a file path (e.g. `"{img}"`, where `img` is a column of paths).
`.config`	An llm_config object (generative or embedding).
`.system_prompt`	Optional system message sent with every request when `.messages` does not include a `system` entry.
`.before`, `.after`	Standard dplyr::relocate helpers controlling where the generated column(s) are placed.
`.return`	One of `c("columns","text","object")`. For generative mode, controls how results are added. `"columns"` (default) adds text plus diagnostic columns; `"text"` adds a single text column; `"object"` adds a list-column of `llmr_response` objects.
`...`	Passed to the underlying calls: `call_llm_broadcast()` in generative mode, `get_batched_embeddings()` in embedding mode.
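
The `.messages` argument relies on the fact that a named character vector in R may repeat names, which is what makes two `user` turns possible. The following is a minimal sketch of that idea using glue directly. It is not LLMR's internal code; the data frame `one_row` and the message texts are illustrative assumptions.

```r
library(glue)

# One illustrative row of data (column names borrowed from the Examples section)
one_row <- data.frame(
  question = "Capital of France?",
  hint     = "European city",
  stringsAsFactors = FALSE
)

# A named character vector may repeat names, so two user turns are allowed
msgs <- c(
  system    = "Respond in one word.",
  user      = "{question}",
  assistant = "I need a hint.",
  user      = "Hint: {hint}"
)

# Render each template against the row; the roles travel along as names
rendered <- vapply(
  msgs,
  function(tpl) as.character(glue_data(one_row, tpl)),
  character(1)
)

names(rendered)   # roles, in order, including the duplicated "user"
unname(rendered)  # rendered message texts
```

Templates without braces pass through unchanged, so fixed system text and column-driven user text can be mixed freely.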

Details

Multi-column injection: a template may reference any number of columns, and templating is NA-safe (an NA value is rendered as an empty string).
Multi-turn templating: supply .messages = c(system=..., user=..., file=...). Duplicate role names are allowed (e.g., two user turns).
Generative mode: one request per row via call_llm_broadcast(). Parallel execution follows the active future plan; see setup_llm_parallel().
Embedding mode: the per-row text is embedded via get_batched_embeddings(). Result expands to numeric columns named `paste0(<output>, 1:N)`. If all rows fail to embed, a single `<output>1` column of NA is returned.
Diagnostic columns use suffixes: `_finish`, `_sent`, `_rec`, `_tot`, `_reason`, `_ok`, `_err`, `_id`, `_status`, `_ecode`, `_param`, `_t`.
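
The NA-safe templating and the diagnostic suffixes described above can be illustrated outside of LLMR. The sketch below uses glue's `.na` argument to turn missing values into empty strings, then builds the diagnostic column names one would expect for an output called `answer`. Both parts are assumptions for illustration: LLMR may implement NA handling differently, and the sketch assumes the suffixes are appended directly to the output name.

```r
library(glue)

df <- data.frame(
  question = c("Capital of France?", "Author of 1984?"),
  hint     = c("European city", NA),
  stringsAsFactors = FALSE
)

# .na = "" makes glue substitute an empty string for missing values
# instead of propagating NA to the whole rendered prompt
prompts <- as.character(
  glue_data(df, "{question} (hint: {hint})", .na = "")
)

# Suffixes copied from the list above; "answer" is a hypothetical output name
suffixes <- c("_finish", "_sent", "_rec", "_tot", "_reason", "_ok",
              "_err", "_id", "_status", "_ecode", "_param", "_t")
diag_cols <- paste0("answer", suffixes)
```

With glue's default settings the second prompt would contain the literal text `NA`, so an explicit empty-string replacement keeps missing values from leaking into the request.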

Value

.data with the new column(s) appended.
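
For embedding mode, the shape of the returned data can be pictured with base R alone. This is a minimal sketch under stated assumptions: the matrix `emb` is a made-up stand-in for what an embedding call might return, and `vec` plays the role of `output`. No request is made and LLMR is not called.

```r
df <- data.frame(id = 1:2)

# Stand-in for an embedding matrix: one row per input row, N = 3 dimensions
emb <- matrix(
  c(0.1, 0.2, 0.3,
    0.4, 0.5, 0.6),
  nrow = 2, byrow = TRUE
)

# Name the columns paste0(<output>, 1:N), here with output = "vec"
colnames(emb) <- paste0("vec", seq_len(ncol(emb)))

# Append the embedding columns to the original data
out <- cbind(df, as.data.frame(emb))
names(out)
```

Because each embedding dimension is an ordinary numeric column, the result can be passed on to standard modelling or distance functions without unpacking a list-column.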

Examples

## Not run: 
library(dplyr)

df <- tibble::tibble(
  id       = 1:2,
  question = c("Capital of France?", "Author of 1984?"),
  hint     = c("European city", "English novelist")
)

cfg <- llm_config("openai", "gpt-4o-mini",
                  temperature = 0)

# Generative: single-turn with multi-column injection
df |>
  llm_mutate(
    answer,
    prompt = "{question} (hint: {hint})",
    .config = cfg,
    .system_prompt = "Respond in one word."
  )

# Generative: multi-turn via .messages (system + user)
df |>
  llm_mutate(
    advice,
    .messages = c(
      system = "You are a helpful zoologist. Keep answers short.",
      user   = "What is a key fact about this? {question} (hint: {hint})"
    ),
    .config = cfg
  )

# Multimodal: include an image path with role 'file'
pics <- tibble::tibble(
  img    = c("inst/extdata/cat.png", "inst/extdata/dog.jpg"),
  prompt = c("Describe the image.", "Describe the image.")
)
pics |>
  llm_mutate(
    vision_desc,
    .messages = c(user = "{prompt}", file = "{img}"),
    .config = llm_config("openai", "gpt-4.1-mini")
  )

# Embeddings: output name becomes the prefix of embedding columns
emb_cfg <- llm_config("voyage", "voyage-3.5-lite",
                      embedding = TRUE)
df |>
  llm_mutate(
    vec,
    prompt  = "{question}",
    .config = emb_cfg,
    .after  = id
  )

## End(Not run)

LLMR documentation built on Aug. 26, 2025, 9:08 a.m.