llm_mutate: Mutate a data frame with LLM output

View source: R/LLMR_tidy.R

Description

Adds to .data one or more columns produced by a large language model (LLM).

Usage

llm_mutate(
  .data,
  output,
  prompt = NULL,
  .messages = NULL,
  .config,
  .system_prompt = NULL,
  .before = NULL,
  .after = NULL,
  .return = c("columns", "text", "object"),
  .structured = FALSE,
  .schema = NULL,
  .fields = NULL,
  .tags = NULL,
  ...
)

Arguments

.data

A data.frame / tibble.

output

Unquoted name of the new column (generative mode) or the prefix of the embedding columns (embedding mode). In shorthand form, omit this argument and instead pass newcol = "<glue prompt>" or newcol = c(system = "...", user = "...") as a named argument in the dots.

prompt

Optional glue template string for a single user turn; reference any columns in .data (e.g. "{id}. {question}\nContext: {context}"). Ignored if .messages is supplied.

.messages

Optional named character vector of glue templates used to build a multi-turn conversation, with roles drawn from c("system","user","assistant","file"). Each value is a glue template evaluated per row and may reference multiple columns. For multimodal input, use the role "file" with a template that resolves to a file path (e.g. "{img}" for a column of paths).

.config

An llm_config object (generative or embedding).

.system_prompt

Optional system message sent with every request when .messages does not include a system entry.

.before, .after

Standard dplyr::relocate helpers controlling where the generated column(s) are placed.

.return

One of c("columns","text","object"). For generative mode, controls how results are added. "columns" (default) adds text plus diagnostic columns; "text" adds a single text column; "object" adds a list-column of llmr_response objects.

.structured

Logical. If TRUE, enables structured JSON output with automatic parsing. When enabled, this is equivalent to calling llm_mutate_structured(). Default is FALSE.

.schema

Optional JSON Schema (R list). When .structured = TRUE, this schema is sent to the provider for validation and used for local parsing. When NULL, only JSON mode is enabled (no strict schema validation).

.fields

Optional character vector of fields to extract from parsed JSON or tag output. In JSON mode, supports nested paths (e.g., "user.name" or "/data/items/0"). When NULL and .schema is provided, auto-extracts all top-level schema properties. In tag mode, NULL extracts all .tags. Set to FALSE to skip field extraction entirely.

.tags

Optional character vector of XML-like tag names to request and parse, such as c("age", "job"). When supplied, llm_mutate() delegates to llm_mutate_tags() and adds tags_ok, tags_data, and one column per tag unless .fields = FALSE.

...

Passed to the underlying calls: call_llm_broadcast() in generative mode, get_batched_embeddings() in embedding mode.
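
To make the dotted-path form of .fields (e.g. "user.name") concrete, here is a minimal base-R sketch of walking such a path through a parsed JSON object held as a nested list. The helper get_path() and the sample values are hypothetical and are not part of LLMR; the package's own extraction, including the "/data/items/0" pointer form, may behave differently.

```r
# Hypothetical helper (not LLMR's implementation): walk a dotted path such as
# "user.name" through a nested list, returning NULL when a key is missing.
get_path <- function(x, path) {
  keys <- strsplit(path, ".", fixed = TRUE)[[1]]
  for (k in keys) {
    if (!is.list(x) || is.null(x[[k]])) return(NULL)
    x <- x[[k]]
  }
  x
}

# A parsed JSON reply represented as an R list (made-up values)
parsed <- list(user = list(name = "Ada", age = 36), answer = "Paris")

name_value    <- get_path(parsed, "user.name")   # nested field
answer_value  <- get_path(parsed, "answer")      # top-level field
missing_value <- get_path(parsed, "user.email")  # NULL: no such key
```

Returning NULL for a missing key is only one possible design; a column-oriented caller would typically convert it to NA so that every row yields a value.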

Details

  • Multi-column injection: a template may reference several columns, and templating is NA-safe (an NA value is rendered as an empty string).

  • Multi-turn templating: supply .messages = c(system=..., user=..., file=...). Duplicate role names are allowed (e.g., two user turns).

  • Generative mode: one request per row via call_llm_broadcast().

  • Parallelism: call_llm_broadcast() uses call_llm_robust() under the hood. If no future plan is active, workers are auto-configured; call setup_llm_parallel() to set the worker count explicitly.

  • Embedding mode: the per-row text is embedded via get_batched_embeddings(). Result expands to numeric columns named paste0(<output>, 1:N). If all rows fail to embed, a single <output>1 column of NA is returned.

  • Diagnostic columns use suffixes: _finish, _sent, _rec, _tot, _reason, _ok, _err, _id, _status, _ecode, _param, _t.
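
The NA-safe, multi-column templating described above can be approximated in base R. This is only a sketch: render_template() is a hypothetical helper, not the function LLMR uses internally (the package relies on glue templates), and it handles only plain {column} placeholders.

```r
# Hypothetical helper (not part of LLMR): fill "{column}" placeholders row by
# row, rendering NA as an empty string.
render_template <- function(data, template) {
  # Distinct column names referenced as {name} in the template
  vars <- unique(regmatches(
    template,
    gregexpr("(?<=\\{)[^{}]+(?=\\})", template, perl = TRUE)
  )[[1]])
  stopifnot(all(vars %in% names(data)))

  out <- rep(template, nrow(data))
  for (v in vars) {
    value <- as.character(data[[v]])
    value[is.na(value)] <- ""  # NA -> empty string
    for (i in seq_along(out)) {
      # Substitute this row's value for the placeholder text "{v}"
      out[i] <- gsub(paste0("{", v, "}"), value[i], out[i], fixed = TRUE)
    }
  }
  out
}

df <- data.frame(
  question = c("Capital of France?", "Author of 1984?"),
  hint     = c("European city", NA),
  stringsAsFactors = FALSE
)

prompts <- render_template(df, "{question} (hint: {hint})")
print(prompts)
# Row 2 has a missing hint, so its prompt is "Author of 1984? (hint: )"
```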

Value

.data with the new column(s) added. By default they are appended; use .before or .after to place them elsewhere.
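
As an illustration of the embedding-mode naming rule from Details (columns named paste0(<output>, 1:N)) and of placement with .after, the following base-R sketch uses a made-up 2 x 3 matrix in place of real embeddings. It mimics the shape of the result only and does not call LLMR.

```r
# Illustrative only: a made-up 2 x 3 embedding matrix (one row per input row)
emb <- matrix(c(0.1, 0.2, 0.3,
                0.4, 0.5, 0.6), nrow = 2, byrow = TRUE)

df <- data.frame(
  id       = 1:2,
  question = c("Capital of France?", "Author of 1984?"),
  stringsAsFactors = FALSE
)

# Name the columns <output>1 ... <output>N, here with output = "vec"
emb_df <- as.data.frame(emb)
names(emb_df) <- paste0("vec", seq_len(ncol(emb)))

# Place the new columns right after `id`, as .after = id would request
result <- cbind(df["id"], emb_df, df["question"])
print(names(result))
# "id" "vec1" "vec2" "vec3" "question"
```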

Shorthand

You can supply the output column and prompt in one argument:

df |> llm_mutate(answer = "{question} (hint: {hint})", .config = cfg)
df |> llm_mutate(answer = c(system = "One word.", user = "{question}"), .config = cfg)
df |> llm_mutate(country = "Where is {city}? Answer with only the country.", .config = cfg)

The first two calls are equivalent to:

df |> llm_mutate(answer, prompt = "{question} (hint: {hint})", .config = cfg)
df |> llm_mutate(answer, .messages = c(system = "One word.", user = "{question}"), .config = cfg)

Structured modes

  • .structured = TRUE delegates to llm_mutate_structured() for JSON.

  • .tags delegates to llm_mutate_tags() for XML-like tags. If both are supplied, .structured takes precedence.
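
To show what parsing XML-like tags involves, here is a rough base-R sketch that pulls flat, non-nested tags out of reply text. extract_tag() is a hypothetical helper and the replies are made-up strings, not model output. The tags_ok column here assumes "every requested tag was found", which may not match how llm_mutate_tags() defines it.

```r
# Hypothetical helper (not LLMR's parser): text of the first <tag>...</tag>
# pair on a single line, or NA when the tag is absent.
extract_tag <- function(text, tag) {
  pattern <- paste0("<", tag, ">(.*?)</", tag, ">")
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))
  # Each match holds the full match plus capture group 1; keep the group
  vapply(
    m,
    function(x) if (length(x) >= 2) trimws(x[2]) else NA_character_,
    character(1)
  )
}

# Made-up replies standing in for model output
replies <- c(
  "<country>Egypt</country> <continent>Africa</continent>",
  "It is in <country> Peru </country>."
)

parsed <- data.frame(
  country   = extract_tag(replies, "country"),
  continent = extract_tag(replies, "continent"),
  stringsAsFactors = FALSE
)
# Assumption: a row is "ok" only when every requested tag was found
parsed$tags_ok <- !is.na(parsed$country) & !is.na(parsed$continent)
print(parsed)
```

The second reply lacks a continent tag, so its continent is NA and tags_ok is FALSE.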

See Also

llm_fn(), llm_mutate_structured(), llm_mutate_tags(), llm_parse_structured_col(), llm_parse_tags_col(), call_llm_broadcast(), setup_llm_parallel()

Examples

## Not run: 
library(dplyr)

df <- tibble::tibble(
  id       = 1:2,
  question = c("Capital of France?", "Author of 1984?"),
  hint     = c("European city", "English novelist")
)

cfg <- llm_config("openai", "gpt-4.1-nano",
                  temperature = 0)

# Generative: single-turn with multi-column injection
df |>
  llm_mutate(
    answer,
    prompt = "{question} (hint: {hint})",
    .config = cfg,
    .system_prompt = "Respond in one word."
  )

# Generative: multi-turn via .messages (system + user)
df |>
  llm_mutate(
    advice,
    .messages = c(
      system = "You are a helpful zoologist. Keep answers short.",
      user   = "What is a key fact about this? {question} (hint: {hint})"
    ),
    .config = cfg
  )

# Multimodal: include an image path with role 'file'
pics <- tibble::tibble(
  img    = c("inst/extdata/cat.png", "inst/extdata/dog.jpg"),
  prompt = c("Describe the image.", "Describe the image.")
)
pics |>
  llm_mutate(
    vision_desc,
    .messages = c(user = "{prompt}", file = "{img}"),
    .config = llm_config("openai","gpt-4.1-mini")
  )

# Embeddings: output name becomes the prefix of embedding columns
emb_cfg <- llm_config("voyage", "voyage-3.5-lite",
                      embedding = TRUE)
df |>
  llm_mutate(
    vec,
    prompt  = "{question}",
    .config = emb_cfg,
    .after  = id
  )

# Structured output: using .structured = TRUE (equivalent to llm_mutate_structured)
schema <- list(
  type = "object",
  properties = list(
    answer = list(type = "string"),
    confidence = list(type = "number")
  ),
  required = list("answer", "confidence")
)

df |>
  llm_mutate(
    result,
    prompt = "{question}",
    .config = cfg,
    .structured = TRUE,
    .schema = schema
  )

# Structured with shorthand
df |>
  llm_mutate(
    result = "{question}",
    .config = cfg,
    .structured = TRUE,
    .schema = schema
  )

# Soft structured output with XML-like tags
df |>
  llm_mutate(
    result = "Extract the person's age and job from: {question}",
    .config = cfg,
    .tags = c("age", "job")
  )

# Tags combined with a system prompt that spells out the expected tag format
cities <- tibble::tibble(city = c("Cairo", "Lima"))
cities |>
  llm_mutate(
    geo = "Where is {city}? Give country and continent in their own tags.",
    .config = cfg,
    .system_prompt = paste(
      "Use XML tags for different parts of the answer, but do not nest tags.",
      "Return <country>...</country> and <continent>...</continent>."
    ),
    .tags = c("country", "continent")
  )

## End(Not run)

LLMR documentation built on May 22, 2026, 1:07 a.m.