```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  eval     = identical(tolower(Sys.getenv("LLMR_RUN_VIGNETTES", "false")), "true")
)
```
## OpenAI-compatible (OpenAI, Groq, Together, x.ai, DeepSeek)

Chat Completions accept a `response_format` (e.g., `{"type":"json_object"}` or a JSON-Schema payload). Enforcement varies by provider, but the interface is OpenAI-shaped.

See the OpenAI API overview, Groq API (OpenAI-compatible), Together: OpenAI compatibility, x.ai: OpenAI API schema, and DeepSeek: OpenAI-compatible endpoint.
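For orientation, here is a minimal sketch (plain `jsonlite`, not an LLMR call) of the JSON-Schema flavor of `response_format` that these endpoints accept. Field names follow OpenAI's Chat Completions documentation; the schema itself is illustrative, and `enable_structured_output()` assembles the equivalent for you later in this vignette.

```r
library(jsonlite)

# The strict JSON-Schema variant of response_format.
# ({"type":"json_object"} is the simpler, schema-free variant.)
response_format <- list(
  type = "json_schema",
  json_schema = list(
    name   = "answer",   # illustrative schema name
    strict = TRUE,
    schema = list(
      type = "object",
      properties = list(ok = list(type = "boolean")),
      required = list("ok"),
      additionalProperties = FALSE
    )
  )
)
cat(toJSON(response_format, auto_unbox = TRUE, pretty = TRUE))
```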
## Anthropic (Claude)

There is no global "JSON mode." Instead, you define a tool with an `input_schema` (JSON Schema) and force it via `tool_choice`, so the model must return a JSON object that validates against the schema.

See the Anthropic Messages API: tools & `input_schema`.
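For reference, a sketch of the request fields involved (names per the Anthropic Messages API; the tool name and schema are illustrative, and `enable_structured_output()` builds the equivalent for you):

```r
# A tool whose input must validate against the JSON Schema.
tool <- list(
  name         = "record_answer",   # illustrative tool name
  description  = "Return the structured answer.",
  input_schema = list(
    type = "object",
    properties = list(answer = list(type = "string")),
    required = list("answer")
  )
)

# Forcing the tool: the model must call it, so its "input" is your JSON.
tool_choice <- list(type = "tool", name = "record_answer")
```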
## Google Gemini (REST)

Set `responseMimeType = "application/json"` in `generationConfig` to request JSON. Some models also accept a `responseSchema` for constrained JSON (model-dependent).

See the Gemini documentation.
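A sketch of the corresponding REST `generationConfig` (field names per the Gemini API; `responseSchema` uses OpenAPI-style types and the schema here is illustrative):

```r
generationConfig <- list(
  responseMimeType = "application/json",
  responseSchema = list(   # model-dependent; omit if unsupported
    type = "OBJECT",
    properties = list(
      name  = list(type = "STRING"),
      score = list(type = "NUMBER")
    )
  )
)
```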
On the LLMR side, `llm_parse_structured()` strips fences and extracts the largest balanced `{...}` or `[...]` before parsing. `llm_parse_structured_col()` hoists fields (supports dot/bracket paths and JSON Pointer) and keeps non-scalars as list-columns. `llm_validate_structured_col()` validates locally via jsonvalidate (AJV). `enable_structured_output()` flips the right provider switch (OpenAI-compat `response_format`, Anthropic tool + `input_schema`, Gemini `responseMimeType`/`responseSchema`).

All chunks use a tiny helper so your document knits even without API keys:
```r
safe <- function(expr) {
  tryCatch(expr, error = function(e) {
    message("ERROR: ", e$message)
    NULL
  })
}
```
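As a quick offline check, the parser recovers JSON from fenced, chatty output. This is a sketch that assumes `llm_parse_structured()` also accepts a plain character string, as its column-wise counterpart does:

````r
safe({
  library(LLMR)
  # Fenced JSON with extra prose before and after it
  messy <- 'Sure! ```json\n{"ok": true, "n": 3}\n``` hope that helps'
  str(llm_parse_structured(messy))
})
````

With that in place, here is a real call against an OpenAI-compatible endpoint: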
```r
safe({
  library(LLMR)

  cfg <- llm_config(
    provider    = "openai",   # try "groq" or "together" too
    model       = "gpt-4o-mini",
    temperature = 0
  )

  # Flip JSON mode on (OpenAI-compat shape)
  cfg_json <- enable_structured_output(cfg, schema = NULL)

  res    <- call_llm(cfg_json, 'Give me a JSON object {"ok": true, "n": 3}.')
  parsed <- llm_parse_structured(res)

  cat("Raw text:\n", as.character(res), "\n\n")
  str(parsed)
})
```
What could still fail? Proxies labeled "OpenAI-compatible" sometimes accept `response_format` but don't strictly enforce it; LLMR's parser recovers from fences or pre/post text.
Groq serves Qwen 2.5 Instruct models with OpenAI-compatible APIs. Their Structured Outputs feature enforces JSON Schema and (notably) expects all properties to be listed under `required`.
```r
safe({
  library(LLMR); library(dplyr)

  # Schema: make every property required to satisfy Groq's stricter check
  schema <- list(
    type = "object",
    additionalProperties = FALSE,
    properties = list(
      title = list(type = "string"),
      year  = list(type = "integer"),
      tags  = list(type = "array", items = list(type = "string"))
    ),
    required = list("title", "year", "tags")
  )

  cfg <- llm_config(
    provider    = "groq",
    model       = "qwen-2.5-72b-instruct",  # a Qwen Instruct model on Groq
    temperature = 0
  )
  cfg_strict <- enable_structured_output(cfg, schema = schema, strict = TRUE)

  df <- tibble(x = c("BERT paper", "Vision Transformers"))

  out <- llm_fn_structured(
    df,
    prompt  = "Return JSON about '{x}' with fields title, year, tags.",
    .config = cfg_strict,
    .schema = schema,                     # send schema to provider
    .fields = c("title", "year", "tags"),
    .validate_local = TRUE
  )

  out %>%
    select(structured_ok, structured_valid, title, year, tags) %>%
    print(n = Inf)
})
```
If your key is set, you should see `structured_ok = TRUE` and `structured_valid = TRUE`, plus parsed columns. (Tip: if you see a 400 complaining about `required`, add all properties to `required`, as above.)
For Anthropic, `enable_structured_output()` turns your schema into a forced tool call (note the `max_tokens` requirement below):

```r
safe({
  library(LLMR)

  schema <- list(
    type = "object",
    properties = list(
      answer     = list(type = "string"),
      confidence = list(type = "number")
    ),
    required = list("answer", "confidence"),
    additionalProperties = FALSE
  )

  cfg <- llm_config("anthropic", "claude-3-7", temperature = 0)
  cfg <- enable_structured_output(cfg, schema = schema, name = "llmr_schema")

  res <- call_llm(cfg, c(
    system = "Return only the tool result that matches the schema.",
    user   = "Answer: capital of Japan; include confidence in [0,1]."
  ))

  parsed <- llm_parse_structured(res)
  str(parsed)
})
```
Anthropic requires `max_tokens`; LLMR warns and supplies a default if you omit it.
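To avoid the warning, set `max_tokens` yourself. A sketch, assuming extra model parameters pass through `llm_config()`'s dots (as the Gemini chunk below does with `response_mime_type`):

```r
cfg <- llm_config("anthropic", "claude-3-7",
                  temperature = 0,
                  max_tokens  = 1024)  # explicit cap instead of LLMR's default
```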
For Gemini, request JSON by setting `response_mime_type`:

```r
safe({
  library(LLMR)

  cfg <- llm_config(
    "gemini", "gemini-2.0-flash",
    response_mime_type = "application/json"  # ask for JSON back
    # Optionally: gemini_enable_response_schema = TRUE,
    #             response_schema = <your JSON Schema>
  )

  res <- call_llm(cfg, c(
    system = "Reply as JSON only.",
    user   = "Produce fields name and score about 'MNIST'."
  ))

  str(llm_parse_structured(res))
})
```
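If your model supports it, you can also constrain the shape server-side. A hedged variant using the optional parameters named in the comment above (the schema itself is illustrative):

```r
safe({
  library(LLMR)

  schema <- list(
    type = "object",
    properties = list(
      name  = list(type = "string"),
      score = list(type = "number")
    ),
    required = list("name", "score")
  )

  cfg <- llm_config(
    "gemini", "gemini-2.0-flash",
    response_mime_type            = "application/json",
    gemini_enable_response_schema = TRUE,
    response_schema               = schema
  )

  res <- call_llm(cfg, "Produce fields name and score about 'MNIST'.")
  str(llm_parse_structured(res))
})
```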
Parsing messy outputs needs no API key at all:

````r
safe({
  library(LLMR); library(tibble)

  messy <- c(
    '```json\n{"x": 1, "y": [1,2,3]}\n```',
    'Sure! Here is JSON: {"x":"1","y":"oops"} trailing words',
    '{"x":1, "y":[2,3,4]}'
  )

  tibble(response_text = messy) |>
    llm_parse_structured_col(
      fields = c(x = "x", y = "/y/0")  # dot/bracket or JSON Pointer
    ) |>
    print(n = Inf)
})
````
**Why this helps.** Parsing works when outputs arrive fenced, with pre/post text, or when arrays sneak in. Non-scalars become list-columns (set `allow_list = FALSE` to force scalars only).
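For example, a sketch of the scalars-only mode (`allow_list` as described above; here the array field would come back as a scalar or `NA` rather than a list-column):

```r
safe({
  library(LLMR); library(tibble)

  tibble(response_text = '{"x": 1, "y": [1, 2, 3]}') |>
    llm_parse_structured_col(
      fields     = c(x = "x", y = "y"),
      allow_list = FALSE  # force scalars only (see note above)
    )
})
```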
In short: flip the provider switch with `enable_structured_output()`, then run `llm_parse_structured()` + local validation.

Reference: Anthropic `input_schema` and `tool_choice`: https://docs.anthropic.com/en/api/messages#body-tool-choice