vitals is a framework for large language model evaluation in R. It’s specifically aimed at ellmer users who want to measure the effectiveness of their LLM products, such as custom chat apps and querychat apps.
The package is an R port of the widely adopted Python framework Inspect. While the package doesn’t integrate with Inspect directly, it writes evaluation logs to the same file format, so users can explore results in the Inspect log viewer and have an on-ramp to Inspect if they need one.
Install the vitals package from CRAN with:
install.packages("vitals")
You can install the development version of vitals using:
pak::pak("tidyverse/vitals")
LLM evaluation with vitals is composed of two main steps.
library(vitals)
library(ellmer)
library(tibble)
1) First, create an evaluation task with the Task$new() method.
simple_addition <- tibble(
input = c("What's 2+2?", "What's 2+3?", "What's 2+4?"),
target = c("4", "5", "6")
)
tsk <- Task$new(
dataset = simple_addition,
solver = generate(chat_anthropic(model = "claude-sonnet-4-20250514")),
scorer = model_graded_qa()
)
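The dataset passed to Task$new() above is an ordinary data frame with input and target columns, so it can be checked before any model calls are made. A minimal sketch in base R, assuming only that those two columns are required; check_eval_dataset() is a hypothetical helper written for illustration and is not part of vitals:

```r
# Hypothetical helper (not part of vitals): fail early if a data frame
# lacks the columns used in the example above.
check_eval_dataset <- function(dataset) {
  stopifnot(is.data.frame(dataset))
  missing_cols <- setdiff(c("input", "target"), names(dataset))
  if (length(missing_cols) > 0) {
    stop(
      "dataset is missing column(s): ",
      paste(missing_cols, collapse = ", ")
    )
  }
  if (nrow(dataset) == 0) {
    stop("dataset has no rows to evaluate")
  }
  # Return the input unchanged so the check can sit inside a pipeline.
  invisible(dataset)
}

simple_addition <- data.frame(
  input = c("What's 2+2?", "What's 2+3?", "What's 2+4?"),
  target = c("4", "5", "6")
)

check_eval_dataset(simple_addition)
```

Running a check like this first means a misnamed column surfaces as an immediate error rather than partway through an evaluation.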
Tasks are composed of three main components:
- Datasets are data frames with, at minimum, the columns input and target. input represents some question or problem, and target gives the target response.
- Solvers are functions that take input and return some value approximating target, likely wrapping ellmer chats. generate() is the simplest solver in vitals, and just passes the input to the chat’s $chat() method, returning its result as-is.
- Scorers compare the solver’s output with target, evaluating how well the solver solved the input.
2) Evaluate the task.
tsk$eval()
$eval() will run the solver, run the scorer, and then write the results to a persistent log file that can be explored interactively with the Inspect log viewer.
Any arguments to the solver or scorer can be passed to $eval(), allowing for straightforward parameterization of tasks. For example, if I wanted to evaluate chat_openai() on this task rather than chat_anthropic(), I could write:
tsk_openai <- tsk$clone()
tsk_openai$eval(solver_chat = chat_openai(model = "gpt-4.1"))
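The same pattern extends to comparing several chats on one task. The sketch below uses only the calls shown above (Task$new(), $clone(), and $eval() with solver_chat); the list name chats and the lapply() loop are illustrative choices, not a vitals convention. It assumes the ANTHROPIC_API_KEY and OPENAI_API_KEY environment variables are set, and each $eval() call makes live requests to the provider.

```r
library(vitals)
library(ellmer)
library(tibble)

simple_addition <- tibble(
  input = c("What's 2+2?", "What's 2+3?", "What's 2+4?"),
  target = c("4", "5", "6")
)

tsk <- Task$new(
  dataset = simple_addition,
  solver = generate(chat_anthropic(model = "claude-sonnet-4-20250514")),
  scorer = model_graded_qa()
)

# Chats to compare; the model names are the ones used earlier in this README.
chats <- list(
  anthropic = chat_anthropic(model = "claude-sonnet-4-20250514"),
  openai = chat_openai(model = "gpt-4.1")
)

# Clone the task once per chat so each evaluation keeps its own results.
tasks <- lapply(chats, function(chat) {
  tsk_i <- tsk$clone()
  tsk_i$eval(solver_chat = chat)
  tsk_i
})
```

Cloning before each evaluation keeps the original tsk untouched, so it can be reused as the starting point for further comparisons.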
For an applied example, see the “Getting started with vitals” vignette at vignette("vitals", package = "vitals").