bedrockagentcore_evaluate: Performs on-demand evaluation of agent traces using a...
In paws.machine.learning: 'Amazon Web Services' Machine Learning Services

View source: R/bedrockagentcore_operations.R

bedrockagentcore_evaluate

R Documentation

Performs on-demand evaluation of agent traces using a specified evaluator

Description

Performs on-demand evaluation of agent traces using a specified evaluator. This synchronous API accepts traces in OpenTelemetry format and returns immediate scoring results with detailed explanations.

See https://www.paws-r-sdk.com/docs/bedrockagentcore_evaluate/ for full documentation.

Usage

bedrockagentcore_evaluate(
  evaluatorId,
  evaluationInput,
  evaluationTarget = NULL,
  evaluationReferenceInputs = NULL
)

Arguments

`evaluatorId`	[required] The unique identifier of the evaluator to use for scoring. Can be a built-in evaluator (e.g., `Builtin.Helpfulness`, `Builtin.Correctness`) or the ID of a custom evaluator created through the control plane API.
`evaluationInput`	[required] The input data containing agent session spans to be evaluated. Includes a list of spans in OpenTelemetry format from supported frameworks like Strands (AgentCore Runtime) or LangGraph with OpenInference instrumentation.
`evaluationTarget`	The specific trace or span IDs to evaluate within the provided input. Allows targeting evaluation at different levels: individual tool calls, single request-response interactions (traces), or entire conversation sessions.
`evaluationReferenceInputs`	Ground truth data to compare against agent responses during evaluation. Lets you provide expected responses, assertions, and expected tool trajectories at different evaluation levels. Session-level reference inputs apply to the entire conversation, while trace-level reference inputs target specific request-response interactions identified by trace ID.
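
Example (illustrative sketch)

The following is a minimal sketch of how the arguments above might be assembled, not a definitive usage pattern. Several names are assumptions that this page does not confirm: the helper `build_evaluate_args` is hypothetical, the nested field names `sessionSpans` and `traceIds` are guesses at the request shape, the span contents are placeholders rather than real OpenTelemetry data, and the client constructor `bedrockagentcore()` is assumed from the usual paws naming convention. Check the full documentation linked in the Description before relying on any of them.

```r
# Hypothetical helper (not part of paws): assemble the arguments as plain
# R lists so the structure can be inspected without contacting AWS.
build_evaluate_args <- function(evaluator_id, spans, trace_ids = NULL) {
  stopifnot(
    is.character(evaluator_id), length(evaluator_id) == 1,
    is.list(spans)
  )
  args <- list(
    evaluatorId = evaluator_id,
    # ASSUMPTION: the span list is passed under a `sessionSpans` field.
    evaluationInput = list(sessionSpans = spans)
  )
  if (!is.null(trace_ids)) {
    # ASSUMPTION: trace-level targeting uses a `traceIds` field.
    args$evaluationTarget <- list(traceIds = as.list(trace_ids))
  }
  args
}

# Placeholder span; real input would be OpenTelemetry spans exported by a
# supported framework.
spans <- list(
  list(traceId = "trace-1", spanId = "span-1", name = "invoke_agent")
)

args <- build_evaluate_args("Builtin.Helpfulness", spans, trace_ids = "trace-1")
str(args)

# The call itself needs AWS credentials and the paws.machine.learning
# package, so it is guarded and never runs in this sketch.
if (FALSE) {
  svc <- paws.machine.learning::bedrockagentcore()  # assumed constructor name
  result <- do.call(svc$evaluate, args)
}
```

Building the argument list separately from the call keeps the request shape easy to inspect with `str()` before anything is sent, which is convenient because the operation is synchronous and returns its scoring results immediately.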