llama_perf: Get performance statistics
In llamaR: Interface for Large Language Models via 'llama.cpp'

llama_perf

R Documentation

Get performance statistics

Description

Returns timing and count statistics for the given context: model load time, prompt processing time, token generation time, and the associated token and graph-reuse counts.

Usage

llama_perf(ctx)

Arguments

ctx

Context handle returned by llama_new_context().

Value

A named list with the following fields:

- 't_load_ms': model load time in milliseconds
- 't_p_eval_ms': prompt processing time in milliseconds
- 't_eval_ms': token generation time in milliseconds
- 'n_p_eval': number of prompt tokens processed
- 'n_eval': number of tokens generated
- 'n_reused': number of reused compute graphs

Examples

## Not run: 
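# Assumes `ctx` is a context handle created with llama_new_context().
result <- llama_generate(ctx, "Hello world")
perf <- llama_perf(ctx)
# Throughput in tokens per second: token count divided by elapsed seconds.
cat("Prompt speed:", perf$n_p_eval / (perf$t_p_eval_ms / 1000), "tok/s\n")
cat("Generation speed:", perf$n_eval / (perf$t_eval_ms / 1000), "tok/s\n")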

## End(Not run)
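
Dividing a token count by an elapsed time yields Inf or NaN when the time is zero, which could plausibly happen if llama_perf() is called before any tokens have been processed. The following is a minimal sketch of a guarded conversion, not part of llamaR: the helper name perf_rates and the list mock_perf are hypothetical, and the numbers in mock_perf are made up to stand in for a real llama_perf(ctx) result. Only the field names are taken from the Value section above.

```r
# Hypothetical helper (not part of llamaR): convert a perf list
# with the fields documented above into tokens-per-second rates.
perf_rates <- function(perf) {
  rate <- function(n, ms) {
    # Return NA when a field is missing or no time was recorded,
    # instead of Inf/NaN from dividing by zero.
    if (length(n) != 1 || length(ms) != 1 || is.na(ms) || ms <= 0) {
      NA_real_
    } else {
      n / (ms / 1000)
    }
  }
  c(
    prompt_tok_per_s = rate(perf$n_p_eval, perf$t_p_eval_ms),
    gen_tok_per_s    = rate(perf$n_eval, perf$t_eval_ms)
  )
}

# Made-up values standing in for the result of llama_perf(ctx).
mock_perf <- list(
  t_load_ms = 850, t_p_eval_ms = 200, t_eval_ms = 4000,
  n_p_eval = 50L, n_eval = 100L, n_reused = 0L
)
rates <- perf_rates(mock_perf)

# A list with zero timings yields NA for both rates.
empty_rates <- perf_rates(list(
  t_p_eval_ms = 0, t_eval_ms = 0, n_p_eval = 0L, n_eval = 0L
))
```

With a real context, the same helper would be called as perf_rates(llama_perf(ctx)). Returning NA rather than 0 keeps "nothing measured yet" distinguishable from a genuinely slow run.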

llamaR documentation built on May 28, 2026, 1:06 a.m.