| llama_perf | R Documentation |
Returns timing and count statistics for the current context: model load time, prompt processing time, token generation time, and the associated token and graph-reuse counts.
llama_perf(ctx)
ctx | Context handle returned by [llama_new_context] |
A named list with fields:
- 't_load_ms': model load time in milliseconds
- 't_p_eval_ms': prompt processing time in milliseconds
- 't_eval_ms': token generation time in milliseconds
- 'n_p_eval': number of prompt tokens processed
- 'n_eval': number of tokens generated
- 'n_reused': number of reused compute graphs
## Not run:
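# Run a generation first so the counters have something to report
result <- llama_generate(ctx, "Hello world")
perf <- llama_perf(ctx)
# Times are in milliseconds, so divide by 1000 to get tokens per second
cat("Prompt speed:", perf$n_p_eval / (perf$t_p_eval_ms / 1000), "tok/s\n")
cat("Generation speed:", perf$n_eval / (perf$t_eval_ms / 1000), "tok/s\n")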
## End(Not run)
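The throughput arithmetic in the example above divides by the elapsed time, which could be zero if no prompt or generation step has run yet. The sketch below wraps the same calculation in a small helper that returns NA in that case. `perf_summary` and `mock_perf` are hypothetical names used only for illustration and are not part of the package; the mock values are made up so the snippet runs without a model.

```r
# Hypothetical helper (not part of the package): turns a list shaped like the
# return value of llama_perf() into tokens-per-second figures.
perf_summary <- function(perf) {
  # Tokens per second; NA when no positive time was recorded,
  # which avoids dividing by zero.
  rate <- function(n, ms) {
    if (isTRUE(ms > 0)) n / (ms / 1000) else NA_real_
  }
  c(
    prompt_tok_per_s = rate(perf$n_p_eval, perf$t_p_eval_ms),
    gen_tok_per_s    = rate(perf$n_eval, perf$t_eval_ms)
  )
}

# Made-up values standing in for a real llama_perf(ctx) result.
mock_perf <- list(
  t_load_ms = 850, t_p_eval_ms = 200, t_eval_ms = 4000,
  n_p_eval = 50L, n_eval = 100L, n_reused = 0L
)

print(perf_summary(mock_perf))
```

With a real context, `perf_summary(llama_perf(ctx))` would be used in the same way, assuming the returned list carries the fields documented above.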