llama_embed_batch: Batch embeddings for multiple texts
In llamaR: Interface for Large Language Models via 'llama.cpp'

llama_embed_batch

R Documentation

Batch embeddings for multiple texts

Description

Computes embeddings for a character vector of texts in a single decode pass using per-sequence pooling. This is more efficient than calling llama_embeddings in a loop when embedding many texts.

Usage

llama_embed_batch(ctx, texts)

Arguments

`ctx`	Context handle returned by llama_new_context()
`texts`	Character vector of texts to embed

Details

Requires a model that supports pooled embeddings, i.e. a dedicated embedding model such as nomic-embed or bge. The context size (n_ctx) must be at least the total number of tokens across all texts, because every text is decoded in the same pass. Causal attention is disabled automatically during the computation.
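When the combined token count may exceed the context size, one option is to split the texts into several smaller calls. The following is a minimal sketch in base R, not part of llamaR: `split_by_token_budget` is a hypothetical helper, and the per-text token counts are assumed to be supplied by the caller (how they are obtained is not covered by this page).

```r
# Hypothetical helper (not part of llamaR): greedily group texts so that the
# token total of each group stays within n_ctx.
# token_counts: one token count per text, supplied by the caller.
split_by_token_budget <- function(token_counts, n_ctx) {
  # A single text longer than the context can never fit.
  stopifnot(all(token_counts <= n_ctx))
  groups <- integer(length(token_counts))
  current <- 1L
  used <- 0
  for (i in seq_along(token_counts)) {
    # Start a new group when adding this text would overflow the budget.
    if (used + token_counts[i] > n_ctx) {
      current <- current + 1L
      used <- 0
    }
    groups[i] <- current
    used <- used + token_counts[i]
  }
  # List of index vectors, one per group, in the original order.
  split(seq_along(token_counts), groups)
}

counts <- c(900, 800, 700, 300, 1200)  # made-up token counts for five texts
batches <- split_by_token_budget(counts, n_ctx = 2048)
```

Each element of `batches` holds the indices of texts that could be passed together, for example `llama_embed_batch(ctx, texts[idx])`, with the resulting matrices combined by `rbind()`. If the tokenizer adds special tokens per text, it is safer to leave some margin below n_ctx.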

Value

A numeric matrix with nrow = length(texts) and ncol = n_embd (the model's embedding dimension), i.e. one row per input text.
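Because the result is an ordinary numeric matrix, it can be used directly with base R linear algebra. The sketch below computes pairwise cosine similarity between rows. The matrix `mat` is a hand-written stand-in with made-up values, not real model output, and the sketch assumes nothing about whether the returned rows are already normalised.

```r
# Stand-in for a llama_embed_batch() result: 3 texts, n_embd = 4 (made-up values).
mat <- rbind(
  c(1.0, 0.0, 0, 0),
  c(0.6, 0.8, 0, 0),
  c(0.0, 0.0, 2, 0)
)

# Scale each row to unit L2 norm; the norm vector is recycled down the
# columns, so row i is divided by the norm of row i.
unit <- mat / sqrt(rowSums(mat^2))

# Pairwise dot products of unit vectors are cosine similarities.
sim <- tcrossprod(unit)
```

Here `sim` is a length(texts) x length(texts) symmetric matrix in which `sim[i, j]` is the cosine similarity between text i and text j.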

Examples

## Not run: 
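# Load an embedding model (placeholder path) and create a context large
# enough to hold the tokens of all texts together
model <- llama_load_model("embedding-model.gguf")
ctx <- llama_new_context(model, n_ctx = 2048L)
# Optional: per Details, causal attention is disabled automatically
# during the batch computation
llama_set_causal_attn(ctx, FALSE)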

mat <- llama_embed_batch(ctx, c("hello world", "foo bar", "test"))
# mat is a 3 x n_embd matrix

## End(Not run)

llamaR documentation built on May 28, 2026, 1:06 a.m.