llama_embed_batch    R Documentation
Computes embeddings for every element of a character vector in a single decode
pass, using per-sequence pooling so that each text yields its own embedding
vector. This is more efficient than calling llama_embeddings() in a loop when
embedding many texts.
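Per-sequence pooling means that all texts are packed into one batch, each token is tagged with the sequence (text) it came from, and the token-level vectors are then reduced to one vector per sequence. The base R sketch below illustrates only that reduction step, on made-up numbers, and does not call this package. It assumes mean pooling. The pooling actually applied depends on the model, and CLS pooling is another common choice.

```r
# Illustration only: made-up token-level embeddings for three packed texts.
set.seed(1)
n_embd <- 4L
seq_id <- c(1L, 1L, 2L, 2L, 2L, 3L)  # sequence (text) each token belongs to
tok_emb <- matrix(rnorm(length(seq_id) * n_embd), ncol = n_embd)

# Assumed mean pooling: sum token rows within each sequence,
# then divide each row by that sequence's token count.
pooled <- rowsum(tok_emb, seq_id) / as.vector(table(seq_id))

dim(pooled)  # one row per text, one column per embedding dimension
```

The result has the same shape as the value returned by llama_embed_batch(): one row per text and n_embd columns.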
llama_embed_batch(ctx, texts)
ctx: Context handle returned by llama_new_context().
texts: Character vector of texts to embed.
Requires a model that supports pooled embeddings (for example, embedding models such as nomic-embed or bge). The context must have enough capacity for the total number of tokens across all texts combined, not just for the longest text. Causal attention is disabled automatically during the computation.
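When the texts together need more tokens than the context can hold, one option is to split them into groups that each fit and call llama_embed_batch() once per group. The sketch below is a hypothetical helper, chunk_by_tokens(), that works purely on per-text token counts. How those counts are obtained is not covered on this page and is an assumption, as is treating n_ctx as the only capacity limit.

```r
# Hypothetical helper: greedily group consecutive text indices so that the
# total token count of each group does not exceed n_ctx.
# n_tokens: per-text token counts (assumed to be known in advance).
chunk_by_tokens <- function(n_tokens, n_ctx) {
  if (any(n_tokens > n_ctx)) stop("a single text exceeds n_ctx")
  groups <- list()
  current <- integer(0)
  used <- 0L
  for (i in seq_along(n_tokens)) {
    # Close the current group if adding text i would overflow the context.
    if (used + n_tokens[i] > n_ctx) {
      groups[[length(groups) + 1L]] <- current
      current <- integer(0)
      used <- 0L
    }
    current <- c(current, i)
    used <- used + n_tokens[i]
  }
  if (length(current) > 0L) groups[[length(groups) + 1L]] <- current
  groups
}

groups <- chunk_by_tokens(c(900L, 800L, 600L, 1500L, 100L), n_ctx = 2048L)
print(groups)

# With a real context and token counts `n_tok` (assumed available):
# mat <- do.call(rbind, lapply(chunk_by_tokens(n_tok, 2048L),
#                              function(idx) llama_embed_batch(ctx, texts[idx])))
```

Each group holds consecutive indices in their original order, so stacking the per-group matrices with rbind() keeps rows aligned with texts, assuming each call returns its rows in input order.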
A numeric matrix with one row per element of texts (nrow = length(texts)) and
one column per embedding dimension (ncol = n_embd, the model's embedding size).
## Not run:
model <- llama_load_model("embedding-model.gguf")
ctx <- llama_new_context(model, n_ctx = 2048L)
# Optional for llama_embed_batch(), which disables causal attention itself
llama_set_causal_attn(ctx, FALSE)
mat <- llama_embed_batch(ctx, c("hello world", "foo bar", "test"))
# mat is a 3 x n_embd matrix
## End(Not run)