llamaR: Interface for Large Language Models via 'llama.cpp'

Provides 'R' bindings to 'llama.cpp' for running Large Language Models ('LLMs') locally with optional 'Vulkan' GPU acceleration via 'ggmlR'. Supports model loading, text generation, 'tokenization', token-to-piece conversion, 'embeddings' (single and batch), encoder-decoder inference, low-level batch management, chat templates, 'LoRA' adapters, explicit backend/device selection, multi-GPU split, and 'NUMA' optimization. Includes a high-level 'ragnar'-compatible embedding provider ('embed_llamar'). Built on top of 'ggmlR' for efficient tensor operations.

Package details

Author: Yuri Baramykov [aut, cre] (ORCID: <https://orcid.org/0009-0000-7627-4217>), Georgi Gerganov [cph] (Author of the 'llama.cpp' library included in src/)
Maintainer: Yuri Baramykov <lbsbmsu@mail.ru>
License: MIT + file LICENSE
Version: 0.2.4
URL: https://github.com/Zabis13/llamaR
Package repository: CRAN
Installation

Install the latest version of this package by entering the following in R:

install.packages("llamaR")

llamaR documentation built on May 28, 2026, 1:06 a.m.