| llamaR-package | R Documentation |
Provides 'R' bindings to 'llama.cpp' for running Large Language Models ('LLMs') locally with optional 'Vulkan' GPU acceleration via 'ggmlR'. Supports model loading, text generation, 'tokenization', token-to-piece conversion, 'embeddings' (single and batch), encoder-decoder inference, low-level batch management, chat templates, 'LoRA' adapters, explicit backend/device selection, multi-GPU split, and 'NUMA' optimization. Includes a high-level 'ragnar'-compatible embedding provider ('embed_llamar'). Built on top of 'ggmlR' for efficient tensor operations.
Maintainer: Yuri Baramykov lbsbmsu@mail.ru (ORCID)
Other contributors:
Georgi Gerganov (Author of the 'llama.cpp' library included in src/) [copyright holder]
Useful links:
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.