Man pages for llamaR
Interface for Large Language Models via 'llama.cpp'

chat_llamar: Chat with a local model through an ellmer::Chat object
chat_llamar_stop: Stop the server spawned by chat_llamar()
embed_llamar: Embedding provider for ragnar / standalone use
llama_backend_devices: List available backend devices
llama_batch_free: Free a llama batch allocated with 'llama_batch_init()'
llama_batch_init: Initialise a llama batch
llama_chat_apply_template: Apply chat template to messages
llama_chat_builtin_templates: List built-in chat templates
llama_chat_template: Get model's built-in chat template
llama_detokenize: Detokenize token IDs back to text
llama_embed_batch: Batch embeddings for multiple texts
llama_embeddings: Extract embeddings for a text
llama_encode: Encode tokens using the encoder (encoder-decoder models only)
llama_free_context: Free an inference context
llama_free_model: Free a loaded model
llama_gen_begin: Begin a streaming (token-by-token) generation
llama_gen_end: Finish a streaming generation
llama_generate: Generate text from a prompt
llama_generate_batch: Generate completions for multiple prompts in parallel
llama_gen_next: Pull the next chunk of a streaming generation
llama_get_embeddings: Get all output token embeddings as a matrix
llama_get_embeddings_ith: Get embeddings for the i-th token in the batch
llama_get_embeddings_seq: Get pooled embeddings for a sequence
llama_get_logits: Get logits from the last decode step
llama_get_logits_ith: Get logits for a specific token position
llama_get_model: Get the model associated with a context
llama_get_verbosity: Get current verbosity level
llama_hf_cache_clear: Clear the model cache
llama_hf_cache_dir: Get the cache directory for downloaded models
llama_hf_cache_info: Show information about the model cache
llama_hf_download: Download a GGUF model from Hugging Face
llama_hf_list: List GGUF files in a Hugging Face repository
llama_load_model: Load a GGUF model file
llama_load_model_hf: Load a model directly from Hugging Face
llama_lora_apply: Apply a LoRA adapter to context
llama_lora_clear: Remove all LoRA adapters from context
llama_lora_load: Load a LoRA adapter
llama_lora_remove: Remove a LoRA adapter from context
llama_max_devices: Get maximum number of devices
llama_memory_breakdown_print: Print memory breakdown by device
llama_memory_can_shift: Check if the KV cache supports shifting
llama_memory_clear: Clear the KV cache
llama_memory_seq_add: Shift token positions in a sequence
llama_memory_seq_cp: Copy a sequence in the KV cache
llama_memory_seq_div: Integer-divide token positions in a sequence
llama_memory_seq_keep: Keep only one sequence in the KV cache
llama_memory_seq_pos_range: Get position range for a sequence
llama_memory_seq_rm: Remove tokens from a sequence in the KV cache
llama_model_info: Get model metadata
llama_model_meta: Get all model metadata as a named character vector
llama_model_meta_val: Get a single model metadata value by key
llama_n_batch: Get logical batch size
llama_n_ctx: Get context window size
llama_n_ctx_seq: Get per-sequence context window size
llama_new_context: Create an inference context
llama_n_seq_max: Get maximum number of sequences
llama_n_threads: Get number of threads for single-token generation
llama_n_threads_batch: Get number of threads for batch processing
llama_n_ubatch: Get physical micro-batch size
llama_numa_init: Initialize NUMA optimization
llama_perf: Get performance statistics
llama_perf_print: Print performance statistics to the console
llama_perf_reset: Reset performance counters
llama_pooling_type: Get pooling type
llamaR-package: llamaR: Interface for Large Language Models via 'llama.cpp'
llama_serve_openai: Serve an OpenAI-compatible HTTP API for a local model
llama_set_abort_callback: Set or clear the abort callback
llama_set_causal_attn: Set causal attention mode
llama_set_threads: Set the number of threads for a context
llama_set_verbosity: Set logging verbosity level
llama_set_warmup: Set warmup mode
llama_state_get_size: Get the size of the serialized context state in bytes
llama_state_load: Load context state from file
llama_state_save: Save context state to file
llama_supports_gpu: Check whether GPU offloading is available
llama_supports_mlock: Check whether memory locking is supported
llama_supports_mmap: Check whether memory-mapped file I/O is supported
llama_supports_rpc: Check whether RPC backend is available
llama_synchronize: Synchronize asynchronous computation
llama_system_info: Get system information string
llama_time_us: Get current time in microseconds
llama_tokenize: Tokenize text into token IDs
llama_token_to_piece: Convert a single token ID to its text piece
llama_vocab_get_score: Get the score of a token
llama_vocab_get_text: Get the text representation of a token
llama_vocab_info: Get vocabulary special token IDs
llama_vocab_is_control: Check if a token is a control token
llama_vocab_is_eog: Check if a token is an end-of-generation token
llama_vocab_type: Get vocabulary type
llamaR documentation built on May 28, 2026, 1:06 a.m.