llama_load_model: Load a GGUF model file

View source: R/llama.R

llama_load_model    R Documentation

Load a GGUF model file

Description

Loads a model stored in a GGUF file and returns a handle to it. Layers can be offloaded to one or more GPU devices.

Usage

llama_load_model(
  path,
  n_gpu_layers = -1L,
  devices = NULL,
  split_mode = "layer",
  use_mmap = TRUE,
  use_mlock = FALSE
)

Arguments

path

Path to the .gguf model file

n_gpu_layers

Number of layers to offload to the GPU (-1L = all, 0L = none, i.e. CPU only). With the default -1L, every layer is offloaded when a GPU is detected; if no GPU backend is available, loading falls back to the CPU with a warning.

devices

Character vector of device names or types to use for offloading. NULL (default) uses all available devices. Use "cpu" for CPU-only, "gpu" for the first GPU, or specific device names as listed by llama_backend_devices. Supplying multiple devices enables a multi-GPU split.

split_mode

Multi-GPU split strategy: "none" (single GPU), "layer" (split layers across GPUs, default), or "row" (tensor-parallel across GPUs).

use_mmap

Logical; memory-map the model file instead of reading it fully into memory (default TRUE).

use_mlock

Logical; force the OS to keep model pages resident (default FALSE).
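
The device-related arguments interact, so it can help to build them in one place. The following is a minimal sketch, not part of llamaR: model_load_args is a hypothetical helper, and "model.gguf", "Vulkan0" and "Vulkan1" are placeholder names. It assumes that a single GPU pairs with split_mode = "none" and several GPUs with "layer", as described above.

```r
# Hypothetical helper (not part of llamaR): build an argument list for
# llama_load_model() from a character vector of GPU device names.
model_load_args <- function(path, gpus = character()) {
  stopifnot(is.character(path), length(path) == 1L, is.character(gpus))
  if (length(gpus) == 0L) {
    # No GPUs requested: keep every layer on the CPU.
    return(list(path = path, n_gpu_layers = 0L, devices = "cpu"))
  }
  list(
    path = path,
    n_gpu_layers = -1L,  # offload all layers
    devices = gpus,
    # "none" for a single GPU, "layer" to split layers across several.
    split_mode = if (length(gpus) == 1L) "none" else "layer"
  )
}

load_args <- model_load_args("model.gguf", gpus = c("Vulkan0", "Vulkan1"))
str(load_args)

# Only attempt the real load when llamaR and the file are both available.
if (requireNamespace("llamaR", quietly = TRUE) && file.exists(load_args$path)) {
  model <- do.call(llamaR::llama_load_model, load_args)
}
```

Real device names depend on the backend in use; check llama_backend_devices() before hard-coding them.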

Value

An external pointer (class externalptr) wrapping the loaded model. This handle is required by llama_new_context, llama_model_info, and other model-level functions. The model is freed automatically by the garbage collector, or it can be freed manually via llama_free_model.
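
When memory should be released promptly rather than at the next garbage collection, the manual route can be wrapped so the model is freed even if an error occurs. This is a sketch only: with_llama_model is a hypothetical wrapper, not part of llamaR, and it assumes that llama_free_model and llama_model_info each accept the model handle as their first argument, which this page does not confirm.

```r
# Hypothetical wrapper (not part of llamaR): load a model, run `fn` on the
# handle, and free the model when done, even if `fn` raises an error.
# `loader` and `freer` default to the llamaR functions (assumed signatures)
# and are only evaluated when actually called.
with_llama_model <- function(path, fn, ...,
                             loader = llamaR::llama_load_model,
                             freer = llamaR::llama_free_model) {
  model <- loader(path, ...)
  # Register cleanup immediately after a successful load.
  on.exit(freer(model), add = TRUE)
  fn(model)
}

# Usage: only runs when llamaR and a (placeholder) model file are present.
if (requireNamespace("llamaR", quietly = TRUE) && file.exists("model.gguf")) {
  info <- with_llama_model("model.gguf", llamaR::llama_model_info,
                           n_gpu_layers = 0L)
  str(info)
}
```

The handle must not be used after the wrapper returns, since the model has been freed by then; return plain R values from fn instead.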

Examples

## Not run: 
# Default: full GPU offload (falls back to CPU if no GPU)
model <- llama_load_model("model.gguf")

# Force CPU-only
model <- llama_load_model("model.gguf", n_gpu_layers = 0L)

# Explicit CPU-only backend
model <- llama_load_model("model.gguf", devices = "cpu")

# Specific GPU device (see llama_backend_devices())
model <- llama_load_model("model.gguf", n_gpu_layers = -1L, devices = "Vulkan0")

# Multi-GPU: use two devices
model <- llama_load_model("model.gguf", n_gpu_layers = -1L,
                          devices = c("Vulkan0", "Vulkan1"))

## End(Not run)

llamaR documentation built on May 28, 2026, 1:06 a.m.