| llama_load_model | R Documentation |
Load a GGUF model file
llama_load_model(
  path,
  n_gpu_layers = -1L,
  devices = NULL,
  split_mode = "layer",
  use_mmap = TRUE,
  use_mlock = FALSE
)
path |
Path to the .gguf model file. |
n_gpu_layers |
Number of layers to offload to the GPU (default -1L, which requests full offload; 0L keeps the model on the CPU). |
devices |
Character vector of device names or types to use for offloading (default NULL). |
split_mode |
Multi-GPU split strategy (default "layer"). |
use_mmap |
Logical; map the model file into memory (default TRUE). |
use_mlock |
Logical; force the OS to keep model pages resident (default FALSE). |
An external pointer (class externalptr) wrapping the loaded
model. This handle is required by llama_new_context,
llama_model_info, and other model-level functions.
The model is freed automatically by the garbage collector, or manually via
llama_free_model.
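Because the handle can be released either by the garbage collector or by hand, it may help to tie the manual release to a scope. The following is a minimal sketch, not part of the package: with_llama_model is a hypothetical helper, and it assumes that llama_free_model accepts the model handle as its only argument. The loader and free arguments exist only so the helper can be exercised without a real model.

```r
# Hypothetical helper: load a model, run `fn` on the handle, and release
# the handle when the helper exits, even if `fn` signals an error.
# Assumption: llama_free_model() takes the model handle as its argument.
with_llama_model <- function(path, fn, ...,
                             loader = llama_load_model,
                             free = llama_free_model) {
  # Extra arguments (n_gpu_layers, devices, ...) are passed to the loader.
  model <- loader(path, ...)
  # on.exit() runs on normal return and on error alike.
  on.exit(free(model), add = TRUE)
  fn(model)
}
```

A call such as with_llama_model("model.gguf", llama_model_info, n_gpu_layers = 0L) would then load on the CPU, query the model, and release it before returning, assuming llama_model_info takes the handle as its first argument.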
## Not run:
# Default: full GPU offload (falls back to CPU if no GPU)
model <- llama_load_model("model.gguf")
# Force CPU-only
model <- llama_load_model("model.gguf", n_gpu_layers = 0L)
# Explicit CPU-only backend
model <- llama_load_model("model.gguf", devices = "cpu")
# Specific GPU device (see llama_backend_devices())
model <- llama_load_model("model.gguf", n_gpu_layers = -1L, devices = "Vulkan0")
# Multi-GPU: use two devices
model <- llama_load_model("model.gguf", n_gpu_layers = -1L,
                          devices = c("Vulkan0", "Vulkan1"))
## End(Not run)
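Since path must point to a GGUF file, a cheap check before loading can catch a wrong or missing path early. The sketch below relies on the fact that GGUF files begin with the four ASCII bytes "GGUF". The helper name is_gguf_file is hypothetical and not part of the package, and the check does not guarantee that the rest of the file is valid.

```r
# Hypothetical helper: TRUE if `path` is an existing regular file whose
# first four bytes are the GGUF magic ("GGUF"), FALSE otherwise.
is_gguf_file <- function(path) {
  if (!is.character(path) || length(path) != 1L ||
      !file.exists(path) || dir.exists(path)) {
    return(FALSE)
  }
  con <- file(path, open = "rb")
  on.exit(close(con), add = TRUE)
  magic <- readBin(con, what = "raw", n = 4L)
  identical(magic, charToRaw("GGUF"))
}

# Only attempt the load when the file looks like a GGUF model.
path <- "model.gguf"
if (is_gguf_file(path)) {
  model <- llama_load_model(path)
}
```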