| chat_llamar | R Documentation |
Returns an ellmer Chat object backed by a local GGUF model,
so the whole ellmer / ragnar toolchain (turns, tools, streaming,
structured output, ragnar_register_tool_retrieve(), …) works
against local inference. Transport is the OpenAI-compatible HTTP API
from llama_serve_openai; this function is a thin
chat_vllm wrapper over it. (We use the vLLM provider
because it speaks /v1/chat/completions — the de-facto standard our
server implements — whereas ellmer's chat_openai/
chat_openai_compatible target OpenAI's newer /v1/responses.)
chat_llamar(
  model_path = NULL,
  base_url = NULL,
  port = 11434L,
  n_ctx = 4096L,
  n_gpu_layers = -1L,
  model_id = NULL,
  system_prompt = NULL,
  timeout = 180,
  ...
)
model_path |
Path to a GGUF model file. Spawns a server (mode A).
Mutually exclusive with base_url. |
base_url |
Base URL of a running OpenAI-compatible server, e.g.
|
port |
Port for the spawned server (mode A only). Default 11434. |
n_ctx, n_gpu_layers |
Passed to |
model_id |
Model identifier reported to ellmer. Defaults to the
model file's base name in mode A; |
system_prompt |
Optional system prompt for the chat. |
timeout |
Seconds to wait for a spawned server to accept
connections before erroring (mode A only). Default 180. |
... |
Passed on to |
Two modes, picked by which argument you pass (DBI-style — like
DBI::dbConnect() accepting either connection parameters or a
ready connection):
base_url (mode B): Connect to a server you already started (e.g.
llama_serve_openai() in another process, or a worker pool).
No process is spawned.
model_path (mode A): Spin up llama_serve_openai() in a
background R process (via callr), wait for it to come up, and
return a Chat pointed at it. The server process's lifetime is
tied to the returned object: when it is garbage-collected (or R
exits), the process is killed. Stop it eagerly with
chat_llamar_stop.
Exactly one of base_url or model_path must be supplied.
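The "exactly one" rule can be pictured as a small guard. The sketch below is illustrative only: pick_mode is a hypothetical helper, not part of the package, and the validation chat_llamar actually performs may differ.

```r
# Hypothetical helper (not part of the package): decide which mode applies,
# erroring unless exactly one of the two arguments is supplied.
pick_mode <- function(model_path = NULL, base_url = NULL) {
  has_path <- !is.null(model_path)
  has_url  <- !is.null(base_url)
  # Both supplied or both missing: the call is ambiguous or empty.
  if (has_path == has_url) {
    stop("Supply exactly one of `model_path` or `base_url`.", call. = FALSE)
  }
  if (has_path) "A" else "B"
}

pick_mode(model_path = "model.gguf")               # mode A: spawn a server
pick_mode(base_url = "http://127.0.0.1:11434/v1")  # mode B: connect only
```

Passing both arguments, or neither, stops with an error rather than silently preferring one mode.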
An ellmer Chat object. In mode A it additionally
carries the background process handle (see chat_llamar_stop).
The server is single-sequence (one request at a time); see
llama_serve_openai. For parallel sessions, run a pool of
servers on different ports and create one chat_llamar(base_url=)
per worker.
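A pool setup along these lines might look as follows. This is a sketch under stated assumptions: pool_base_urls and make_pool_chats are hypothetical helpers, the servers are assumed to be already running on consecutive ports, and the /v1 path suffix is taken from the example on this page.

```r
# Hypothetical helper: base URLs for a pool of servers on consecutive ports.
# Assumes each server was started separately (e.g. with llama_serve_openai())
# and serves the API under /v1.
pool_base_urls <- function(first_port, n, host = "127.0.0.1") {
  ports <- as.integer(first_port) + seq_len(n) - 1L
  sprintf("http://%s:%d/v1", host, ports)
}

urls <- pool_base_urls(11434L, 3L)

# Hypothetical helper: one chat per worker, all in mode B (no process spawned).
# Wrapped in a function so nothing connects until it is called.
make_pool_chats <- function(urls) {
  lapply(urls, function(u) chat_llamar(base_url = u))
}
```

Because each server handles one request at a time, each worker should keep to its own chat object rather than sharing one.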
Tool calling and structured output are mediated by the OpenAI protocol,
so they work only as far as the server implements them. The current
server does not emit tool_calls yet (see TODO), so ellmer tools
registered on the returned chat will not be invoked by the model.
llama_serve_openai, chat_llamar_stop
## Not run:
# Mode A: spawn a server for this model and chat with it.
chat <- chat_llamar(model_path = "model.gguf")
chat$chat("Why is the sky blue?")
chat_llamar_stop(chat) # or let GC do it
# Mode B: connect to a server you already run.
llama_serve_openai("model.gguf", port = 11434L) # in another process
chat <- chat_llamar(base_url = "http://127.0.0.1:11434/v1")
chat$chat("Hello!")
## End(Not run)