| llama_gen_begin | R Documentation |
Sets up sampling and prefills the prompt, returning an opaque state handle from which generated text is pulled one chunk at a time with [llama_gen_next]. This is the streaming counterpart to [llama_generate]: it uses the same sampler chain and produces the same output for a given seed, but the text arrives incrementally, so it can be pushed into an SSE stream as it is produced.
llama_gen_begin(
  ctx,
  prompt,
  max_new_tokens = 256L,
  temp = 0.8,
  top_k = 50L,
  top_p = 0.9,
  seed = 42L,
  min_p = 0,
  typical_p = 1,
  repeat_penalty = 1,
  repeat_last_n = 64L,
  frequency_penalty = 0,
  presence_penalty = 0,
  mirostat = 0L,
  mirostat_tau = 5,
  mirostat_eta = 0.1,
  grammar = NULL
)
| ctx | Context handle returned by [llama_new_context] |
| prompt | Character string prompt |
| max_new_tokens | Maximum number of tokens to generate |
| temp | Sampling temperature. 0 = greedy decoding. |
| top_k | Top-K filtering (0 = disabled) |
| top_p | Top-P (nucleus) filtering (1.0 = disabled) |
| seed | Random seed for sampling |
| min_p | Min-P filtering threshold (0.0 = disabled) |
| typical_p | Locally typical sampling threshold (1.0 = disabled) |
| repeat_penalty | Repetition penalty (1.0 = disabled) |
| repeat_last_n | Number of last tokens to penalize (0 = disabled, -1 = context size) |
| frequency_penalty | Frequency penalty (0.0 = disabled) |
| presence_penalty | Presence penalty (0.0 = disabled) |
| mirostat | Mirostat sampling mode (0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) |
| mirostat_tau | Mirostat target entropy (tau parameter) |
| mirostat_eta | Mirostat learning rate (eta parameter) |
| grammar | GBNF grammar string for constrained generation (NULL = disabled) |
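As an illustration of how the sampling arguments combine, the following sketch wraps llama_gen_begin in a small helper that requests greedy decoding (temp = 0) constrained by a GBNF grammar. The helper name begin_yes_no and the grammar are hypothetical, chosen only for this example; ctx is assumed to be a handle already created with llama_new_context.

```r
# Hypothetical GBNF grammar: the model may only emit "yes" or "no".
yes_no_grammar <- 'root ::= "yes" | "no"'

# Hypothetical helper (not part of the package): start a streaming
# generation with greedy decoding and the grammar above.
begin_yes_no <- function(ctx, question) {
  llama_gen_begin(
    ctx,
    prompt = question,
    max_new_tokens = 4L,      # a short answer needs only a few tokens
    temp = 0,                 # 0 = greedy decoding, per the argument table
    grammar = yes_no_grammar  # constrain output to the grammar
  )
}
```

The returned handle is then consumed with llama_gen_next and llama_gen_end in the usual way.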
Typical loop:
st <- llama_gen_begin(ctx, prompt)
repeat {
  chunk <- llama_gen_next(st)
  if (is.null(chunk)) break
  cat(chunk)
}
cat(llama_gen_end(st))  # flush any held-back trailing bytes
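Since the description mentions pushing chunks into an SSE stream, here is a minimal sketch of how a single chunk might be framed as a Server-Sent Events message. The function sse_event is a hypothetical helper, not part of the package; it relies only on base R and assumes chunks use "\n" as the line separator.

```r
# Hypothetical helper: frame one text chunk as an SSE "data" event.
# Each line of the chunk becomes its own "data: " field, and a blank
# line terminates the event, so embedded newlines survive the trip.
sse_event <- function(chunk) {
  # Appending "\n" before splitting keeps a trailing empty line,
  # which strsplit() would otherwise drop.
  lines <- strsplit(paste0(chunk, "\n"), "\n", fixed = TRUE)[[1]]
  paste0(paste0("data: ", lines, collapse = "\n"), "\n\n")
}
```

Inside the loop above, cat(chunk) would be replaced by writing sse_event(chunk) to whatever connection the web framework in use provides.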
Only one streaming generation may be active per context at a time, because each
call to llama_gen_begin clears that context's KV cache.
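One way to respect the one-generation-per-context rule is to drain each generation completely before starting the next. The sketch below does this with a hypothetical helper, stream_to_string, which is not part of the package; it assumes llama_gen_end returns the trailing text as a character string, as the loop above suggests.

```r
# Hypothetical helper: run one streaming generation to completion and
# return the full text, so the context is free for the next call.
stream_to_string <- function(ctx, prompt, ...) {
  st <- llama_gen_begin(ctx, prompt, ...)
  pieces <- character(0)
  repeat {
    chunk <- llama_gen_next(st)
    if (is.null(chunk)) break      # NULL signals the end of the stream
    pieces <- c(pieces, chunk)
  }
  # Append any held-back trailing bytes and join everything.
  paste0(c(pieces, llama_gen_end(st)), collapse = "")
}
```

Calling stream_to_string twice in a row on the same ctx is then safe, because the first generation has finished before the second call clears the KV cache.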
Returns an external pointer holding the generation state. Pass it to [llama_gen_next] and [llama_gen_end]. The underlying sampler is freed automatically by the garbage collector.
See also: [llama_gen_next], [llama_gen_end], [llama_generate]