llama_gen_begin: Begin a streaming (token-by-token) generation
In llamaR: Interface for Large Language Models via 'llama.cpp'

llama_gen_begin

R Documentation

Begin a streaming (token-by-token) generation

Description

Sets up sampling and prefills the prompt, then returns an opaque state handle from which text is pulled one chunk at a time with [llama_gen_next]. This is the streaming counterpart to [llama_generate]: it uses the same sampler chain and produces the same output for a given seed, but the text arrives incrementally, so it can be pushed into an SSE (Server-Sent Events) stream as it is produced.
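The helper below is a sketch of the SSE hand-off mentioned above. It formats one text chunk as a Server-Sent Events message. `sse_event()` is a hypothetical name and is not part of llamaR. It follows the WHATWG SSE wire format, in which each line of the payload becomes its own `data:` field and a blank line ends the event. How the string reaches the client (for example, through an HTTP server package) is left open.

```r
# Hypothetical helper (not part of llamaR): wrap one streamed text chunk
# as a Server-Sent Events message.
sse_event <- function(chunk, event = NULL) {
  # Each embedded newline starts a new "data:" field, so multi-line chunks
  # survive the round trip; the client rejoins them with "\n".
  body <- paste0("data: ", gsub("\n", "\ndata: ", chunk, fixed = TRUE), "\n")
  # An optional "event:" field names the event type on the client side.
  if (!is.null(event)) body <- paste0("event: ", event, "\n", body)
  # A blank line terminates the event.
  paste0(body, "\n")
}

cat(sse_event("Hello"))
```

In the streaming loop under Details, `cat(sse_event(chunk))` would take the place of `cat(chunk)`. This sketch does not handle carriage returns inside a chunk.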

Usage

llama_gen_begin(
  ctx,
  prompt,
  max_new_tokens = 256L,
  temp = 0.8,
  top_k = 50L,
  top_p = 0.9,
  seed = 42L,
  min_p = 0,
  typical_p = 1,
  repeat_penalty = 1,
  repeat_last_n = 64L,
  frequency_penalty = 0,
  presence_penalty = 0,
  mirostat = 0L,
  mirostat_tau = 5,
  mirostat_eta = 0.1,
  grammar = NULL
)

Arguments

`ctx`	Context handle returned by [llama_new_context]
`prompt`	Prompt, as a character string
`max_new_tokens`	Maximum number of tokens to generate
`temp`	Sampling temperature (0 = greedy decoding)
`top_k`	Top-K filtering (0 = disabled)
`top_p`	Top-P (nucleus) filtering (1.0 = disabled)
`seed`	Random seed for sampling
`min_p`	Min-P filtering threshold (0.0 = disabled)
`typical_p`	Locally typical sampling threshold (1.0 = disabled)
`repeat_penalty`	Repetition penalty (1.0 = disabled)
`repeat_last_n`	Number of last tokens to penalize (0 = disabled, -1 = context size)
`frequency_penalty`	Frequency penalty (0.0 = disabled)
`presence_penalty`	Presence penalty (0.0 = disabled)
`mirostat`	Mirostat sampling mode (0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
`mirostat_tau`	Mirostat target entropy (tau parameter)
`mirostat_eta`	Mirostat learning rate (eta parameter)
`grammar`	GBNF grammar string for constrained generation (NULL = disabled)
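The plain-R sketch below shows what the penalty and filtering arguments do to the model's next-token scores, using a toy logit vector. It is only a conceptual model and is not llamaR's or llama.cpp's code. The function names are hypothetical. The order of steps (penalties, then top-k, top-p and min-p, then temperature) is an assumption based on llama.cpp's default sampler chain. `typical_p`, Mirostat and grammar constraints are omitted. The repetition-penalty rule is modeled on llama.cpp's penalties sampler: positive logits are divided by the penalty, negative logits are multiplied by it, and then the frequency and presence terms are subtracted.

```r
# Conceptual sketch (hypothetical helpers, not llamaR's implementation).
softmax <- function(x) { e <- exp(x - max(x)); e / sum(e) }

# Penalize tokens seen in `recent` (1-based token ids standing in for the
# last repeat_last_n tokens). Positive logits are divided by repeat_penalty,
# negative ones multiplied, then frequency/presence terms are subtracted.
apply_penalties <- function(logits, recent, repeat_penalty = 1,
                            frequency_penalty = 0, presence_penalty = 0) {
  counts <- tabulate(recent, nbins = length(logits))
  hit <- counts > 0
  logits[hit] <- ifelse(logits[hit] > 0, logits[hit] / repeat_penalty,
                        logits[hit] * repeat_penalty)
  logits - counts * frequency_penalty - hit * presence_penalty
}

# Turn logits into the distribution the next token would be drawn from.
next_token_probs <- function(logits, temp = 0.8, top_k = 50L, top_p = 0.9,
                             min_p = 0) {
  n <- length(logits)
  if (temp <= 0) return(as.numeric(seq_len(n) == which.max(logits)))  # greedy
  ord <- order(logits, decreasing = TRUE)
  # Top-K: drop everything outside the k highest logits (0 = disabled).
  if (top_k > 0 && top_k < n) logits[ord[-seq_len(top_k)]] <- -Inf
  # Top-P: keep the shortest high-to-low prefix whose mass reaches top_p.
  if (top_p < 1) {
    cum <- cumsum(softmax(logits)[ord])
    n_keep <- which(cum >= top_p)[1]
    if (is.na(n_keep)) n_keep <- n
    logits[ord[-seq_len(n_keep)]] <- -Inf
  }
  # Min-P: drop tokens whose probability is below min_p times the top one.
  if (min_p > 0) {
    p <- softmax(logits)
    logits[p < min_p * max(p)] <- -Inf
  }
  softmax(logits / temp)  # temperature rescales the surviving logits
}

probs <- next_token_probs(log(c(0.5, 0.3, 0.15, 0.05)), temp = 1,
                          top_k = 0L, top_p = 0.7)
round(probs, 3)
```

Top-P is applied here before temperature, so `temp` changes how sharply the surviving tokens are weighted but does not change which tokens survive. This is a property of the assumed order, not a documented guarantee of llamaR.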

Details

Typical loop:

st <- llama_gen_begin(ctx, prompt)  # ctx from llama_new_context()
repeat {
  chunk <- llama_gen_next(st)       # next piece of text, or NULL when finished
  if (is.null(chunk)) break
  cat(chunk)
}
cat(llama_gen_end(st))  # flush any held-back trailing bytes
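The comment on `llama_gen_end()` mentions held-back bytes. One plausible reason, assumed here because this page does not state it, is that a token can end in the middle of a multi-byte UTF-8 character. In that case a streamer would emit only complete characters and keep the incomplete tail for the next chunk. The hypothetical `utf8_split()` below sketches that check on a raw byte vector. It is not llamaR's code.

```r
# Hypothetical helper (not llamaR's code): split a byte buffer into the part
# that ends on a complete UTF-8 character and an incomplete trailing part.
utf8_split <- function(bytes) {
  n <- length(bytes)
  i <- n
  # Step back over up to three continuation bytes (bit pattern 10xxxxxx).
  while (i > 0 && i > n - 3 && bitwAnd(as.integer(bytes[i]), 0xC0L) == 0x80L) {
    i <- i - 1
  }
  if (i == 0) return(list(emit = bytes, held = raw(0)))
  lead <- as.integer(bytes[i])
  # Sequence length announced by the lead byte (1 for ASCII or a stray
  # continuation byte).
  need <- if (lead >= 0xF0) 4L else if (lead >= 0xE0) 3L else if (lead >= 0xC0) 2L else 1L
  if (n - i + 1 < need) {
    list(emit = bytes[seq_len(i - 1)], held = bytes[i:n])
  } else {
    list(emit = bytes, held = raw(0))
  }
}

# "a" followed by only the first byte of the two-byte sequence for "é" (C3 A9).
parts <- utf8_split(as.raw(c(0x61, 0xC3)))
rawToChar(parts$emit)   # "a"
parts$held              # the incomplete byte 0xC3, kept for the next chunk
```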

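Only one streaming generation may be active per context at a time. Each call to llama_gen_begin clears the context's KV cache, so starting a second generation on the same context discards the state that the first one depends on.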

Value

An external pointer holding the generation state. Pass it to [llama_gen_next] and [llama_gen_end]. The underlying sampler is freed automatically by the garbage collector.
