llama_tokenize: Tokenize text into token IDs
In llamaR: Interface for Large Language Models via 'llama.cpp'

llama_tokenize

R Documentation

Tokenize text into token IDs

Description

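Converts a character string into the integer token IDs of the model's vocabulary, optionally adding the model's special tokens (BOS/EOS) and parsing control-token markup.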

Usage

llama_tokenize(ctx, text, add_special = TRUE, parse_special = FALSE)

Arguments

`ctx`	Context handle returned by [llama_new_context]
`text`	Character string to tokenize
`add_special`	Whether to add special tokens (BOS/EOS) as configured by the model
`parse_special`	Whether to parse control/special tokens (e.g. Mistral's `[INST]`, ChatML's `<|im_start|>`) as single tokens rather than as their literal characters. Use `TRUE` for a prompt produced by [llama_chat_apply_template]; the default `FALSE` treats such markup as plain text.

Value

An integer vector of token IDs as used by the model's vocabulary.

Examples

## Not run: 
model <- llama_load_model("model.gguf")
ctx <- llama_new_context(model)

tokens <- llama_tokenize(ctx, "Hello, world!")
print(tokens)
# [1] 1 15043 29892 3186 29991
# (example output; the IDs depend on the model's vocabulary)

# Without special tokens (no BOS/EOS added)
tokens <- llama_tokenize(ctx, "Hello", add_special = FALSE)

# Parse a templated prompt's role markers as control tokens
prompt <- llama_chat_apply_template(list(list(role = "user", content = "hi")))
tokens <- llama_tokenize(ctx, prompt, parse_special = TRUE)
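
# A hedged sketch, not taken from the package's own examples: tokenizing the
# same prompt with the default parse_special = FALSE treats the role markers
# as plain text. Assuming the model's vocabulary defines control tokens for
# the template's markers, the literal version usually has more tokens.
literal <- llama_tokenize(ctx, prompt, parse_special = FALSE)
length(literal) - length(tokens)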

## End(Not run)

llamaR documentation built on May 28, 2026, 1:06 a.m.