llama_tokenize: Tokenize text into token IDs

View source: R/llama.R

Description

Converts a character string into integer token IDs using the tokenizer of the model associated with a llama context.

Usage

llama_tokenize(ctx, text, add_special = TRUE, parse_special = FALSE)

Arguments

ctx

Context handle returned by llama_new_context().

text

Character string to tokenize

add_special

Logical. Whether to add special tokens (e.g. BOS/EOS) as configured by the model. Defaults to TRUE.

parse_special

Logical. Whether to parse control/special tokens (e.g. Mistral's [INST], ChatML's <|im_start|>) as single tokens rather than as their literal characters. Use TRUE for a prompt produced by llama_chat_apply_template(); the default, FALSE, treats such markup as plain text.

Value

An integer vector of token IDs from the model's vocabulary.

Examples

## Not run: 
model <- llama_load_model("model.gguf")
ctx <- llama_new_context(model)

tokens <- llama_tokenize(ctx, "Hello, world!")
print(tokens)
# [1]     1 15043 29892  3186 29991
# (illustrative; IDs depend on the model's vocabulary, here 1 is the BOS token)

# Without special tokens
tokens <- llama_tokenize(ctx, "Hello", add_special = FALSE)

# Parse a templated prompt's role markers as control tokens
prompt <- llama_chat_apply_template(list(list(role = "user", content = "hi")))
tokens <- llama_tokenize(ctx, prompt, parse_special = TRUE)

## End(Not run)
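The effect of the two logical flags can be measured directly on a loaded model by comparing token counts. The sketch below assumes llamaR is installed and uses "model.gguf" only as a placeholder path; if either is missing, the block is skipped and the counts stay NA. The marker <|im_start|> is a single control token only for models whose vocabulary defines it (e.g. ChatML-style models). On other models, both parse_special settings will typically give the same count.

```r
# Sketch: compare token counts under add_special and parse_special.
# Assumption: llamaR is installed and "model.gguf" is a placeholder path
# to a real GGUF model; otherwise nothing runs and the counts stay NA.
n_special_added  <- NA_integer_
n_marker_literal <- NA_integer_
n_marker_parsed  <- NA_integer_

if (requireNamespace("llamaR", quietly = TRUE) && file.exists("model.gguf")) {
  library(llamaR)
  model <- llama_load_model("model.gguf")
  ctx <- llama_new_context(model)

  # add_special: the length difference is typically the number of special
  # tokens (such as BOS/EOS) that the model is configured to add.
  text <- "Hello, world!"
  n_special_added <- length(llama_tokenize(ctx, text, add_special = TRUE)) -
    length(llama_tokenize(ctx, text, add_special = FALSE))

  # parse_special: a control-token marker becomes one ID when parsed, but
  # several IDs when its literal characters are tokenized (if the vocabulary
  # defines it as a control token at all).
  marker <- "<|im_start|>"
  n_marker_literal <- length(llama_tokenize(ctx, marker, add_special = FALSE,
                                            parse_special = FALSE))
  n_marker_parsed  <- length(llama_tokenize(ctx, marker, add_special = FALSE,
                                            parse_special = TRUE))

  print(c(special_added  = n_special_added,
          marker_literal = n_marker_literal,
          marker_parsed  = n_marker_parsed))
}
```

Comparing lengths rather than raw IDs keeps the check independent of any particular vocabulary.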

llamaR documentation built on May 28, 2026, 1:06 a.m.