rtiktoken: A Byte-Pair-Encoding (BPE) Tokenizer for OpenAI's Large Language Models

A thin wrapper around the tiktoken-rs crate that encodes text into Byte-Pair-Encoding (BPE) tokens and decodes tokens back to text. This is useful for understanding how Large Language Models (LLMs) perceive text.

Package details

Author: David Zimmermann-Kollenda [aut, cre], Roger Zurawicki [aut] (tiktoken-rs Rust library), Authors of the dependent Rust crates [aut] (see AUTHORS file)
Maintainer: David Zimmermann-Kollenda <david_j_zimmermann@hotmail.com>
License: MIT + file LICENSE
Version: 0.0.7
URL: https://davzim.github.io/rtiktoken/ https://github.com/DavZim/rtiktoken/
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("rtiktoken")
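
Once the package is installed, an encode/decode round trip might look like the sketch below. The function names get_tokens(), get_token_count(), and decode_tokens(), as well as the "gpt-4o" model identifier, are not stated on this page; they are assumptions about the package's interface and should be checked against the package reference before use.

```r
# Sketch only: the function names and the "gpt-4o" model identifier are
# assumptions not confirmed by this page; verify them in the package reference.
library(rtiktoken)

text <- "Hello World, this is a text to tokenize."

# Encode the text into integer BPE token ids for the chosen model
tokens <- get_tokens(text, "gpt-4o")

# Count the tokens without keeping the ids
n_tokens <- get_token_count(text, "gpt-4o")

# Decode the token ids back into text
decoded <- decode_tokens(tokens, "gpt-4o")

print(tokens)
print(n_tokens)
print(decoded)
```

Comparing the token ids for different spellings or languages is one way to see how a model splits text into pieces before processing it.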

rtiktoken documentation built on April 15, 2025, 1:35 a.m.