R/get_tokens.R

#' Converts text to tokens
#'
#' @param text a character string to encode as tokens; can also be a character vector
#' @param model the model to use for tokenization, either a model name, e.g., `gpt-4o`,
#'   or a tokenizer name, e.g., `o200k_base`.
#' See also [available tokenizers](https://github.com/zurawiki/tiktoken-rs/blob/main/tiktoken-rs/src/tokenizer.rs).
#'
#' @return an integer vector of token ids for the given text; if `text` has more
#'   than one element, a list of integer vectors (one per element)
#' @export
#'
#' @seealso [model_to_tokenizer()], [decode_tokens()]
#'
#' @examples
#' get_tokens("Hello World", "gpt-4o")
#' get_tokens("Hello World", "o200k_base")
get_tokens <- function(text, model) {
  if (length(text) > 1) {
    # vectorised input: tokenize each element, returning a list of integer vectors
    lapply(text, function(x) get_tokens_internal(x, model))
  } else {
    get_tokens_internal(text, model)
  }
}
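
A minimal usage sketch (assuming the rtiktoken package is installed): a single string returns an integer vector of token ids, while a character vector hits the lapply() branch above and returns a list of integer vectors. The token ids themselves depend on the chosen tokenizer and are not shown here.

library(rtiktoken)

# single string: integer vector of token ids
tokens <- get_tokens("Hello World", "gpt-4o")

# character vector: list of integer vectors, one per input string
token_list <- get_tokens(c("Hello", "World"), "o200k_base")
lengths(token_list)  # number of tokens per string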

get_tokens_internal <- function(text, model) {
  # call the Rust tokenizer binding and rethrow any failure with context
  tryCatch(
    rs_get_tokens(text, model),
    error = function(e) {
      stop("Could not get tokens from text: ", conditionMessage(e), call. = FALSE)
    }
  )
}
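
For completeness, a hedged round-trip sketch pairing get_tokens() with decode_tokens() from the @seealso list; the decode_tokens(tokens, model) argument order is an assumption here, not confirmed by this file.

# round-trip sketch; decode_tokens(tokens, model) signature is assumed
tokens <- get_tokens("Hello World", "gpt-4o")
decode_tokens(tokens, "gpt-4o")  # expected to recover "Hello World"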
