R/get_token_count.R

#' Count the number of tokens in a text
#'
#' @param text a character vector of texts to tokenize; each element is counted
#' separately
#' @param model the model to use for tokenization, given either as a model name,
#' e.g., `gpt-4o`, or as a tokenizer name, e.g., `o200k_base`.
#' See also [available tokenizers](https://github.com/zurawiki/tiktoken-rs/blob/main/tiktoken-rs/src/tokenizer.rs).
#'
#' @return an integer vector with the number of tokens in each element of `text`
#'
#' @seealso [model_to_tokenizer()], [get_tokens()]
#' @export
#'
#' @examples
#' get_token_count("Hello World", "gpt-4o")
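get_token_count <- function(text, model) {
  if (length(text) > 1) {
    # Vector input: count each element separately and drop the names
    # that sapply() would otherwise attach to the result.
    sapply(text, function(x) rs_get_token_count(x, model),
           USE.NAMES = FALSE)
  } else {
    # Single string: call the tokenizer backend directly.
    rs_get_token_count(text, model)
  }
}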
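
# Illustrative sketch, not part of rtiktoken: one possible type-stable variant
# of the vectorisation above. It uses vapply() so that the result is always an
# integer vector, including for zero-length input. The names
# `count_stub()` and `get_token_count_sketch()` are hypothetical. The stub
# counts whitespace-separated words, NOT real BPE tokens, and exists only so
# the sketch runs without the compiled rs_get_token_count() backend.
count_stub <- function(x, model) {
  # `model` is accepted for signature compatibility but ignored by the stub.
  length(strsplit(trimws(x), "\\s+")[[1]])
}

get_token_count_sketch <- function(text, model, counter = count_stub) {
  # integer(1) makes vapply() fail loudly if a count is not a single integer.
  vapply(text, function(x) as.integer(counter(x, model)), integer(1),
         USE.NAMES = FALSE)
}

# Usage: one count per element, and an empty integer vector for empty input.
get_token_count_sketch(c("Hello World", "one two three"), "gpt-4o")
get_token_count_sketch(character(0), "gpt-4o")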

rtiktoken documentation built on April 15, 2025, 1:35 a.m.