R/utils.R

Defines functions estimate_tokens

estimate_tokens <- function(text) {
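  # Split each string on whitespace (including newlines) and common punctuation.
  # strsplit() returns a list with one character vector per element of `text`.
  tokens <- strsplit(text, "\\s|\\.|,|;|:|!|\\?|\\n")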

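  # Optional steps, left disabled. Both assume a flat character vector, so
  # enabling them would change the result from per-element counts.
  # tokens <- unlist(tokens)
  # tokens <- tokens[nchar(tokens) > 0]

  # Return the number of pieces in each element of `text`. Empty strings
  # produced by adjacent delimiters (e.g. ", ") are counted as well.
  return(sapply(tokens, length))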
}

fuzzylink documentation built on Aug. 18, 2025, 5:29 p.m.