R/tokenizers-package.r

#' Tokenizers
#'
#' A collection of functions with a consistent interface to convert natural
#' language text into tokens.
#'
#' The tokenizers in this package have a consistent interface. They all take
#' either a character vector of any length, or a list where each element is a
#' character vector of length one. The idea is that each element represents a
#' single text. Each function then returns a list of the same length as the
#' input, where each element of the list contains the tokens generated from
#' the corresponding text. If the input character vector or list is named,
#' the names are preserved.
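#'
#' As a minimal sketch of this interface (using \code{tokenize_words()},
#' assumed here as a representative tokenizer from this package), a named
#' character vector of two texts yields a named list of two token vectors:
#'
#' @examples
#' # Two texts in a named character vector; the names carry over to the output.
#' docs <- c(first = "The quick brown fox.",
#'           second = "It jumped over the lazy dog.")
#' tokenize_words(docs)
#'
#' # The same texts as a list of length-one character vectors work the same way.
#' tokenize_words(as.list(docs))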
#'
#' @name tokenizers
#' @docType package
NULL

#' @useDynLib tokenizers, .registration = TRUE
#' @importFrom Rcpp sourceCpp
NULL