R/tokenizers-package.r

#' Tokenizers
#'
#' A collection of functions with a consistent interface to convert natural
#' language text into tokens.
#'
#' The tokenizers in this package have a consistent interface. They all take
#' either a character vector of any length, or a list where each element is a
#' character vector of length one. The idea is that each element comprises one
#' text. Each function then returns a list of the same length as the input
#' vector, where each element of the list contains the tokens generated by the
#' function. If the input character vector or list is named, then the names
#' are preserved.
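#'
#' As a minimal illustration of this interface, the sketch below applies
#' \code{tokenize_words()} (one of the tokenizers in this package) to an
#' invented named character vector; the result is a named list with one
#' character vector of word tokens per input text.
#'
#' @examples
#' # Two texts in a named character vector
#' texts <- c(first = "How many roads must a man walk down?",
#'            second = "The answer, my friend, is blowing in the wind.")
#' # Returns a named list of length two, one character vector of tokens per
#' # input text; the names "first" and "second" are preserved
#' tokenize_words(texts)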
#'
#' @name tokenizers
#' @docType package
NULL

#' @useDynLib tokenizers, .registration = TRUE
#' @importFrom Rcpp sourceCpp
NULL
