tok: Fast Text Tokenization

Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most widely used tokenizers, such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. It is extremely fast at both training new vocabularies and tokenizing text.

Package details

Author: Daniel Falbel [aut, cre], Posit [cph]
Maintainer: Daniel Falbel <daniel@posit.co>
License: MIT + file LICENSE
Version: 0.1.4
URL: https://github.com/mlverse/tok
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("tok")
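
Once the package is installed, a round trip through a pre-trained tokenizer might look like the sketch below. It assumes network access to the Hugging Face hub, and the "gpt2" model name and the sample sentence are illustrative choices rather than anything this page prescribes.

```r
library(tok)

# Load a pre-trained tokenizer from the Hugging Face hub.
# Assumption: "gpt2" is used only as a well-known example model.
tk <- tok::tokenizer$from_pretrained("gpt2")

# Encode a sentence; the resulting encoding exposes the token ids.
enc <- tk$encode("hello world")
ids <- enc$ids

# Decode the ids back into text.
text <- tk$decode(ids)
print(ids)
print(text)
```

A tokenizer previously serialized to a 'tokenizer.json' file can be loaded in the same way with `tok::tokenizer$from_file()`.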

tok documentation built on Sept. 11, 2024, 5:21 p.m.