tok-package: tok: Fast Text Tokenization

tok-package    R Documentation

tok: Fast Text Tokenization

Description

Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most used tokenizers, such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. It is designed to be fast both when training new vocabularies and when tokenizing texts.
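
As a rough illustration of the interface described above, the following minimal sketch loads a pretrained tokenizer, encodes a sentence into token ids, and decodes the ids back to text. It assumes the tok package is installed and that the "gpt2" tokenizer files can be downloaded from the Hugging Face Hub; the model identifier and the sample sentence are choices made for this example only, not something this page prescribes.

```r
# Minimal sketch; assumes the 'tok' package is installed and that network
# access to the Hugging Face Hub is available to fetch the tokenizer files.
if (requireNamespace("tok", quietly = TRUE)) {
  # "gpt2" is an illustrative model identifier chosen for this example.
  tk <- tok::tokenizer$from_pretrained("gpt2")

  # Encode a sentence; the returned encoding exposes the token ids.
  ids <- tk$encode("Hello world! You can use tokenizers from R")$ids
  print(ids)

  # Decode the ids back into a character string.
  decoded <- tk$decode(ids)
  print(decoded)
}
```

The requireNamespace() guard only keeps the sketch from failing when the package is absent; in an interactive session the calls can be run directly.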

Author(s)

Maintainer: Daniel Falbel <daniel@posit.co>

Other contributors:

  • Posit [copyright holder]

See Also

Useful links:


tok documentation built on Sept. 11, 2024, 5:21 p.m.