tokenizers.bpe: Byte Pair Encoding Text Tokenization

Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe>, a fast implementation of Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
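
For readers new to BPE, the core idea can be sketched in a few lines of base R: start from single characters and repeatedly merge the most frequent adjacent pair of symbols. This is an illustrative toy under stated assumptions, not the YouTokenToMe implementation and not this package's API. The helper names (count_pairs, merge_pair, learn_bpe) and the toy word counts are hypothetical, and the sketch assumes that symbols never contain a space.

```r
# Toy BPE merge learning in base R (illustrative only; helper names are hypothetical).

# Count adjacent symbol pairs over all words, weighted by word frequency.
count_pairs <- function(words, freqs) {
  counts <- list()
  for (i in seq_along(words)) {
    symbols <- words[[i]]
    if (length(symbols) < 2) next
    for (j in seq_len(length(symbols) - 1)) {
      # Assumption: symbols contain no space, so a space can separate the pair.
      key <- paste(symbols[j], symbols[j + 1], sep = " ")
      prev <- counts[[key]]
      counts[[key]] <- (if (is.null(prev)) 0 else prev) + freqs[i]
    }
  }
  unlist(counts)
}

# Replace every occurrence of the pair (left, right) in one word by the merged symbol.
merge_pair <- function(symbols, left, right) {
  out <- character(0)
  j <- 1
  while (j <= length(symbols)) {
    if (j < length(symbols) && symbols[j] == left && symbols[j + 1] == right) {
      out <- c(out, paste0(left, right))
      j <- j + 2
    } else {
      out <- c(out, symbols[j])
      j <- j + 1
    }
  }
  out
}

# Learn up to n_merges merge rules from a named vector of word frequencies.
learn_bpe <- function(word_freqs, n_merges) {
  words <- strsplit(names(word_freqs), "")  # start from single characters
  freqs <- unname(word_freqs)
  merges <- character(0)
  for (step in seq_len(n_merges)) {
    counts <- count_pairs(words, freqs)
    if (length(counts) == 0) break
    best <- names(counts)[which.max(counts)]  # the first maximum wins on ties
    pair <- strsplit(best, " ", fixed = TRUE)[[1]]
    words <- lapply(words, merge_pair, left = pair[1], right = pair[2])
    merges <- c(merges, best)
  }
  list(merges = merges, words = words)
}

# Hypothetical toy corpus: word -> frequency.
toy <- c(low = 5, lower = 2, newest = 6, widest = 3)
res <- learn_bpe(toy, n_merges = 3)
print(res$merges)
print(res$words)
```

Each learned merge becomes a vocabulary entry. A production implementation additionally handles word boundaries, unknown characters, and an efficient priority structure for pair counts, all of which this sketch omits.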

Package details

Author: Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), VK.com [cph], Gregory Popovitch [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), The Abseil Authors [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), Ivan Belonogov [ctb, cph] (Files at src/youtokentome (MIT License))
Maintainer: Jan Wijffels <jwijffels@bnosac.be>
License: MPL-2.0
Version: 0.1.3
URL: https://github.com/bnosac/tokenizers.bpe
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("tokenizers.bpe")

tokenizers.bpe documentation built on Sept. 16, 2023, 1:06 a.m.