tokenizers.bpe: Byte Pair Encoding Text Tokenization

Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library, which is an implementation of fast Byte Pair Encoding (BPE).

Getting started
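As a minimal sketch of a typical workflow, the example below trains a small BPE model on the French part of the belgium_parliament dataset that ships with the package, then encodes and decodes one sentence. The vocabulary size, coverage, thread count, temporary file paths, and sample sentence are illustrative assumptions, not recommended settings.

```r
library(tokenizers.bpe)

# Example data bundled with the package; keep only the French speeches
data(belgium_parliament, package = "tokenizers.bpe")
x <- subset(belgium_parliament, language == "french")

# bpe() trains from a plain-text file, so write the training text to disk first
# (temporary file locations are an assumption made for this sketch)
train_file <- tempfile(fileext = ".txt")
model_file <- tempfile(fileext = ".bpe")
writeLines(x$text, con = train_file)

# Train the model; vocab_size and coverage are illustrative values
model <- bpe(train_file, coverage = 0.999, vocab_size = 5000,
             threads = 1, model_path = model_file)

# Encode new text as subword strings or as integer ids
text     <- "L'appartement est grand et vraiment bien situe en plein centre"
subwords <- bpe_encode(model, x = text, type = "subwords")
ids      <- bpe_encode(model, x = text, type = "ids")

# Map the ids back to text
decoded <- bpe_decode(model, ids)
```

A model saved this way can be reloaded in a later session with bpe_load_model(model_file), provided the file is kept somewhere other than a temporary location.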

Package details

Author: Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), [cph], Gregory Popovitch [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), The Abseil Authors [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), Ivan Belonogov [ctb, cph] (Files at src/youtokentome (MIT License))
Maintainer: Jan Wijffels
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
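The install command itself did not survive on this page. Since the package is hosted on CRAN, the standard command should apply:

```r
# Install the released version from CRAN
install.packages("tokenizers.bpe")
```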


tokenizers.bpe documentation built on Aug. 2, 2019, 5:05 p.m.