tokenizers.bpe: Byte Pair Encoding Text Tokenization

An unsupervised text tokenizer focused on computational efficiency. It wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe>, a fast implementation of Byte Pair Encoding (BPE) <https://www.aclweb.org/anthology/P16-1162>.

Package details

Author: Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), VK.com [cph], Gregory Popovitch [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), The Abseil Authors [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), Ivan Belonogov [ctb, cph] (Files at src/youtokentome (MIT License))
Maintainer: Jan Wijffels <[email protected]>
License: MPL-2.0
Version: 0.1.0
URL: https://github.com/bnosac/tokenizers.bpe
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("tokenizers.bpe")

tokenizers.bpe documentation built on Aug. 2, 2019, 5:05 p.m.