wordpiece: R Implementation of Wordpiece Tokenization

Apply 'Wordpiece' (<arXiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arXiv:1810.04805>) tokenization conventions are used by default.
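
As a rough illustration of what tokenizing "given an appropriate vocabulary" involves, the sketch below implements greedy longest-match-first splitting of a single word in base R, using the 'BERT' convention of marking non-initial pieces with a "##" prefix. It is not the package's own implementation or interface: the function name `tokenize_word`, the `toy_vocab` vector, and the "[UNK]" fallback token are assumptions made for this example only.

```r
# Illustrative sketch only: greedy longest-match-first WordPiece splitting
# for one word. `tokenize_word` and `toy_vocab` are hypothetical names,
# not part of the wordpiece package.
tokenize_word <- function(word, vocab, unk_token = "[UNK]") {
  chars <- strsplit(word, "")[[1]]
  n <- length(chars)
  pieces <- character(0)
  start <- 1
  while (start <= n) {
    end <- n
    piece <- NA_character_
    # Try the longest remaining substring first, then shrink from the right.
    while (end >= start) {
      candidate <- paste(chars[start:end], collapse = "")
      # BERT convention: pieces that do not start the word carry "##".
      if (start > 1) candidate <- paste0("##", candidate)
      if (candidate %in% vocab) {
        piece <- candidate
        break
      }
      end <- end - 1
    }
    # If nothing in the vocabulary matches, the whole word becomes unknown.
    if (is.na(piece)) return(unk_token)
    pieces <- c(pieces, piece)
    start <- end + 1
  }
  pieces
}

# A toy vocabulary, assumed purely for illustration.
toy_vocab <- c("[UNK]", "token", "##ize", "##s", "un", "##able")

tokenize_word("tokenizes", toy_vocab)  # "token" "##ize" "##s"
tokenize_word("unable", toy_vocab)     # "un" "##able"
tokenize_word("xyz", toy_vocab)        # "[UNK]"
```

A real vocabulary contains tens of thousands of entries, and full tokenization also involves splitting the input text into words before this step; the package documentation describes its actual functions and defaults.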

Package details

Author: Jonathan Bratt [aut, cre] (<https://orcid.org/0000-0003-2859-0076>), Jon Harmon [aut] (<https://orcid.org/0000-0003-4781-4346>), Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]
Maintainer: Jonathan Bratt <jonathan.bratt@macmillan.com>
License: Apache License (>= 2)
Version: 2.1.3
URL: https://github.com/macmillancontentscience/wordpiece
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("wordpiece")

wordpiece documentation built on March 18, 2022, 5:55 p.m.