macmillancontentscience/wordpiece: R Implementation of Wordpiece Tokenization

Apply 'Wordpiece' (<arXiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arXiv:1810.04805>) tokenization conventions are used by default.
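As a rough illustration of the idea, the greedy longest-match-first split at the core of Wordpiece can be sketched in base R. This is not the package's own implementation: `tokenize_word()` and `toy_vocab` are hypothetical names made up for this example, and the sketch skips the text clean-up, casing, and punctuation handling that the 'BERT' conventions also involve.

```r
# Toy vocabulary (hypothetical). "##" marks a piece that continues a word,
# following the 'BERT' convention.
toy_vocab <- c("[UNK]", "un", "##aff", "##able", "token", "##ize", "##r")

# Greedy longest-match-first split of a single word (illustrative only).
tokenize_word <- function(word, vocab, unk_token = "[UNK]") {
  pieces <- character(0)
  start <- 1L
  n <- nchar(word)
  while (start <= n) {
    end <- n
    found <- NA_character_
    # Try the longest remaining substring first, then shrink from the right.
    while (end >= start) {
      candidate <- substr(word, start, end)
      if (start > 1L) candidate <- paste0("##", candidate)
      if (candidate %in% vocab) {
        found <- candidate
        break
      }
      end <- end - 1L
    }
    # If no piece matches at this position, the whole word becomes unknown.
    if (is.na(found)) return(unk_token)
    pieces <- c(pieces, found)
    start <- end + 1L
  }
  pieces
}

tokenize_word("unaffable", toy_vocab)  # "un" "##aff" "##able"
tokenize_word("tokenizer", toy_vocab)  # "token" "##ize" "##r"
tokenize_word("xyz", toy_vocab)        # "[UNK]"
```

A real vocabulary holds tens of thousands of pieces, so most words resolve to one or a few tokens and only rare strings fall back to the unknown token.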


Package details

Maintainer
License: Apache License (>= 2)
Version: 2.1.3
URL: https://github.com/macmillancontentscience/wordpiece
Installation

Install the latest version of this package by entering the following in R:
install.packages("remotes")
remotes::install_github("macmillancontentscience/wordpiece")
macmillancontentscience/wordpiece documentation built on March 20, 2022, 2:07 a.m.