macmillancontentscience/morphemepiece: Morpheme Tokenization

Tokenize text into morphemes. The morphemepiece algorithm uses a lookup table to determine the morpheme breakdown of words, and falls back on a modified wordpiece tokenization algorithm for words not found in the lookup table.
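The lookup-then-fallback flow described above can be illustrated with a small base R sketch. This is not the package's implementation or API: `toy_lookup`, `toy_vocab`, `wordpiece_fallback()`, and `toy_tokenize()` are hypothetical names and data made up for illustration, the `##` continuation marker is borrowed from the usual wordpiece convention, and the fallback shown is plain greedy longest-match wordpiece rather than the modified variant the package uses.

```r
# Hypothetical lookup table: word -> morpheme breakdown (illustrative data only).
toy_lookup <- list(
  unhappiness = c("un", "##happy", "##ness"),
  walked = c("walk", "##ed")
)

# Hypothetical vocabulary used by the fallback step.
toy_vocab <- c("re", "walk", "##walk", "##ing", "##s")

# Plain greedy longest-match-first wordpiece split (an assumption; the
# package describes its own fallback as a *modified* wordpiece algorithm).
wordpiece_fallback <- function(word, vocab, unk = "[UNK]") {
  pieces <- character(0)
  start <- 1L
  n <- nchar(word)
  while (start <= n) {
    end <- n
    found <- NULL
    # Try the longest remaining substring first, then shrink from the right.
    while (end >= start) {
      candidate <- substr(word, start, end)
      # Non-initial pieces carry the "##" continuation marker.
      if (start > 1L) candidate <- paste0("##", candidate)
      if (candidate %in% vocab) {
        found <- candidate
        break
      }
      end <- end - 1L
    }
    # If no piece matches at this position, the whole word is unknown.
    if (is.null(found)) return(unk)
    pieces <- c(pieces, found)
    start <- end + 1L
  }
  pieces
}

# Use the lookup table when the word is present; otherwise fall back.
toy_tokenize <- function(word, lookup = toy_lookup, vocab = toy_vocab) {
  hit <- lookup[[word]]
  if (!is.null(hit)) return(hit)
  wordpiece_fallback(word, vocab)
}

toy_tokenize("walked")     # found in the lookup table
toy_tokenize("rewalking")  # not in the table, split by the fallback
```

Keeping the lookup step first means curated morpheme breakdowns always win, and the vocabulary-driven fallback only handles words the table does not cover.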


Package details

Maintainer
License: Apache License (>= 2)
Version: 1.2.3
URL: https://github.com/macmillancontentscience/morphemepiece
Installation

Install the latest version of this package by entering the following in R:
install.packages("remotes")
remotes::install_github("macmillancontentscience/morphemepiece")
macmillancontentscience/morphemepiece documentation built on April 19, 2022, 2:20 p.m.