morphemepiece: Morpheme Tokenization

Tokenize text into morphemes. The morphemepiece algorithm uses a lookup table to determine the morpheme breakdown of words, and falls back on a modified wordpiece tokenization algorithm for words not found in the lookup table.
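The lookup-then-fallback flow described above can be sketched in a few lines of R. This is an illustrative toy, not the package's implementation: `toy_lookup`, `toy_vocab`, `wordpiece_fallback()`, and `tokenize_word()` are hypothetical names, the table entries are invented, and the fallback is plain greedy longest-match-first wordpiece rather than the modified variant the package uses.

```r
# Hypothetical lookup table: word -> known morpheme breakdown (invented entries).
toy_lookup <- list(
  unhappiness = c("un", "happy", "ness"),
  walked      = c("walk", "ed")
)

# Hypothetical subword vocabulary for the fallback. "##" marks a piece that
# continues a word, following the usual wordpiece convention.
toy_vocab <- c("re", "##walk", "##ing", "walk", "##s")

# Plain greedy longest-match-first wordpiece (an assumption for illustration;
# the package describes its own fallback as a *modified* wordpiece algorithm).
wordpiece_fallback <- function(word, vocab, unk_token = "[UNK]") {
  pieces <- character(0)
  start <- 1L
  n <- nchar(word)
  while (start <= n) {
    end <- n
    found <- NULL
    # Try the longest remaining substring first, then shrink from the right.
    while (end >= start) {
      candidate <- substr(word, start, end)
      if (start > 1L) candidate <- paste0("##", candidate)
      if (candidate %in% vocab) {
        found <- candidate
        break
      }
      end <- end - 1L
    }
    # If no piece fits at this position, treat the whole word as unknown.
    if (is.null(found)) return(unk_token)
    pieces <- c(pieces, found)
    start <- end + 1L
  }
  pieces
}

# Consult the lookup table first; fall back to wordpiece only on a miss.
tokenize_word <- function(word, lookup = toy_lookup, vocab = toy_vocab) {
  hit <- lookup[[word]]
  if (!is.null(hit)) return(hit)
  wordpiece_fallback(word, vocab)
}
```

With these toy tables, `tokenize_word("walked")` is answered directly from the lookup table, while `tokenize_word("rewalking")` misses the table and is split by the fallback into `"re"`, `"##walk"`, `"##ing"`.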

Package details

Author: Jonathan Bratt [aut, cre], Jon Harmon [aut], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]
Maintainer: Jonathan Bratt
License: Apache License (>= 2)
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
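
Since the package is listed on CRAN, the standard installer call should apply. This is the usual CRAN idiom, assumed here rather than quoted from the page:

```r
# Install the released version from CRAN (requires network access).
install.packages("morphemepiece")
```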


morphemepiece documentation built on April 16, 2022, 5:05 p.m.