morphemepiece_tokenize: Tokenize Sequence with Morpheme Pieces

View source: R/tokenize.R

morphemepiece_tokenize    R Documentation

Tokenize Sequence with Morpheme Pieces

Description

Given a single sequence of text and a morphemepiece vocabulary and lookup table, tokenizes the text into morpheme pieces.

Usage

morphemepiece_tokenize(
  text,
  vocab = morphemepiece_vocab(),
  lookup = morphemepiece_lookup(),
  unk_token = "[UNK]",
  max_chars = 100
)

Arguments

text

Character scalar; text to tokenize.

vocab

A morphemepiece vocabulary.

lookup

A morphemepiece lookup table.

unk_token

Token to represent unknown words.

max_chars

Integer; maximum length (in characters) of a single word to tokenize.

Value

A character vector of tokenized text. (Later, this should be a named integer vector, as in the wordpiece package.)
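
Examples

A minimal usage sketch (not taken from the package itself), assuming the morphemepiece package and its default vocabulary and lookup data are available; the exact pieces returned depend on the vocabulary that is loaded.

library(morphemepiece)

# Load the default vocabulary and lookup table.
vocab <- morphemepiece_vocab()
lookup <- morphemepiece_lookup()

# Tokenize a single sequence of text into morpheme pieces.
morphemepiece_tokenize(
  "Surprisingly, the tokenizer handled it!",
  vocab = vocab,
  lookup = lookup
)
# Returns a character vector of morpheme pieces; the exact pieces
# (e.g. a word split into a root plus continuation pieces)
# depend on the vocabulary and lookup in use.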
