dot-mp_tokenize_word_lookup: Tokenize a Word Including Lookup

.mp_tokenize_word_lookup    R Documentation

Tokenize a Word Including Lookup

Description

Look up a word in the lookup table; if the word is not found there, tokenize it with the fall-back algorithm instead.

Usage

.mp_tokenize_word_lookup(word, vocab, lookup, unk_token, max_chars)

Arguments

word

Character scalar; word to tokenize.

vocab

A morphemepiece vocabulary.

lookup

A morphemepiece lookup table.

unk_token

Token to represent unknown words.

max_chars

Maximum length, in characters, of a word that will be recognized.

Value

The input word, broken into tokens.
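
Examples

The lookup-then-fall-back flow described above can be illustrated with a small self-contained sketch. This is not the package's implementation: the function name sketch_tokenize_word_lookup, the representation of the lookup table as a named character vector of space-separated tokens, the default argument values, the handling of over-long words, and the simplified fall-back (keep the whole word if it is in the vocabulary, otherwise return the unknown token) are all assumptions made for illustration.

```r
# Hypothetical sketch; NOT the morphemepiece implementation.
# Assumptions: `vocab` is a plain character vector of tokens, and `lookup`
# is a named character vector mapping a word to its space-separated tokens.
sketch_tokenize_word_lookup <- function(word, vocab, lookup,
                                        unk_token = "[UNK]",
                                        max_chars = 100L) {
  # Assumed behaviour: words longer than max_chars are not recognized.
  if (nchar(word) > max_chars) {
    return(unk_token)
  }

  # Step 1: try the lookup table first.
  if (word %in% names(lookup)) {
    return(strsplit(lookup[[word]], " ", fixed = TRUE)[[1]])
  }

  # Step 2: stand-in fall-back. Keep the whole word if the vocabulary
  # contains it; otherwise give up and return the unknown token.
  if (word %in% vocab) {
    return(word)
  }
  unk_token
}

# Toy data, invented for this example.
vocab <- c("un", "believ", "able", "cat")
lookup <- c(unbelievable = "un believ able")

sketch_tokenize_word_lookup("unbelievable", vocab, lookup)  # found in lookup
sketch_tokenize_word_lookup("cat", vocab, lookup)           # fall-back, in vocab
sketch_tokenize_word_lookup("zzz", vocab, lookup)           # fall-back, unknown
```

Checking the lookup table first keeps the common case cheap, and the fall-back only runs for words the table does not cover. A real fall-back would typically split the word into smaller pieces rather than accept or reject it whole.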


macmillancontentscience/morphemepiece documentation built on April 19, 2022, 2:20 p.m.