dot-mp_tokenize_single_string: Tokenize an Input Word-by-word

.mp_tokenize_single_string    R Documentation

Tokenize an Input Word-by-word

Description

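Tokenizes a single input, already split on spaces into words, one word at a time, using a morphemepiece vocabulary and lookup table.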

Usage

.mp_tokenize_single_string(words, vocab, lookup, unk_token, max_chars)

Arguments

words

Character; a vector of words (generated by space-tokenizing a single input).

vocab

A morphemepiece vocabulary.

lookup

A morphemepiece lookup table.

unk_token

Token to represent unknown words.

max_chars

Maximum length, in characters, of a word that will be recognized.

Value

A named integer vector of tokenized words.
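
To make the shape of the return value concrete, here is a minimal, self-contained sketch in base R. It is not the package's implementation: toy_vocab, toy_lookup, and toy_tokenize_words are hypothetical stand-ins invented for illustration, the piece notation is made up, and the choices to map overly long or unrecognized words to unk_token and to use positions in the toy vocabulary as ids are assumptions.

```r
# Toy stand-ins; NOT the real morphemepiece vocabulary or lookup table.
toy_vocab <- c("[UNK]", "un", "##break", "##able", "the", "cat")
toy_lookup <- c(unbreakable = "un ##break ##able")

# Hypothetical helper: mirrors the documented argument list, but only
# illustrates the kind of named integer vector described under Value.
toy_tokenize_words <- function(words, vocab, lookup, unk_token, max_chars) {
  pieces <- unlist(lapply(words, function(word) {
    # Assumption: words longer than max_chars become the unknown token.
    if (nchar(word) > max_chars) return(unk_token)
    # A word that is itself in the vocabulary is kept whole.
    if (word %in% vocab) return(word)
    # A word with a known breakdown is split into its pieces.
    if (word %in% names(lookup)) {
      return(strsplit(lookup[[word]], " ", fixed = TRUE)[[1]])
    }
    # Assumption: anything else becomes the unknown token.
    unk_token
  }))
  # Ids here are 1-based positions in the toy vocabulary (an assumption).
  ids <- match(pieces, vocab)
  names(ids) <- pieces
  ids
}

result <- toy_tokenize_words(
  c("the", "unbreakable", "cat", "zzz"),
  vocab = toy_vocab,
  lookup = toy_lookup,
  unk_token = "[UNK]",
  max_chars = 100
)
print(result)
```

In this sketch the names of result are the tokens and the values are their integer ids, so a word that is split into several pieces contributes several elements to the vector.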


morphemepiece documentation built on April 16, 2022, 5:05 p.m.