prepare_vocab | R Documentation |
We use a character vector with class morphemepiece_vocabulary to provide information about tokens used in morphemepiece_tokenize. This function takes a character vector of tokens and puts it into that format.
prepare_vocab(token_list)
token_list | A character vector of tokens.
The vocab as a character vector of tokens. The casedness of the vocabulary is inferred and attached as the "is_cased" attribute. The vocabulary indices are taken to be the positions of the tokens, starting at zero for historical consistency.
Note that from the perspective of a neural net, the numeric indices are the tokens, and the mapping from token to index is fixed. If we changed the indexing, it would break any pre-trained models using that vocabulary.
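The zero-based indexing described above can be sketched in base R. This is only an illustration of the position-to-index convention, not the package's internal code; the names token_index and reordered_index are hypothetical, and the tokens are placeholders rather than a real vocabulary.

```r
# Illustrative tokens only; not a real vocabulary.
tokens <- c("some", "example", "tokens")

# Zero-based index of each token: its position in the vector minus one.
token_index <- setNames(seq_along(tokens) - 1L, tokens)

token_index[["some"]]    # 0
token_index[["tokens"]]  # 2

# Reordering the vector changes the indices of the moved tokens,
# which is why the order must stay fixed for a pre-trained model.
reordered <- c("example", "some", "tokens")
reordered_index <- setNames(seq_along(reordered) - 1L, reordered)
token_index[["some"]] == reordered_index[["some"]]  # FALSE
```

Because a model only ever sees the indices, a vocabulary passed to prepare_vocab should keep the same token order it had when the model was trained.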
my_vocab <- prepare_vocab(c("some", "example", "tokens"))
class(my_vocab)
attr(my_vocab, "is_cased")