load_or_retrieve_vocab | R Documentation |
Usually you will want to use the included vocabulary that can be accessed via morphemepiece_vocab(). This function can be used to load (and cache) a different vocabulary from a file.
load_or_retrieve_vocab(vocab_file)
vocab_file |
Path to the vocabulary file. The file is assumed to be a plain-text file with one token per line, where the line number (starting at zero) is the index of that token in the vocabulary. |
The vocab as a character vector of tokens. The casedness of the vocabulary is inferred and attached as the "is_cased" attribute. The vocabulary indices are taken to be the positions of the tokens, starting at zero for historical consistency.
Note that from the perspective of a neural net, the numeric indices are the tokens, and the mapping from token to index is fixed. If we changed the indexing, it would break any pre-trained models using that vocabulary.
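The zero-based indexing described above can be illustrated with a small base-R sketch. This is not the package's implementation; it only shows how line positions in a vocabulary file map to zero-based indices. The temporary file, the toy tokens, and the names `vocab_path`, `tokens`, and `token_to_index` are all assumptions made for illustration.

```r
# Hypothetical toy vocabulary: the tokens and the temp file are made up.
vocab_path <- tempfile(fileext = ".txt")
writeLines(c("[PAD]", "[UNK]", "the", "cat"), vocab_path)

# Read one token per line (base R only; not the package's own loader).
tokens <- readLines(vocab_path, encoding = "UTF-8")

# R vectors are one-based, so subtract 1 to get the zero-based
# vocabulary index that a pre-trained model would expect.
token_to_index <- setNames(seq_along(tokens) - 1L, tokens)

# Look up the index for a token, then map the index back to the token.
idx <- token_to_index[["the"]]
token <- tokens[idx + 1L]
```

Reordering or inserting lines in the file would shift every later index, which is why the mapping has to stay fixed for any model trained against that vocabulary.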