WordpieceTokenizer | R Documentation
Description

Constructs a WordpieceTokenizer object. (This object-based approach may not be the best fit for an R implementation, but for now the goal is simply to reproduce the Python functionality.)
Usage

WordpieceTokenizer(vocab, unk_token = "[UNK]", max_input_chars_per_word = 200)
Arguments

vocab
    Recognized vocabulary tokens, as a named integer vector. (Name is the token, value is its index.)

unk_token
    Token to use for unknown words.

max_input_chars_per_word
    Length of the longest word that will be recognized.
Details

Has method: tokenize.WordpieceTokenizer()

Value

An object of class WordpieceTokenizer.
Examples

## Not run:
vocab <- load_vocab(vocab_file = "vocab.txt")
wp_tokenizer <- WordpieceTokenizer(vocab)
## End(Not run)
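For readers unfamiliar with WordPiece, the greedy longest-match-first algorithm it is based on can be sketched in base R as below. This is an illustrative sketch only, not the package's implementation: wordpiece_tokenize_word and the toy vocabulary are names invented here, and the real tokenizer operates through the tokenize.WordpieceTokenizer() method on an object built from a full vocabulary file.

```r
# Illustrative sketch (not the package implementation): greedy
# longest-match-first wordpiece tokenization of a single word
# against a toy vocabulary. Continuation pieces carry a "##" prefix.
wordpiece_tokenize_word <- function(word, vocab,
                                    unk_token = "[UNK]",
                                    max_input_chars_per_word = 200) {
  if (nchar(word) > max_input_chars_per_word) return(unk_token)
  chars <- strsplit(word, "")[[1]]
  tokens <- character(0)
  start <- 1
  while (start <= length(chars)) {
    end <- length(chars)
    cur <- NA_character_
    # Shrink the candidate substring from the right until it is in vocab.
    while (start <= end) {
      piece <- paste(chars[start:end], collapse = "")
      if (start > 1) piece <- paste0("##", piece)  # mark continuation pieces
      if (piece %in% names(vocab)) { cur <- piece; break }
      end <- end - 1
    }
    if (is.na(cur)) return(unk_token)  # no piece matched: whole word unknown
    tokens <- c(tokens, cur)
    start <- end + 1
  }
  tokens
}

# Toy named integer vocabulary, as described under the vocab argument.
vocab <- c("un" = 0L, "##aff" = 1L, "##able" = 2L)
wordpiece_tokenize_word("unaffable", vocab)
# c("un", "##aff", "##able")
```

Matching the longest vocabulary piece first keeps the number of output pieces small; any word that cannot be fully covered by vocabulary pieces collapses to the unknown token rather than to a partial match.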