tokenize_text | R Documentation
Description

Given some text and a word piece vocabulary, tokenizes the text. This function is primarily a tool for quickly checking how a piece of text is tokenized.
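To make the idea of word piece tokenization concrete, here is a minimal sketch of the greedy longest-match WordPiece algorithm that BERT-style tokenizers use. It is written in Python for brevity and is an illustration of the general technique, not RBERT's implementation; the tiny vocabulary is hypothetical.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match WordPiece split of a single lower-cased word."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest substring first, shrinking until a vocab entry matches.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # "##" marks a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens

# Hypothetical mini-vocabulary, for illustration only.
vocab = {"who", "ta", "##cos"}
print(wordpiece_tokenize("tacos", vocab))  # -> ['ta', '##cos']
```

A full tokenizer would first split the input on whitespace and punctuation, then apply this word-level split to each resulting word.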
Usage

tokenize_text(
  text,
  ckpt_dir = NULL,
  vocab_file = find_vocab(ckpt_dir),
  include_special = TRUE
)
Arguments

text: Character vector; text to tokenize.

ckpt_dir: Character; path to checkpoint directory. If specified, any other checkpoint files required by this function (such as vocab_file) default to standard paths within that directory.

vocab_file: Character; path to vocabulary file. The file is assumed to be a plain-text file with one token per line, the line number corresponding to the index of that token in the vocabulary.

include_special: Logical; whether to add the special tokens "[CLS]" (at the beginning) and "[SEP]" (at the end) to the token list.
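To make the vocab_file format concrete, here is a hypothetical fragment of such a file. The tokens and their ordering are illustrative, not taken from any real checkpoint; each line holds exactly one token, BERT-style vocabularies conventionally begin with the special tokens, and continuation pieces carry a "##" prefix:

```
[PAD]
[UNK]
[CLS]
[SEP]
who
ta
##cos
```
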
Value

A list of character vectors, giving the tokenization of the input text.
Examples

## Not run:
BERT_PRETRAINED_DIR <- download_BERT_checkpoint("bert_base_uncased")
tokens <- tokenize_text(
  text = c("Who doesn't like tacos?", "Not me!"),
  ckpt_dir = BERT_PRETRAINED_DIR
)
## End(Not run)