| BERT_vocab | R Documentation |
Check if mask words are in the model vocabulary.
BERT_vocab(
models,
mask.words,
add.tokens = FALSE,
add.verbose = FALSE,
weight.decay = 1
)
| models | A character vector of model names at HuggingFace. |
| mask.words | Candidate words to fill in the mask. |
| add.tokens | Add new tokens (for out-of-vocabulary words or phrases) to the model vocabulary? Defaults to FALSE. |
| add.verbose | Print the subwords of each new token? Defaults to FALSE. |
| weight.decay | Decay factor for the relative importance of multiple subwords. Defaults to 1. For example, weight.decay = 0.5 would give raw weights of 0.5 and 0.25 (normalized weights 0.667 and 0.333) to two subwords (e.g., "individualism" = 0.667 "individual" + 0.333 "##ism"). |
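The weight.decay arithmetic can be reproduced with a short sketch. It assumes, inferring from the 0.5 / 0.25 example rather than from anything this page confirms, that the i-th subword receives a raw weight of weight.decay^i and that a new token's embedding is a weighted average of its subword embeddings. `subword_weights()` and `combine_subwords()` are hypothetical helpers, not FMAT functions, and the embedding vectors are toy numbers.

```r
# Hypothetical helper (not part of FMAT): normalized weights for the
# subwords of one out-of-vocabulary token. Assumption: the i-th subword
# gets a raw weight of weight.decay^i before normalization.
subword_weights <- function(n.subwords, weight.decay = 1) {
  raw <- weight.decay ^ seq_len(n.subwords)  # e.g. 0.5, 0.25 for decay 0.5
  raw / sum(raw)                             # normalize so weights sum to 1
}

# Hypothetical helper: combine subword embedding vectors (one per row)
# into a single embedding for the new token as a weighted average.
combine_subwords <- function(embeddings, weight.decay = 1) {
  w <- subword_weights(nrow(embeddings), weight.decay)
  colSums(embeddings * w)  # row i is scaled by w[i], then rows are summed
}

round(subword_weights(2, weight.decay = 0.5), 3)
# [1] 0.667 0.333

# Toy 3-dimensional vectors standing in for "individual" and "##ism"
emb <- rbind(individual = c(1, 0, 2),
             ism        = c(4, 3, -1))
combine_subwords(emb, weight.decay = 0.5)
```

Because the weights are normalized, the result is the same whether the first subword starts at weight.decay^1 or weight.decay^0, and the default weight.decay = 1 yields equal weights for all subwords.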
A data.table with the model name, mask word, real token (replaced if out of vocabulary), and token ID (0 to N).
BERT_download()
BERT_info()
FMAT_run()
## Not run:
models = c("bert-base-uncased", "bert-base-cased")
BERT_info(models)
BERT_vocab(models, c("bruce", "Bruce"))
BERT_vocab(models, 2020:2025) # some are out-of-vocabulary
BERT_vocab(models, 2020:2025, add.tokens=TRUE) # add vocab
BERT_vocab(models,
c("individualism", "artificial intelligence"),
add.tokens=TRUE)
## End(Not run)