masked_targets_pred    R Documentation
Get the predictability (by default, the natural logarithm of the word probability) of a vector of target words (or phrases) given a vector of left contexts and a vector of right contexts, using a masked transformer model.

Usage
masked_targets_pred(
prev_contexts,
targets,
after_contexts,
log.p = getOption("pangoling.log.p"),
ignore_regex = "",
model = getOption("pangoling.masked.default"),
checkpoint = NULL,
add_special_tokens = NULL,
config_model = NULL,
config_tokenizer = NULL
)
Arguments

prev_contexts
    Left context of the target word in left-to-right written languages.

targets
    Target words.

after_contexts
    Right context of the target in left-to-right written languages.
log.p
    Base of the logarithm used for the output predictability values. If TRUE (the default), the natural logarithm is used; if FALSE, raw probabilities are returned; if a number, it is used as the base of the logarithm.
ignore_regex
    Can ignore certain characters when calculating the log probabilities. For example, "^[[:punct:]]$" will ignore tokens that consist only of punctuation.
model
    Name of a pre-trained model or a folder containing one. Models based on "bert" should work; see the Hugging Face website for available models.
checkpoint
    Folder of a checkpoint.

add_special_tokens
    Whether to include special tokens. It has the same default as the AutoTokenizer method in Python.

config_model
    List with other arguments that control how the model from Hugging Face is accessed.

config_tokenizer
    List with other arguments that control how the tokenizer from Hugging Face is accessed.
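A minimal sketch combining some of these arguments, assuming log.p accepts a numeric base and ignore_regex a regular expression, as described above (the argument values are illustrative):

library(pangoling)

# Predictability in bits (log base 2), ignoring tokens that consist
# only of punctuation
masked_targets_pred(
  prev_contexts = "The",
  targets = "apple",
  after_contexts = "doesn't fall far from the tree.",
  log.p = 2,
  ignore_regex = "^[[:punct:]]$",
  model = "bert-base-uncased"
)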
Details

A masked language model (also called a BERT-like, or encoder, model) is a type of large language model that can be used to predict the content of a mask in a sentence.
If not specified, the masked model used is the one set in the global option pangoling.masked.default; this can be accessed via getOption("pangoling.masked.default") (by default "bert-base-uncased"). To change the default, use options(pangoling.masked.default = "newmaskedmodel").
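For instance, the default can be inspected and changed as follows (the alternative model name is illustrative):

# Inspect the masked model currently used by default
getOption("pangoling.masked.default")

# Use a different masked model in subsequent calls
options(pangoling.masked.default = "distilbert-base-uncased")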
A list of possible masked models can be found on the Hugging Face website.
Using the config_model and config_tokenizer arguments, it's possible to control how the model and tokenizer from Hugging Face are accessed; see the Python method from_pretrained for details. In case of errors, check the status of https://status.huggingface.co/.
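As a sketch, arguments accepted by from_pretrained (here revision, a documented Hugging Face parameter) can be passed through these lists:

# Pin the model and the tokenizer to a specific revision of the
# Hugging Face repository ("main" is shown for illustration)
masked_targets_pred(
  prev_contexts = "The",
  targets = "apple",
  after_contexts = "doesn't fall far from the tree.",
  model = "bert-base-uncased",
  config_model = list(revision = "main"),
  config_tokenizer = list(revision = "main")
)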
Value

A named vector of predictability values (by default, the natural logarithm of the word probability).
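Because the default output is on the natural-log scale, raw probabilities can be recovered with exp():

preds <- masked_targets_pred(
  prev_contexts = "The",
  targets = "apple",
  after_contexts = "doesn't fall far from the tree."
)
# Convert log probabilities back to raw probabilities
exp(preds)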
See the online article on the pangoling website for more examples.
See Also

Other masked model functions: masked_tokens_pred_tbl()
Examples

masked_targets_pred(
prev_contexts = c("The", "The"),
targets = c("apple", "pear"),
after_contexts = c(
"doesn't fall far from the tree.",
"doesn't fall far from the tree."
),
model = "bert-base-uncased"
)