causal_next_tokens_pred_tbl (R Documentation)
This function predicts the possible next tokens and their predictability (log-probabilities by default), sorting the tokens in descending order of predictability.
Usage:

causal_next_tokens_pred_tbl(
context,
log.p = getOption("pangoling.log.p"),
decode = FALSE,
model = getOption("pangoling.causal.default"),
checkpoint = NULL,
add_special_tokens = NULL,
config_model = NULL,
config_tokenizer = NULL
)
Arguments:

context
  A single string representing the context for which the next tokens and their predictabilities are predicted.

log.p
  Base of the logarithm used for the output predictability values. By default (getOption("pangoling.log.p")), natural log-probabilities (base e) are returned; a different base can be supplied, e.g., log.p = 1/2 for surprisal in bits.

decode
  Logical. If TRUE, the tokens are converted into human-readable strings, handling special characters such as accents and diacritics. Defaults to FALSE.

model
  Name of a pre-trained model or a folder containing one. Models based on "gpt2" should work; see the Hugging Face website.

checkpoint
  Folder of a checkpoint.

add_special_tokens
  Whether to include special tokens. It has the same default as the Python AutoTokenizer method.

config_model
  List with other arguments that control how the model from Hugging Face is accessed.

config_tokenizer
  List with other arguments that control how the tokenizer from Hugging Face is accessed.
Details:

The function uses a causal transformer model to compute the predictability of every token in the model's vocabulary, given a single input context. It returns a table in which each row is a token together with its predictability score. By default, the function returns log-probabilities in the natural logarithm (base e), but a different logarithm base can be specified (e.g., log.p = 1/2 for surprisal in bits).
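For instance, assuming the "gpt2" model can be downloaded, surprisal in bits can be requested as follows (a minimal sketch):

causal_next_tokens_pred_tbl(
  context = "The apple doesn't fall far from the",
  model = "gpt2",
  log.p = 1/2  # log base 1/2, i.e., -log2(p): surprisal in bits
)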
If decode = TRUE, the tokens are converted into human-readable strings, handling special characters like accents and diacritics. This makes the tokens more interpretable, especially for languages with complex tokenization.
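For example, to return decoded, human-readable tokens (again assuming the "gpt2" model is available):

causal_next_tokens_pred_tbl(
  context = "The apple doesn't fall far from the",
  model = "gpt2",
  decode = TRUE  # convert raw tokens into human-readable strings
)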
Value: a table with the possible next tokens and their log-probabilities.
A causal language model (also called a GPT-like, auto-regressive, or decoder model) is a type of large language model, usually used for text generation, that predicts the next word (more accurately, the next token) based on the preceding context.
If not specified, the causal model used will be the one set in the global option pangoling.causal.default, which can be accessed via getOption("pangoling.causal.default") (by default "gpt2"). To change the default, use options(pangoling.causal.default = "newcausalmodel").
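For instance, to switch the default to another GPT-2-style checkpoint ("distilgpt2" is used here purely as an illustration of a model name hosted on Hugging Face):

# Set a new default causal model for all subsequent pangoling calls
options(pangoling.causal.default = "distilgpt2")
getOption("pangoling.causal.default")
#> [1] "distilgpt2"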
A list of possible causal models can be found on the Hugging Face website.
Using the config_model and config_tokenizer arguments, it is possible to control how the model and tokenizer from Hugging Face are accessed; see the Python method from_pretrained for details.
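As a sketch, the named elements of these lists are forwarded to the Python from_pretrained() calls. The argument names below (low_cpu_mem_usage for the model, add_prefix_space for the tokenizer) are options of the Hugging Face transformers library, shown only as plausible examples; any other from_pretrained() arguments could be passed the same way:

causal_next_tokens_pred_tbl(
  context = "The apple doesn't fall far from the",
  model = "gpt2",
  config_model = list(low_cpu_mem_usage = TRUE),    # forwarded to the model's from_pretrained()
  config_tokenizer = list(add_prefix_space = TRUE)  # forwarded to the tokenizer's from_pretrained()
)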
If errors occur when a new model is run, check the status of Hugging Face at https://status.huggingface.co/.
Other causal model functions:
causal_pred_mats(),
causal_words_pred()
Examples:

causal_next_tokens_pred_tbl(
context = "The apple doesn't fall far from the",
model = "gpt2"
)
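Because the rows are sorted in descending order of predictability, the most likely continuations come first; a small usage sketch (the stored result is the same table as above):

preds <- causal_next_tokens_pred_tbl(
  context = "The apple doesn't fall far from the",
  model = "gpt2"
)
head(preds)  # the most predictable next tokens appear at the top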