causal_next_tokens_pred_tbl — R Documentation
This function predicts the possible next tokens and their predictability (log-probabilities by default), sorting the tokens in descending order of predictability.
causal_next_tokens_pred_tbl(
context,
log.p = getOption("pangoling.log.p"),
decode = FALSE,
model = getOption("pangoling.causal.default"),
checkpoint = NULL,
add_special_tokens = NULL,
config_model = NULL,
config_tokenizer = NULL
)
context
A single string representing the context for which the next tokens and their predictabilities are predicted.

log.p
Base of the logarithm used for the output predictability values. If TRUE (the default), predictabilities are natural log-probabilities (base e); a numeric value sets a different base (e.g., 1/2 for surprisal in bits).

decode
Logical. If TRUE, the tokens are decoded into human-readable strings, handling special characters such as accents and diacritics. Defaults to FALSE.

model
Name of a pre-trained model or folder. One should be able to use models based on "gpt2"; see the Hugging Face website.

checkpoint
Folder of a checkpoint.

add_special_tokens
Whether to include special tokens. It has the same default as the AutoTokenizer method in Python.

config_model
List with other arguments that control how the model from Hugging Face is accessed.

config_tokenizer
List with other arguments that control how the tokenizer from Hugging Face is accessed.
The function uses a causal transformer model to compute the predictability of all tokens in the model's vocabulary, given a single input context. It returns a table where each row represents a token, along with its predictability score. By default, the function returns log-probabilities in the natural logarithm (base e), but you can specify a different logarithm base (e.g., log.p = 1/2 for surprisal in bits).

If decode = TRUE, the tokens are converted into human-readable strings, handling special characters like accents and diacritics. This ensures that tokens are more interpretable, especially for languages with complex tokenization.
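As an illustrative sketch, the log.p and decode arguments can be combined as follows (this assumes the "gpt2" model is available; the first call downloads it from Hugging Face):

```r
# Predictability of the next tokens in bits (surprisal via base-1/2
# log-probabilities), with tokens decoded into human-readable strings:
causal_next_tokens_pred_tbl(
  context = "The apple doesn't fall far from the",
  log.p   = 1/2,
  decode  = TRUE,
  model   = "gpt2"
)
```

Because the output is sorted in descending order of predictability, the first row is the model's most likely continuation of the context.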
A table with the possible next tokens and their predictability values (log-probabilities by default).
A causal language model (also called a GPT-like, auto-regressive, or decoder-only model) is a type of large language model, usually used for text generation, that predicts the next word (or, more accurately, the next token) based on the preceding context.
If not specified, the causal model used will be the one set in the global option pangoling.causal.default; this can be accessed via getOption("pangoling.causal.default") (by default "gpt2"). To change the default option, use options(pangoling.causal.default = "newcausalmodel"). A list of possible causal models can be found on the Hugging Face website.
Using the config_model and config_tokenizer arguments, it's possible to control how the model and tokenizer from Hugging Face are accessed; see the Python method from_pretrained for details.
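A minimal sketch of passing such extra arguments through (the specific options shown, revision and add_prefix_space, are taken from the Hugging Face from_pretrained and GPT-2 tokenizer APIs, not from pangoling itself, so treat them as assumptions):

```r
# Forward extra arguments to Hugging Face's from_pretrained():
causal_next_tokens_pred_tbl(
  context = "The apple doesn't fall far from the",
  model   = "gpt2",
  config_model     = list(revision = "main"),          # pin a model revision
  config_tokenizer = list(add_prefix_space = TRUE)     # GPT-2 tokenizer option
)
```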
In case of errors when a new model is run, check the status of https://status.huggingface.co/.
Other causal model functions: causal_pred_mats(), causal_words_pred()
causal_next_tokens_pred_tbl(
context = "The apple doesn't fall far from the",
model = "gpt2"
)