causal_next_tokens_pred_tbl — R Documentation
This function predicts the possible next tokens and their predictability (log-probabilities by default), sorting the tokens in descending order of predictability.
causal_next_tokens_pred_tbl(
context,
log.p = getOption("pangoling.log.p"),
decode = FALSE,
model = getOption("pangoling.causal.default"),
checkpoint = NULL,
add_special_tokens = NULL,
config_model = NULL,
config_tokenizer = NULL
)
context
A single string representing the context for which the next tokens and their predictabilities are predicted.

log.p
Base of the logarithm used for the output predictability values. If TRUE (the default), predictabilities are natural log-probabilities (base e); a numeric value sets a different base (e.g., 1/2 for surprisal in bits).

decode
Logical. If TRUE, the tokens are decoded into human-readable strings, handling special characters such as accents and diacritics. Defaults to FALSE.

model
Name of a pre-trained model or folder. One should be able to use models based on "gpt2"; see the Hugging Face website.

checkpoint
Folder of a checkpoint.

add_special_tokens
Whether to include special tokens. It has the same default as the AutoTokenizer method in Python.

config_model
List with other arguments that control how the model from Hugging Face is accessed.

config_tokenizer
List with other arguments that control how the tokenizer from Hugging Face is accessed.
The function uses a causal transformer model to compute the predictability of all tokens in the model's vocabulary, given a single input context. It returns a table where each row represents a token, along with its predictability score. By default, the function returns log-probabilities in the natural logarithm (base e), but you can specify a different logarithm base (e.g., log.p = 1/2 for surprisal in bits).

If decode = TRUE, the tokens are converted into human-readable strings, handling special characters like accents and diacritics. This ensures that tokens are more interpretable, especially for languages with complex tokenization.
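As an illustrative sketch, the log.p and decode arguments can be combined as follows (this assumes the "gpt2" model is available; the first call downloads it from Hugging Face):

```r
# Predictability of the next tokens in bits (surprisal via base-1/2
# log-probabilities), with tokens decoded into human-readable strings:
causal_next_tokens_pred_tbl(
  context = "The apple doesn't fall far from the",
  log.p   = 1/2,
  decode  = TRUE,
  model   = "gpt2"
)
```

Because the output is sorted in descending order of predictability, the first row is the model's most likely continuation of the context.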
A table with the possible next tokens and their predictability values (log-probabilities by default).
A causal language model (also called a GPT-like, auto-regressive, or decoder-only model) is a type of large language model, usually used for text generation, that predicts the next word (or, more accurately, the next token) based on the preceding context.
If not specified, the causal model used will be the one set in the global option pangoling.causal.default; this can be accessed via getOption("pangoling.causal.default") (by default "gpt2"). To change the default option, use options(pangoling.causal.default = "newcausalmodel"). A list of possible causal models can be found on the Hugging Face website.
Using the config_model and config_tokenizer arguments, it's possible to control how the model and tokenizer from Hugging Face are accessed; see the Python method from_pretrained for details.
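A minimal sketch of passing such extra arguments through (the specific options shown, revision and add_prefix_space, are taken from the Hugging Face from_pretrained and GPT-2 tokenizer APIs, not from pangoling itself, so treat them as assumptions):

```r
# Forward extra arguments to Hugging Face's from_pretrained():
causal_next_tokens_pred_tbl(
  context = "The apple doesn't fall far from the",
  model   = "gpt2",
  config_model     = list(revision = "main"),          # pin a model revision
  config_tokenizer = list(add_prefix_space = TRUE)     # GPT-2 tokenizer option
)
```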
In case of errors when a new model is run, check the status of https://status.huggingface.co/.
Other causal model functions: causal_pred_mats(), causal_words_pred()
causal_next_tokens_pred_tbl(
context = "The apple doesn't fall far from the",
model = "gpt2"
)