Provides access to word predictability estimates from large language models (LLMs) based on 'transformer' architectures via integration with the 'Hugging Face' ecosystem <https://huggingface.co/>. The package interfaces with pre-trained neural networks and supports both causal/auto-regressive LLMs (e.g., 'GPT-2') and masked/bidirectional LLMs (e.g., 'BERT') to compute the probability of words, phrases, or tokens given their linguistic context. For details on GPT-2 and causal models, see Radford et al. (2019) <https://storage.prod.researchhub.com/uploads/papers/2020/06/01/language-models.pdf>; for details on BERT and masked models, see Devlin et al. (2019) <doi:10.48550/arXiv.1810.04805>. By enabling straightforward estimation of word predictability, the package facilitates research in psycholinguistics, computational linguistics, and natural language processing (NLP).
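As a quick illustration of the kind of estimates the package provides, the sketch below computes by-word predictability with a causal (GPT-2 style) model. It is a minimal sketch, assuming the `causal_words_pred()` interface documented on the package site, the argument names `x` and `model`, and network access to download the 'gpt2' checkpoint from Hugging Face.

```r
# A minimal sketch, assuming the causal_words_pred() interface from the
# package documentation and that the 'gpt2' checkpoint can be downloaded.
library(pangoling)

sentence <- c("The", "apple", "doesn't", "fall", "far", "from", "the", "tree.")

# Log-probability of each word given the preceding words (causal model);
# the first word has no preceding context and is typically returned as NA.
lp <- causal_words_pred(x = sentence, model = "gpt2")

data.frame(word = sentence, log_prob = lp)
```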
| Package details | |
|---|---|
| Author | Bruno Nicenboim [aut, cre] (<https://orcid.org/0000-0002-5176-3943>), Chris Emmerly [ctb], Giovanni Cassani [ctb], Lisa Levinson [rev], Utku Turk [rev] |
| Maintainer | Bruno Nicenboim <b.nicenboim@tilburguniversity.edu> |
| License | MIT + file LICENSE |
| Version | 1.0.3 |
| URL | https://docs.ropensci.org/pangoling/ https://github.com/ropensci/pangoling |
| Package repository | CRAN |
Installation
Install the latest version of this package by entering the following in R:
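A minimal install command, assuming the released version is available on CRAN as the repository entry above indicates:

```r
# Install the released version from CRAN (assumes CRAN availability as listed above).
install.packages("pangoling")
```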