textEmbedRawLayers: Extract layers of hidden states
In text: Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning

textEmbedRawLayers

R Documentation

Extract layers of hidden states

Description

textEmbedRawLayers extracts layers of hidden states (word embeddings) for all character variables in a given dataframe.

Usage

textEmbedRawLayers(
  texts,
  model = "bert-base-uncased",
  layers = -2,
  return_tokens = TRUE,
  word_type_embeddings = FALSE,
  decontextualize = FALSE,
  keep_token_embeddings = TRUE,
  device = "cpu",
  tokenizer_parallelism = FALSE,
  model_max_length = NULL,
  max_token_to_sentence = 4,
  hg_gated = FALSE,
  hg_token = Sys.getenv("HUGGINGFACE_TOKEN", unset = ""),
  trust_remote_code = FALSE,
  logging_level = "error",
  sort = TRUE
)

Arguments

`texts`	A character variable or a tibble with at least one character variable.
`model`	(character) Character string specifying the pre-trained language model (default = 'bert-base-uncased'). For a full list of options, see the pretrained models at HuggingFace. For example, use "bert-base-multilingual-cased", "openai-gpt", "gpt2", "ctrl", "transfo-xl-wt103", "xlnet-base-cased", "xlm-mlm-enfr-1024", "distilbert-base-cased", "roberta-base", or "xlm-roberta-base". Only load models that you trust from HuggingFace; loading a malicious model can execute arbitrary code on your computer.
`layers`	(character or numeric) Specify the layers that should be extracted (default -2, which gives the second to last layer). It is more efficient to extract only the layers that you need (e.g., 11). You can also extract several (e.g., 11:12), or all by setting this parameter to "all". Layer 0 is the decontextualized input layer (i.e., not comprising hidden states) and thus should normally not be used. These layers can then be aggregated with the textEmbedLayerAggregation function.
`return_tokens`	(boolean) If TRUE, provide the tokens used in the specified transformer model. (default = TRUE)
`word_type_embeddings`	(boolean) Whether to provide embeddings for each word/token type. (default = FALSE)
`decontextualize`	(boolean) Whether to decontextualize embeddings (i.e., embedding one word at a time). (default = FALSE)
`keep_token_embeddings`	(boolean) Whether to keep token-level embeddings in the output (when using word_types aggregation). (default = TRUE)
`device`	(character) Name of device to use: 'cpu', 'gpu', 'gpu:k' or 'mps'/'mps:k' for MacOS, where k is a specific device number. (default = "cpu")
`tokenizer_parallelism`	(boolean) If TRUE this will turn on tokenizer parallelism. (default = FALSE).
`model_max_length`	The maximum length (in number of tokens) for the inputs to the transformer model. (default = NULL, which uses the value stored for the associated model)
`max_token_to_sentence`	(numeric) Maximum number of tokens in a string to handle before switching to embedding text sentence by sentence. (default = 4)
`hg_gated`	(boolean) Set to TRUE if the accessed model is gated. (default = FALSE)
`hg_token`	(character) The token needed to access the gated model. Create a token from the ['Settings' page](https://huggingface.co/settings/tokens) of the Hugging Face website. The environment variable HUGGINGFACE_TOKEN can be set to avoid having to enter the token each time.
`trust_remote_code`	(boolean) Whether to use a model with custom code from the HuggingFace Hub. (default = FALSE)
`logging_level`	(character) Set the logging level. (default = "error") Options (ordered from less logging to more logging): critical, error, warning, info, debug.
`sort`	(boolean) If TRUE sort the output to tidy format. (default = TRUE)
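
The layers argument accepts negative values, positive values, ranges, and "all". The sketch below illustrates how such a specification could map to concrete layer numbers. The helper resolve_layers() and its n_layers argument are hypothetical and not part of the package. It assumes that negative values count back from the final layer and that "all" spans layer 0 through the final layer; the package's own handling may differ in detail.

```r
# Hypothetical helper (not part of the text package): map a `layers`
# specification to concrete layer numbers for a model with `n_layers`
# transformer layers (e.g., 12 for bert-base-uncased).
resolve_layers <- function(layers, n_layers) {
  # Assumption: "all" means the input layer 0 plus every transformer layer.
  if (identical(layers, "all")) {
    return(0:n_layers)
  }
  stopifnot(is.numeric(layers))
  # Assumption: negative values count back from the end, so -1 is the
  # final layer and -2 is the second to last layer.
  resolved <- ifelse(layers < 0, n_layers + 1 + layers, layers)
  stopifnot(all(resolved >= 0 & resolved <= n_layers))
  resolved
}

resolve_layers(-2, 12)     # second to last layer of a 12-layer model
resolve_layers(11:12, 12)  # an explicit range is returned unchanged
resolve_layers("all", 12)  # layer 0 through layer 12
```

Under these assumptions, the default of -2 corresponds to layer 11 for a 12-layer model such as bert-base-uncased.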

Value

textEmbedRawLayers() takes text as input and returns the hidden states for each token of the text, including the special tokens [CLS] and [SEP]. Note that layer 0 is the input embedding to the transformer and should normally not be used.
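
Because the returned hidden states include [CLS] and [SEP], downstream code may want to handle these special tokens explicitly. The sketch below uses a small made-up data frame as a stand-in for one layer of token-level output. The column names (tokens, Dim1, Dim2) and all values are assumptions chosen for illustration, not actual model output.

```r
# Toy stand-in for token-level hidden states from a single layer.
# Column names and values are illustrative assumptions only.
hidden <- data.frame(
  tokens = c("[CLS]", "i", "am", "fine", "[SEP]"),
  Dim1 = c(0.1, 0.2, 0.4, 0.6, 0.9),
  Dim2 = c(1.0, 2.0, 3.0, 4.0, 5.0)
)

# Drop the special tokens, keeping only the word-piece tokens.
special <- c("[CLS]", "[SEP]")
content <- hidden[!hidden$tokens %in% special, ]

# Average the remaining token vectors into one text-level vector.
text_embedding <- colMeans(content[, c("Dim1", "Dim2")])
text_embedding
```

Averaging over tokens is only one possible way to go from token-level to text-level representations; the textEmbedLayerAggregation function is the package's own route for this step.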

Examples

# Get hidden states of layers 11 and 12 for "I am fine".
## Not run: 
imf_embeddings_11_12 <- textEmbedRawLayers(
  "I am fine",
  layers = 11:12
)

# Show hidden states of layers 11 and 12.
imf_embeddings_11_12

## End(Not run)
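
As noted under Arguments, the extracted layers can be passed on to textEmbedLayerAggregation. The following sketch wraps both steps in a function so that nothing is downloaded until it is called. The wrapper name embed_and_aggregate is hypothetical, and the use of the context_tokens element of the returned object is an assumption about the output structure; check str() of the returned object and ?textEmbedLayerAggregation before relying on it.

```r
# Sketch only: running this requires the text package with its Python
# backend installed, and it downloads the model on first use.
embed_and_aggregate <- function(sentence = "I am fine") {
  # Extract only the layers that are needed.
  raw <- text::textEmbedRawLayers(sentence, layers = 11:12)

  # Assumption: token-level hidden states are stored in `context_tokens`.
  text::textEmbedLayerAggregation(raw$context_tokens, layers = 11:12)
}

## Not run:
# aggregated <- embed_and_aggregate("I am fine")
# aggregated
## End(Not run)
```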

text documentation built on June 8, 2025, 1:32 p.m.
