textEmbedLayerAggregation: Select and aggregate layers of hidden states to form a word...

View source: R/1_1_textEmbed.R

textEmbedLayerAggregation    R Documentation

Select and aggregate layers of hidden states to form word embeddings.

Description

Select and aggregate layers of hidden states to form word embeddings.

Usage

textEmbedLayerAggregation(
  word_embeddings_layers,
  layers = "all",
  aggregation_from_layers_to_tokens = "concatenate",
  aggregation_from_tokens_to_texts = "mean",
  return_tokens = FALSE,
  tokens_select = NULL,
  tokens_deselect = NULL
)

Arguments

word_embeddings_layers

Layers output by textEmbedRawLayers.

layers

The numbers of the layers to aggregate (e.g., c(11:12) to aggregate the eleventh and twelfth layers). Note that layer 0 is the input embedding to the transformer and should normally not be used; selecting "all" therefore excludes layer 0. Default is "all".

aggregation_from_layers_to_tokens

Method used to aggregate the selected layers for each word/token: "min", "max" or "mean", which take the minimum, maximum or mean across the layers for each column (dimension); or "concatenate", which links the layers of the word embedding together into one long row. Default is "concatenate".

aggregation_from_tokens_to_texts

Method used to aggregate the word embeddings of the words/tokens into one embedding per text: "min", "max" or "mean", which take the minimum, maximum or mean across the tokens for each column (dimension); or "concatenate", which links the word embeddings together into one long row. Default is "mean".

return_tokens

If TRUE, return the tokens used by the specified transformer model (default FALSE).

tokens_select

Option to select only the embeddings linked to specific tokens, such as "[CLS]" and "[SEP]" (default NULL).

tokens_deselect

Option to deselect embeddings linked to specific tokens such as "[CLS]" and "[SEP]" (default NULL).
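
The two aggregation steps and the token filter described above can be illustrated with a minimal base R sketch. It does not call the text package and is not its implementation: the helpers aggregate_layers and aggregate_tokens, the toy matrices, and the assumption that tokens are filtered before they are pooled into a text embedding are all hypothetical and exist only for illustration.

```r
# Toy hidden states (hypothetical values): 3 tokens x 2 dimensions per layer.
# Real layers would come from textEmbedRawLayers; plain matrices are an
# assumption made here to keep the sketch self-contained.
tokens <- c("[CLS]", "happy", "[SEP]")
dims <- c("Dim1", "Dim2")
layer_11 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, dimnames = list(tokens, dims))
layer_12 <- matrix(c(3, 2, 1, 0, 5, 10), nrow = 3, dimnames = list(tokens, dims))
layers <- list(layer_11, layer_12)

# Step 1 (hypothetical helper): combine layers for each token.
aggregate_layers <- function(layers, method = "concatenate") {
  switch(method,
    min = Reduce(pmin, layers),                  # element-wise minimum across layers
    max = Reduce(pmax, layers),                  # element-wise maximum across layers
    mean = Reduce(`+`, layers) / length(layers), # element-wise mean across layers
    concatenate = do.call(cbind, layers),        # one long row per token
    stop("unknown method: ", method)
  )
}

# Step 2 (hypothetical helper): combine token rows into one text embedding.
aggregate_tokens <- function(token_embeddings, method = "mean") {
  switch(method,
    min = apply(token_embeddings, 2, min),
    max = apply(token_embeddings, 2, max),
    mean = colMeans(token_embeddings),
    concatenate = as.vector(t(token_embeddings)), # token rows laid end to end
    stop("unknown method: ", method)
  )
}

per_token_cat <- aggregate_layers(layers, "concatenate") # 3 tokens x 4 columns
per_token_mean <- aggregate_layers(layers, "mean")       # 3 tokens x 2 columns

# Deselect a token before pooling (assumed order of operations).
keep <- !(rownames(per_token_mean) %in% "[CLS]")
filtered <- per_token_mean[keep, , drop = FALSE]

text_mean <- aggregate_tokens(per_token_mean, "mean")    # Dim1 = 2, Dim2 = 5
text_mean_filtered <- aggregate_tokens(filtered, "mean") # Dim1 = 2, Dim2 = 6.5
```

With "concatenate" in the first step the embedding width grows with the number of selected layers (here 2 layers x 2 dimensions = 4 columns), whereas "min", "max" and "mean" keep the width of a single layer.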

Value

A tibble with word embeddings. Note that layer 0 is the input embedding to the transformer, which is normally not used.

See Also

See textEmbedRawLayers and textEmbed.

Examples


# library(text)
# word_embeddings_layers <- textEmbedRawLayers(
#   Language_based_assessment_data_8$harmonywords[1],
#   layers = 11:12
# )
# word_embeddings <- textEmbedLayerAggregation(word_embeddings_layers$context, layers = 11)


text documentation built on Aug. 9, 2023, 5:08 p.m.