attention_bert: BERT-Style Attention

attention_bert {torchtransformers}    R Documentation

BERT-Style Attention

Description

Takes an input tensor (e.g. a sequence of token embeddings), applies a multi-head attention layer, and layer-normalizes the result. Returns both the attention weights and the output embeddings.
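
For intuition, a minimal single-head sketch of the computation is shown below. It omits the learned query/key/value projections and the multi-head split, and it assumes a BERT-style residual connection before the layer norm; none of these details are spelled out on this page.

emb_size <- 4L
x <- torch::torch_randn(2, 3, emb_size)                        # (batch, seq_len, emb_size)
# Scaled dot-product attention scores between all pairs of positions.
scores <- torch::torch_matmul(x, x$transpose(2, 3)) / sqrt(emb_size)
weights <- torch::nnf_softmax(scores, dim = 3)                 # (batch, seq_len, seq_len)
attended <- torch::torch_matmul(weights, x)                    # (batch, seq_len, emb_size)
# Residual connection before the layer norm is an assumption (standard in BERT).
layer_norm <- torch::nn_layer_norm(emb_size)
embeddings <- layer_norm(attended + x)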

Usage

attention_bert(embedding_size, n_head, attention_dropout = 0.1)

Arguments

embedding_size

Integer; the dimension of the embedding vectors.

n_head

Integer; the number of attention heads per layer.

attention_dropout

Numeric; the dropout probability to apply in attention.

Shape

Inputs:

  • input: (*, sequence_length, embedding_size)

  • optional mask: (*, sequence_length)

Output:

  • embeddings: (*, sequence_length, embedding_size)

  • weights: (*, n_head, sequence_length, sequence_length)

Examples

emb_size <- 4L
seq_len <- 3L
n_head <- 2L
batch_size <- 2L

model <- attention_bert(
  embedding_size = emb_size,
  n_head = n_head
)
# Generate random values in [-1, 1] for the input tensor.
input <- array(
  sample(
    -10:10,
    size = batch_size * seq_len * emb_size,
    replace = TRUE
  ) / 10,
  dim = c(batch_size, seq_len, emb_size)
)
input <- torch::torch_tensor(input)
model(input)
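
The module returns the two tensors described under Shape. In the sketch below, the element names (embeddings, weights) and passing the optional mask as a second positional argument are assumptions based on that section, not confirmed by this page.

output <- model(input)
output$embeddings  # shape: (batch_size, seq_len, emb_size)
output$weights     # shape: (batch_size, n_head, seq_len, seq_len)

# Optional mask over positions, shape (batch_size, seq_len).
# Passing it positionally is an assumption about the forward interface.
mask <- torch::torch_ones(batch_size, seq_len)
model(input, mask)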
