transformer_encoder_single_bert: Single Transformer Layer

Description

Build a single layer of a BERT-style attention-based transformer.

Usage

transformer_encoder_single_bert(
  embedding_size,
  intermediate_size = 4 * embedding_size,
  n_head,
  hidden_dropout = 0.1,
  attention_dropout = 0.1
)

Arguments

embedding_size

Integer; the dimension of the embedding vectors.

intermediate_size

Integer; the size of the dense layers applied after the attention mechanism.

n_head

Integer; the number of attention heads per layer.

hidden_dropout

Numeric; the dropout probability to apply to dense layers.

attention_dropout

Numeric; the dropout probability to apply in attention.
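
As an illustration only (the values below are arbitrary, not recommendations), the layer can be constructed with every argument spelled out; intermediate_size defaults to 4 * embedding_size when omitted:

layer <- transformer_encoder_single_bert(
  embedding_size = 8L,
  intermediate_size = 32L, # the default, 4 * embedding_size
  n_head = 2L,
  hidden_dropout = 0.1,
  attention_dropout = 0.1
)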

Shape

Inputs:

  • input: (*, sequence_length, embedding_size)

  • optional mask: (*, sequence_length)

Output:

  • embeddings: (*, sequence_length, embedding_size)

  • weights: (*, n_head, sequence_length, sequence_length)
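
For example, with a batch of 2 sequences of length 3, a mask matching the optional input shape above could be built as follows (a sketch only; the convention for masked values and how the mask is passed to the module call are assumptions to check against the package source):

# Assumed convention: 1 = attend to this position, 0 = padding.
mask <- torch::torch_tensor(
  matrix(c(1, 1, 1,
           1, 1, 0), nrow = 2, byrow = TRUE)
)
dim(mask) # 2 3, i.e. (*, sequence_length)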

Examples

emb_size <- 4L
seq_len <- 3L
n_head <- 2L
batch_size <- 2L

model <- transformer_encoder_single_bert(
  embedding_size = emb_size,
  n_head = n_head
)
# Generate random values in [-1, 1] for the input embeddings.
input <- array(
  sample(
    -10:10,
    size = batch_size * seq_len * emb_size,
    replace = TRUE
  ) / 10,
  dim = c(batch_size, seq_len, emb_size)
)
input <- torch::torch_tensor(input)
model(input)
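
The call above returns the two tensors listed under Shape. A sketch of inspecting them, assuming they come back as a list in that order (element names, if any, are not relied on here):

output <- model(input)
# Embeddings: expected shape (batch_size, seq_len, emb_size) = (2, 3, 4).
dim(output[[1]])
# Attention weights: expected shape (batch_size, n_head, seq_len, seq_len) = (2, 2, 3, 3).
dim(output[[2]])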
