transformer_encoder_bert: Transformer Stack


Transformer Stack

Description

Build a BERT-style multi-layer, attention-based transformer encoder.

Usage

transformer_encoder_bert(
  embedding_size,
  intermediate_size = 4 * embedding_size,
  n_layer,
  n_head,
  hidden_dropout = 0.1,
  attention_dropout = 0.1
)

Arguments

embedding_size

Integer; the dimension of the embedding vectors.

intermediate_size

Integer; the size of the dense layers applied after the attention mechanism.

n_layer

Integer; the number of attention layers.

n_head

Integer; the number of attention heads per layer.

hidden_dropout

Numeric; the dropout probability to apply to dense layers.

attention_dropout

Numeric; the dropout probability to apply in attention.
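
As an illustration of how these arguments fit together, the sketch below instantiates a stack using the sizes of the published BERT-base configuration (768-dimensional embeddings, 12 layers, 12 heads), so that intermediate_size takes its default of 4 * 768 = 3072. This is only an illustrative sketch; a stack this large is slow to build and run on CPU.

# Illustrative BERT-base-sized stack; intermediate_size defaults to 4 * 768 = 3072.
bert_base_encoder <- transformer_encoder_bert(
  embedding_size = 768L,
  n_layer = 12L,
  n_head = 12L,
  hidden_dropout = 0.1,
  attention_dropout = 0.1
)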

Shape

Inputs:

With each input token sequence of length sequence_length:

  • input: (*, sequence_length, embedding_size)

  • optional mask: (*, sequence_length)

Output:

  • embeddings: a list with one tensor of shape (*, sequence_length, embedding_size) per transformer layer.

  • weights: a list with one attention-weight tensor of shape (*, n_head, sequence_length, sequence_length) per transformer layer.
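
For concreteness, the sketch below builds a small encoder and tensors with the documented input shapes. How the optional mask is passed to the module is an assumption here, so that call is left commented out; check the module's forward() signature before using it.

enc <- transformer_encoder_bert(embedding_size = 4L, n_layer = 2L, n_head = 2L)
x <- torch::torch_randn(2, 3, 4)  # input: (batch, sequence_length, embedding_size)
mask <- torch::torch_ones(2, 3)   # optional mask: (batch, sequence_length)
# out <- enc(x, mask)  # assumption: mask supplied as the second argument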

Examples

emb_size <- 4L
seq_len <- 3L
n_head <- 2L
n_layer <- 5L
batch_size <- 2L

model <- transformer_encoder_bert(
  embedding_size = emb_size,
  n_head = n_head,
  n_layer = n_layer
)
# Generate random input values between -1 and 1.
input <- array(
  sample(
    -10:10,
    size = batch_size * seq_len * emb_size,
    replace = TRUE
  ) / 10,
  dim = c(batch_size, seq_len, emb_size)
)
input <- torch::torch_tensor(input)
model(input)
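
# Inspect the returned value. Per the Shape section, it is a list with one
# entry per transformer layer; the element names "embeddings" and "weights"
# are assumed from that section.
output <- model(input)
length(output$embeddings)     # expected: n_layer
output$embeddings[[1]]$shape  # expected: batch_size x seq_len x emb_size
output$weights[[1]]$shape     # expected: batch_size x n_head x seq_len x seq_len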
