transformer_model: Build multi-head, multi-layer Transformer

View source: R/modeling.R

transformer_model    R Documentation

Build multi-head, multi-layer Transformer

Description

Multi-headed, multi-layer Transformer from "Attention is All You Need". This is almost an exact implementation of the original Transformer encoder.

Usage

transformer_model(
  input_tensor,
  attention_mask = NULL,
  hidden_size = 768L,
  num_hidden_layers = 12L,
  num_attention_heads = 12L,
  intermediate_size = 3072L,
  intermediate_act_fn = gelu,
  hidden_dropout_prob = 0.1,
  attention_probs_dropout_prob = 0.1,
  initializer_range = 0.02,
  do_return_all_layers = FALSE
)

Arguments

input_tensor

Float Tensor of shape [batch_size, seq_length, hidden_size].

attention_mask

(Optional) Integer Tensor of shape [batch_size, seq_length, seq_length], with 1 for positions that can be attended to and 0 for positions that should not be attended to. (A sketch of constructing such a mask follows the argument list.)

hidden_size

Integer; hidden size of the Transformer.

num_hidden_layers

Integer; number of layers (blocks) in the Transformer.

num_attention_heads

Integer; number of attention heads in the Transformer.

intermediate_size

Integer; the size of the "intermediate" (a.k.a., feed forward) layer.

intermediate_act_fn

The non-linear activation function to apply to the output of the intermediate/feed-forward layer. (Function, not character.)

hidden_dropout_prob

Numeric; the dropout probability for the hidden layers.

attention_probs_dropout_prob

Numeric; the dropout probability of the attention probabilities.

initializer_range

Numeric; the range of the initializer (stddev of truncated normal).

do_return_all_layers

Logical; whether to return all layers or just the final layer. If TRUE, attention probabilities are also returned.

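The sketch below is not part of the package documentation; it shows one way to build an attention_mask of the expected [batch_size, seq_length, seq_length] shape from hypothetical per-sequence token counts, assuming the tensorflow R package is installed.

library(tensorflow)

batch_size <- 2L
seq_length <- 8L
real_lengths <- c(5L, 3L)  # hypothetical count of non-padding tokens per sequence

# 1 where a query position may attend to a key position (i.e. the key is a
# real token), 0 where the key position should be masked out.
mask <- array(0L, dim = c(batch_size, seq_length, seq_length))
for (b in seq_len(batch_size)) {
  mask[b, , seq_len(real_lengths[b])] <- 1L
}
attention_mask <- tf$constant(mask, dtype = tf$int32)
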
Details

See the original paper: https://arxiv.org/abs/1706.03762

Also see: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py

Value

Float Tensor of shape [batch_size, seq_length, hidden_size]: the final hidden layer of the Transformer. If 'do_return_all_layers' is 'TRUE', a list of such Tensors (one for each hidden layer) is returned instead.

Examples

## Not run: 
batch_size <- 10
seq_length <- 500
hidden_size <- 120

with(tensorflow::tf$variable_scope("examples",
  reuse = tensorflow::tf$AUTO_REUSE
), {
  input_tensor <- tensorflow::tf$get_variable("input",
    shape = c(
      batch_size,
      seq_length,
      hidden_size
    )
  )
})

model_t <- transformer_model(
  input_tensor = input_tensor,
  hidden_size = hidden_size
)

## End(Not run)
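
# A further sketch (not from the package's shipped examples): reuse
# input_tensor from above to illustrate do_return_all_layers = TRUE. The
# exact structure of the returned list should be checked against the
# package source.
## Not run: 
model_layers <- transformer_model(
  input_tensor = input_tensor,
  hidden_size = hidden_size,
  do_return_all_layers = TRUE
)
# Per the Value section, this is a list with one Tensor per hidden layer
# (plus attention probabilities, per the do_return_all_layers argument).
str(model_layers, max.level = 1)

## End(Not run)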
