transformer_model: Build multi-head, multi-layer Transformer

View source: R/modeling.R

transformer_model    R Documentation

Build multi-head, multi-layer Transformer

Description

Multi-headed, multi-layer Transformer from "Attention is All You Need". This is almost an exact implementation of the original Transformer encoder.

Usage

transformer_model(
  input_tensor,
  attention_mask = NULL,
  hidden_size = 768L,
  num_hidden_layers = 12L,
  num_attention_heads = 12L,
  intermediate_size = 3072L,
  intermediate_act_fn = gelu,
  hidden_dropout_prob = 0.1,
  attention_probs_dropout_prob = 0.1,
  initializer_range = 0.02,
  do_return_all_layers = FALSE
)

Arguments

input_tensor

Float Tensor of shape [batch_size, seq_length, hidden_size].

attention_mask

(Optional) Integer Tensor of shape [batch_size, seq_length, seq_length], with 1 for positions that can be attended to and 0 for positions that should not be attended to. (A sketch of constructing such a mask follows the argument list.)

hidden_size

Integer; hidden size of the Transformer.

num_hidden_layers

Integer; number of layers (blocks) in the Transformer.

num_attention_heads

Integer; number of attention heads in the Transformer.

intermediate_size

Integer; the size of the "intermediate" (a.k.a., feed forward) layer.

intermediate_act_fn

The non-linear activation function to apply to the output of the intermediate/feed-forward layer. (Function, not character.)

hidden_dropout_prob

Numeric; the dropout probability for the hidden layers.

attention_probs_dropout_prob

Numeric; the dropout probability of the attention probabilities.

initializer_range

Numeric; the range of the initializer (stddev of truncated normal).

do_return_all_layers

Logical; whether to return all layers or just the final layer. If TRUE, attention probabilities are also returned.

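The sketch below is not part of the package documentation; it shows one way to build an attention_mask of the expected [batch_size, seq_length, seq_length] shape from hypothetical per-sequence token counts, assuming the tensorflow R package is installed.

library(tensorflow)

batch_size <- 2L
seq_length <- 8L
real_lengths <- c(5L, 3L)  # hypothetical count of non-padding tokens per sequence

# 1 where a query position may attend to a key position (i.e. the key is a
# real token), 0 where the key position should be masked out.
mask <- array(0L, dim = c(batch_size, seq_length, seq_length))
for (b in seq_len(batch_size)) {
  mask[b, , seq_len(real_lengths[b])] <- 1L
}
attention_mask <- tf$constant(mask, dtype = tf$int32)
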
Details

See the original paper: https://arxiv.org/abs/1706.03762

Also see: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py

Value

Float Tensor of shape [batch_size, seq_length, hidden_size]: the final hidden layer of the Transformer. If 'do_return_all_layers' is 'TRUE', a list of such Tensors (one for each hidden layer) is returned instead.

Examples

## Not run: 
batch_size <- 10
seq_length <- 500
hidden_size <- 120

with(tensorflow::tf$variable_scope("examples",
  reuse = tensorflow::tf$AUTO_REUSE
), {
  input_tensor <- tensorflow::tf$get_variable("input",
    shape = c(
      batch_size,
      seq_length,
      hidden_size
    )
  )
})

model_t <- transformer_model(
  input_tensor = input_tensor,
  hidden_size = hidden_size
)

## End(Not run)
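
# A further sketch (not from the package's shipped examples): reuse
# input_tensor from above to illustrate do_return_all_layers = TRUE. The
# exact structure of the returned list should be checked against the
# package source.
## Not run: 
model_layers <- transformer_model(
  input_tensor = input_tensor,
  hidden_size = hidden_size,
  do_return_all_layers = TRUE
)
# Per the Value section, this is a list with one Tensor per hidden layer
# (plus attention probabilities, per the do_return_all_layers argument).
str(model_layers, max.level = 1)

## End(Not run)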
