transformer_model: Multi-headed, multi-layer Transformer

Description
Multi-headed, multi-layer Transformer from "Attention is All You Need". This is almost an exact implementation of the original Transformer encoder.
Usage

transformer_model(
  input_tensor,
  attention_mask = NULL,
  hidden_size = 768L,
  num_hidden_layers = 12L,
  num_attention_heads = 12L,
  intermediate_size = 3072L,
  intermediate_act_fn = gelu,
  hidden_dropout_prob = 0.1,
  attention_probs_dropout_prob = 0.1,
  initializer_range = 0.02,
  do_return_all_layers = FALSE
)
Arguments

input_tensor: Float Tensor of shape [batch_size, seq_length, hidden_size].

attention_mask: (Optional) Integer Tensor of shape [batch_size, seq_length, seq_length], with 1 for positions that can be attended to and 0 for positions that should be masked out. (See the sketch after this list for one way to build such a mask.)

hidden_size: Integer; hidden size of the Transformer. Must be evenly divisible by num_attention_heads.

num_hidden_layers: Integer; number of layers (blocks) in the Transformer.

num_attention_heads: Integer; number of attention heads in the Transformer.

intermediate_size: Integer; the size of the "intermediate" (a.k.a. feed-forward) layer.

intermediate_act_fn: The non-linear activation function to apply to the output of the intermediate/feed-forward layer. (A function, not a character string.)

hidden_dropout_prob: Numeric; the dropout probability for the hidden layers.

attention_probs_dropout_prob: Numeric; the dropout probability of the attention probabilities.

initializer_range: Numeric; the standard deviation of the truncated-normal initializer used for the weights.

do_return_all_layers: Logical; whether to return all layers or just the final layer. If TRUE, attention probabilities are also returned.
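The attention mask is not derived automatically from the inputs. As a minimal sketch (not the package's own helper), assuming a hypothetical input_mask of shape [batch_size, seq_length] with 1 for real tokens and 0 for padding, the mask can be broadcast to the required [batch_size, seq_length, seq_length] shape with the tensorflow R package:

library(tensorflow)

batch_size <- 10L
seq_length <- 500L

# Hypothetical padding mask: 1 for real tokens, 0 for padding.
# (All ones here, just so the sketch runs standalone.)
input_mask <- tf$ones(shape(batch_size, seq_length), dtype = tf$int32)

# Broadcast to [batch_size, seq_length, seq_length]: every query
# position may attend to every non-padding key position.
to_mask <- tf$reshape(input_mask, shape(batch_size, 1L, seq_length))
broadcast_ones <- tf$ones(shape(batch_size, seq_length, 1L), dtype = tf$int32)
attention_mask <- broadcast_ones * to_mask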
Details

See the original paper: https://arxiv.org/abs/1706.03762
Also see: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py
Value

A float Tensor of shape [batch_size, seq_length, hidden_size]: the final hidden layer of the Transformer. If 'do_return_all_layers' is 'TRUE', a list of such Tensors (one for each hidden layer) is returned instead.
Examples

## Not run: 
batch_size <- 10
seq_length <- 500
hidden_size <- 120

with(tensorflow::tf$variable_scope("examples",
  reuse = tensorflow::tf$AUTO_REUSE
), {
  input_tensor <- tensorflow::tf$get_variable(
    "input",
    shape = c(batch_size, seq_length, hidden_size)
  )
})

model_t <- transformer_model(
  input_tensor = input_tensor,
  hidden_size = hidden_size
)
## End(Not run)
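The exact structure of the value returned when do_return_all_layers = TRUE (including where the attention probabilities are stored) is easiest to check interactively; this sketch continues the variables from the example above:

## Not run: 
# Request all hidden layers (and attention probabilities) instead of
# only the final layer; inspect the returned list's layout with str().
model_all <- transformer_model(
  input_tensor = input_tensor,
  hidden_size = hidden_size,
  do_return_all_layers = TRUE
)
str(model_all, max.level = 1)
## End(Not run)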