attention_layer: Build multi-headed attention layer

View source: R/modeling.R

attention_layer    R Documentation

Build multi-headed attention layer

Description

Performs multi-headed attention from from_tensor to to_tensor. This is an implementation of multi-headed attention based on "Attention Is All You Need". If from_tensor and to_tensor are the same, then this is self-attention. Each timestep in from_tensor attends to the corresponding sequence in to_tensor, and returns a fixed-width vector.

This function first projects from_tensor into a "query" tensor and to_tensor into "key" and "value" tensors. These are (effectively) a list of tensors of length num_attention_heads, where each tensor is of shape [batch_size, seq_length, size_per_head]. Then, the query and key tensors are dot-producted and scaled. These are softmaxed to obtain attention probabilities. The value tensors are then interpolated by these probabilities, then concatenated back to a single tensor and returned.
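
As an illustration only (not the package's implementation, which operates on tensorflow tensors), the scaled dot-product step described above can be sketched in plain R for a single head and a single example; all sizes below are arbitrary:

# Single-head, single-example sketch of scaled dot-product attention.
# query, key, and value stand in for the projected tensors.
seq_length <- 4L
size_per_head <- 8L
query <- matrix(rnorm(seq_length * size_per_head), seq_length, size_per_head)
key <- matrix(rnorm(seq_length * size_per_head), seq_length, size_per_head)
value <- matrix(rnorm(seq_length * size_per_head), seq_length, size_per_head)

# Dot-product the queries and keys, then scale by 1 / sqrt(size_per_head).
scores <- (query %*% t(key)) / sqrt(size_per_head)

# Softmax each row to obtain attention probabilities.
probs <- sweep(exp(scores), 1, rowSums(exp(scores)), "/")

# Weight the values by these probabilities to get the output for this head.
context <- probs %*% value  # shape: [seq_length, size_per_head]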

Usage

attention_layer(
  from_tensor,
  to_tensor,
  attention_mask = NULL,
  num_attention_heads = 1L,
  size_per_head = 512L,
  query_act = NULL,
  key_act = NULL,
  value_act = NULL,
  attention_probs_dropout_prob = 0,
  initializer_range = 0.02,
  do_return_2d_tensor = FALSE,
  batch_size = NULL,
  from_seq_length = NULL,
  to_seq_length = NULL
)

Arguments

from_tensor

Float Tensor of shape [batch_size, from_seq_length, from_width].

to_tensor

Float Tensor of shape [batch_size, to_seq_length, to_width].

attention_mask

(Optional) Integer Tensor of shape [batch_size, from_seq_length, to_seq_length]. The values should be 1 or 0. The attention scores will effectively be set to -infinity for any positions in the mask that are 0, and will be unchanged for positions that are 1. (A sketch of building such a mask follows this argument list.)

num_attention_heads

Integer; number of attention heads.

size_per_head

Integer; size of each attention head.

query_act

(Optional) Activation function for the query transform.

key_act

(Optional) Activation function for the key transform.

value_act

(Optional) Activation function for the value transform.

attention_probs_dropout_prob

(Optional) Numeric; dropout probability of the attention probabilities.

initializer_range

Numeric; range of the weight initializer.

do_return_2d_tensor

Logical. If TRUE, the output will be of shape [batch_size * from_seq_length, num_attention_heads * size_per_head]. If FALSE, the output will be of shape [batch_size, from_seq_length, num_attention_heads * size_per_head].

batch_size

(Optional) Integer; if the input tensors are 2D, this should be the batch size of the 3D versions of from_tensor and to_tensor.

from_seq_length

(Optional) Integer; if the input is 2D, this should be the sequence length of the 3D version of from_tensor.

to_seq_length

(Optional) Integer; if the input is 2D, this should be the sequence length of the 3D version of to_tensor.
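
For illustration, an attention_mask of the expected shape can be constructed in plain R from per-example counts of non-padding tokens (the names and sizes below are assumptions for this sketch, not part of the package API):

# Build a [batch_size, from_seq_length, to_seq_length] mask of 1s and 0s:
# position j of to_tensor is attended to only if j is a real (non-padding) token.
batch_size <- 2L
from_seq_length <- 4L
to_seq_length <- 4L
to_lengths <- c(4L, 2L)  # non-padding tokens per example

attention_mask <- array(0L, dim = c(batch_size, from_seq_length, to_seq_length))
for (b in seq_len(batch_size)) {
  attention_mask[b, , seq_len(to_lengths[[b]])] <- 1L
}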

Details

In practice, the multi-headed attention is implemented with transposes and reshapes rather than with actual separate tensors for each head.
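
As a rough sketch of that bookkeeping in plain R (note that R arrays are filled column-major while tensorflow reshapes row-major, so this only illustrates the shape manipulation, not the exact element order used internally; all names and sizes are illustrative):

batch_size <- 2L; seq_length <- 3L; num_heads <- 4L; size_per_head <- 5L

# A projected tensor of shape [batch_size * seq_length, num_heads * size_per_head] ...
x2d <- matrix(rnorm(batch_size * seq_length * num_heads * size_per_head),
              nrow = batch_size * seq_length)

# ... is reshaped to [batch_size, seq_length, num_heads, size_per_head] ...
x4d <- array(x2d, dim = c(batch_size, seq_length, num_heads, size_per_head))

# ... and transposed to [batch_size, num_heads, seq_length, size_per_head],
# so each head is a slice of the same data rather than a separate tensor.
x_heads <- aperm(x4d, c(1, 3, 2, 4))
dim(x_heads)  # 2 4 3 5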

Value

Float Tensor of shape [batch_size, from_seq_length, num_attention_heads * size_per_head]. If do_return_2d_tensor is TRUE, it will instead be flattened to shape [batch_size * from_seq_length, num_attention_heads * size_per_head].

Examples

## Not run: 
# Maybe add examples later. For now, this is only called from
# within transformer_model(), so refer to that function.
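
# A hypothetical stand-alone call, sketched here for illustration only; the
# shapes and the TF 1.x-style placeholder are assumptions, not package examples.
# Passing the same tensor as from_tensor and to_tensor gives self-attention.
batch_size <- 2L
seq_length <- 16L
hidden_size <- 768L
input_tensor <- tensorflow::tf$placeholder(
  dtype = tensorflow::tf$float32,
  shape = list(batch_size, seq_length, hidden_size)
)
attention_output <- attention_layer(
  from_tensor = input_tensor,
  to_tensor = input_tensor,
  num_attention_heads = 12L,
  size_per_head = 64L,
  batch_size = batch_size,
  from_seq_length = seq_length,
  to_seq_length = seq_length
)
# attention_output has shape [batch_size, seq_length, 12 * 64].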

## End(Not run)
