View source: R/layer-attention.R
layer_additive_attention | R Documentation
Additive attention layer, a.k.a. Bahdanau-style attention
layer_additive_attention(
  object,
  use_scale = TRUE,
  ...,
  causal = FALSE,
  dropout = 0
)
object: What to compose the new Layer instance with. Typically a Sequential model or a tensor (e.g., as returned by layer_input()). The return value depends on object.

use_scale: If TRUE, will create a variable to scale the attention scores.

...: standard layer arguments.

causal: Boolean. Set to TRUE for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past.

dropout: Float between 0 and 1. Fraction of the units to drop for the attention scores.
Inputs are a query tensor of shape [batch_size, Tq, dim], a value tensor of shape [batch_size, Tv, dim], and a key tensor of shape [batch_size, Tv, dim]. The calculation follows the steps:

1. Reshape query and key into shapes [batch_size, Tq, 1, dim] and [batch_size, 1, Tv, dim], respectively.

2. Calculate scores with shape [batch_size, Tq, Tv] as a non-linear sum: scores = tf$reduce_sum(tf$tanh(query + key), axis = -1L).

3. Use scores to calculate a distribution with shape [batch_size, Tq, Tv]: distribution = tf$nn$softmax(scores).

4. Use distribution to create a linear combination of value with shape [batch_size, Tq, dim]: return tf$matmul(distribution, value).
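The steps above can be sketched directly in NumPy. This is an illustration of the math only, not the layer's actual implementation: the real layer also applies masking, the optional use_scale variable, and dropout, and the function name additive_attention below is ours.

```python
import numpy as np

def additive_attention(query, value, key):
    # query: [batch_size, Tq, dim]; value, key: [batch_size, Tv, dim]
    q = query[:, :, np.newaxis, :]        # [batch_size, Tq, 1, dim]
    k = key[:, np.newaxis, :, :]          # [batch_size, 1, Tv, dim]
    # Non-linear sum over the feature axis -> [batch_size, Tq, Tv]
    scores = np.tanh(q + k).sum(axis=-1)
    # Softmax over the Tv axis to get an attention distribution
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    distribution = e / e.sum(axis=-1, keepdims=True)
    # Linear combination of value -> [batch_size, Tq, dim]
    return distribution @ value

out = additive_attention(np.ones((2, 3, 4)),   # query
                         np.ones((2, 5, 4)),   # value
                         np.ones((2, 5, 4)))   # key
print(out.shape)  # (2, 3, 4)
```

With constant inputs the scores are all equal, so the distribution is uniform over Tv and the output is just the mean of the value rows.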