attention_bahdanau_monotonic    R Documentation
Monotonic attention mechanism with Bahdanau-style energy function.
Usage

attention_bahdanau_monotonic(
  object,
  units,
  memory = NULL,
  memory_sequence_length = NULL,
  normalize = FALSE,
  sigmoid_noise = 0,
  sigmoid_noise_seed = NULL,
  score_bias_init = 0,
  mode = "parallel",
  kernel_initializer = "glorot_uniform",
  dtype = NULL,
  name = "BahdanauMonotonicAttention",
  ...
)
Arguments

object
    Model or layer object.

units
    The depth of the query mechanism.

memory
    The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].

memory_sequence_length
    (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.

normalize
    Boolean. Whether to normalize the energy term.

sigmoid_noise
    Standard deviation of pre-sigmoid noise. See the docstring for _monotonic_probability_fn for more information.

sigmoid_noise_seed
    (optional) Random seed for pre-sigmoid noise.

score_bias_init
    Initial value for the score bias scalar. It is recommended to initialize this to a negative value when the length of the memory is large.

mode
    How to compute the attention distribution. Must be one of 'recursive', 'parallel', or 'hard'. See the docstring for tfa.seq2seq.monotonic_attention for more information.

kernel_initializer
    (optional) The name of the initializer for the attention kernel.

dtype
    The data type for the query and memory layers of the attention mechanism.

name
    Name to use when creating ops.

...
    A list that contains other common arguments for layer creation.
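The sketch below is a minimal, untested illustration of constructing the mechanism against a randomly generated memory tensor. The shapes and argument values are assumptions chosen for illustration, and `object` is omitted on the assumption that, as with keras-style wrappers, the mechanism is returned directly when no model or layer object is composed.

library(tensorflow)
library(tfaddons)

# Illustrative dimensions only.
batch_size <- 4L
max_time   <- 50L
enc_depth  <- 64L

# Stand-in for encoder outputs used as the memory:
# shape [batch_size, max_time, enc_depth].
memory <- tf$random$normal(shape = list(batch_size, max_time, enc_depth))

attn <- attention_bahdanau_monotonic(
  units = 128,
  memory = memory,
  memory_sequence_length = rep(max_time, batch_size),
  sigmoid_noise = 1.0,        # pre-sigmoid noise used during training
  score_bias_init = -2.0,     # negative bias suggested for long memories
  mode = "parallel"
)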
Details

This type of attention enforces a monotonic constraint on the attention distributions; that is, once the model attends to a given point in the memory, it cannot attend to any prior points at subsequent output timesteps. It achieves this by using _monotonic_probability_fn instead of softmax to construct its attention distributions. Since the attention scores are passed through a sigmoid, a learnable scalar bias parameter is applied after the score function and before the sigmoid. Otherwise, it is equivalent to BahdanauAttention. This approach is proposed in
Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck, "Online and Linear-Time Attention by Enforcing Monotonic Alignments." ICML 2017. https://arxiv.org/abs/1704.00784
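As a rough sketch of the pre-sigmoid step described above, the per-step choosing probabilities can be pictured as follows. This is not the library's internal implementation; the names and values are hypothetical.

library(tensorflow)

# `scores` stands for the Bahdanau energies over the memory for one
# decoding step; shapes and values are illustrative.
scores        <- tf$random$normal(list(4L, 50L))   # [batch_size, max_time]
score_bias    <- tf$Variable(-2.0)                 # learnable scalar bias (score_bias_init)
sigmoid_noise <- 1.0                               # standard deviation of pre-sigmoid noise

# Bias and noise are applied after the score function and before the
# sigmoid; the noise pushes the probabilities towards 0/1 during training.
noise    <- sigmoid_noise * tf$random$normal(tf$shape(scores))
p_choose <- tf$sigmoid(scores + score_bias + noise)

# p_choose is then turned into a monotonic attention distribution
# according to `mode` ('recursive', 'parallel', or 'hard').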
Value

None