A graph attention layer (GAT) as presented by Velickovic et al. (2017).
Mode: single, disjoint, mixed, batch.
This layer expects dense inputs when working in batch mode.
This layer computes a convolution similar to layers.GraphConv, but uses the attention mechanism to weight the adjacency matrix instead of using the normalized Laplacian:
$$
Z = \mathbf{\alpha} X W + b
$$

where

$$
\mathbf{\alpha}_{ij} =
\frac{
  \exp\left( \mathrm{LeakyReLU}\left( a^{\top} (XW)_i \, \| \, (XW)_j \right) \right)
}{
  \sum\limits_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left( \mathrm{LeakyReLU}\left( a^{\top} (XW)_i \, \| \, (XW)_k \right) \right)
}
$$

where $a \in \mathbb{R}^{2F'}$ is a trainable attention kernel. Dropout is also applied to $\alpha$ before computing $Z$.
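As a concrete illustration of the formula, here is a minimal base R sketch of a single attention head on a toy graph, assuming no dropout, bias, or activation; all names are illustrative only and are not part of the package API.

leaky_relu <- function(x, slope = 0.2) ifelse(x > 0, x, slope * x)

gat_head <- function(X, A, W, a) {
  H <- X %*% W                                  # XW, shape N x F'
  N <- nrow(X)
  # e_ij = LeakyReLU(a^T [ (XW)_i || (XW)_j ])
  E <- outer(1:N, 1:N, Vectorize(function(i, j)
    leaky_relu(sum(a * c(H[i, ], H[j, ])))))
  # softmax restricted to the neighbourhood of i, including i itself
  mask  <- (A + diag(N)) > 0
  alpha <- exp(E) * mask
  alpha <- alpha / rowSums(alpha)
  alpha %*% H                                   # Z = alpha X W
}

set.seed(1)
N <- 4; F_in <- 3; F_out <- 2
X <- matrix(rnorm(N * F_in), N, F_in)                       # node features (N x F)
A <- matrix(c(0,1,1,0, 1,0,0,1, 1,0,0,1, 0,1,1,0), N, N)    # binary adjacency (N x N)
Z <- gat_head(X, A,
              W = matrix(rnorm(F_in * F_out), F_in, F_out), # kernel (F x F')
              a = rnorm(2 * F_out))                         # attention kernel (2F')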
Multiple attention heads are computed in parallel and their results are aggregated by concatenation or averaging, as sketched below.
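Continuing the sketch above, multi-head aggregation can be illustrated as follows, with each head given its own kernel and attention kernel:

K <- 3
heads <- lapply(1:K, function(k)
  gat_head(X, A,
           W = matrix(rnorm(F_in * F_out), F_in, F_out),
           a = rnorm(2 * F_out)))
Z_concat <- do.call(cbind, heads)    # N x (K * F'), concat_heads = TRUE
Z_avg    <- Reduce(`+`, heads) / K   # N x F',       concat_heads = FALSE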
Input

- Node features of shape ([batch], N, F);
- Binary adjacency matrix of shape ([batch], N, N).

Output

- Node features with the same shape as the input, but with the last dimension changed to channels;
- if return_attn_coef = TRUE, a list with the attention coefficients for each attention head. Each attention coefficient matrix has shape ([batch], N, N).
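For example, in batch mode the shapes fit together as in the following sketch. It assumes the layer composes in the usual keras-for-R functional style and that the node features and adjacency matrix are passed together as a list (mirroring the underlying Python implementation); treat the exact call pattern as an assumption rather than a guarantee.

library(keras)

n_nodes <- 10   # N, nodes per graph
n_feat  <- 16   # F, input node feature size

x_in <- layer_input(shape = c(n_nodes, n_feat))   # node features, ([batch], N, F)
a_in <- layer_input(shape = c(n_nodes, n_nodes))  # binary adjacency, ([batch], N, N)

# Assumed call pattern: features and adjacency passed together as a list.
out <- layer_graph_attention(
  list(x_in, a_in),
  channels = 8,
  attn_heads = 4,
  concat_heads = TRUE
)
# With concat_heads = TRUE the output shape is ([batch], N, 4 * 8);
# with concat_heads = FALSE it would be ([batch], N, 8).

model <- keras_model(inputs = list(x_in, a_in), outputs = out)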
layer_graph_attention(
  object,
  channels,
  attn_heads = 1,
  concat_heads = TRUE,
  dropout_rate = 0.5,
  return_attn_coef = FALSE,
  activation = NULL,
  use_bias = TRUE,
  kernel_initializer = "glorot_uniform",
  bias_initializer = "zeros",
  attn_kernel_initializer = "glorot_uniform",
  kernel_regularizer = NULL,
  bias_regularizer = NULL,
  attn_kernel_regularizer = NULL,
  activity_regularizer = NULL,
  kernel_constraint = NULL,
  bias_constraint = NULL,
  attn_kernel_constraint = NULL,
  ...
)
channels: number of output channels

attn_heads: number of attention heads to use

concat_heads: bool, whether to concatenate the output of the attention heads instead of averaging

dropout_rate: internal dropout rate for the attention coefficients

return_attn_coef: if TRUE, return the attention coefficients for the given input (one N x N matrix for each head)

activation: activation function to use

use_bias: bool, whether to add a bias vector to the output

kernel_initializer: initializer for the weights

bias_initializer: initializer for the bias vector

attn_kernel_initializer: initializer for the attention weights

kernel_regularizer: regularization applied to the weights

bias_regularizer: regularization applied to the bias vector

attn_kernel_regularizer: regularization applied to the attention kernels

activity_regularizer: regularization applied to the output

kernel_constraint: constraint applied to the weights

bias_constraint: constraint applied to the bias vector

attn_kernel_constraint: constraint applied to the attention kernels
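As a sketch of how these optional arguments might be customized, assuming the wrapper follows the usual keras-for-R convention that omitting object returns a standalone layer; regularizer_l2() and initializer_glorot_normal() are standard keras helpers, and their use here is illustrative.

library(keras)

gat <- layer_graph_attention(
  channels = 32,
  attn_heads = 8,
  concat_heads = FALSE,                 # average the 8 heads -> 32 output features
  dropout_rate = 0.3,                   # dropout on the attention coefficients
  activation = "elu",
  kernel_regularizer = regularizer_l2(1e-4),
  attn_kernel_regularizer = regularizer_l2(1e-4),
  attn_kernel_initializer = initializer_glorot_normal()
)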