A graph attention layer (GAT) as presented by Velickovic et al. (2017).
Mode: single, disjoint, mixed, batch.
This layer expects dense inputs when working in batch mode.
This layer computes a convolution similar to layers.GraphConv, but uses the attention mechanism to weight the adjacency matrix instead of using the normalized Laplacian:
$$
Z = \mathbf{\alpha} X W + b
$$

where

$$
\mathbf{\alpha}_{ij} =
\frac{
  \exp\left( \mathrm{LeakyReLU}\left( a^{\top} (XW)_i \, \| \, (XW)_j \right) \right)
}{
  \sum\limits_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left( \mathrm{LeakyReLU}\left( a^{\top} (XW)_i \, \| \, (XW)_k \right) \right)
}
$$

where $a \in \mathbb{R}^{2F'}$ is a trainable attention kernel. Dropout is also applied to $\alpha$ before computing $Z$.
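As a concrete illustration of the formula, here is a minimal base R sketch of a single attention head on a toy graph, assuming no dropout, bias, or activation; all names are illustrative only and are not part of the package API.

leaky_relu <- function(x, slope = 0.2) ifelse(x > 0, x, slope * x)

gat_head <- function(X, A, W, a) {
  H <- X %*% W                                  # XW, shape N x F'
  N <- nrow(X)
  # e_ij = LeakyReLU(a^T [ (XW)_i || (XW)_j ])
  E <- outer(1:N, 1:N, Vectorize(function(i, j)
    leaky_relu(sum(a * c(H[i, ], H[j, ])))))
  # softmax restricted to the neighbourhood of i, including i itself
  mask  <- (A + diag(N)) > 0
  alpha <- exp(E) * mask
  alpha <- alpha / rowSums(alpha)
  alpha %*% H                                   # Z = alpha X W
}

set.seed(1)
N <- 4; F_in <- 3; F_out <- 2
X <- matrix(rnorm(N * F_in), N, F_in)                       # node features (N x F)
A <- matrix(c(0,1,1,0, 1,0,0,1, 1,0,0,1, 0,1,1,0), N, N)    # binary adjacency (N x N)
Z <- gat_head(X, A,
              W = matrix(rnorm(F_in * F_out), F_in, F_out), # kernel (F x F')
              a = rnorm(2 * F_out))                         # attention kernel (2F')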
Multiple attention heads are computed in parallel and their results are aggregated by concatenation or averaging, as sketched below.
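Continuing the sketch above, multi-head aggregation can be illustrated as follows, with each head given its own kernel and attention kernel:

K <- 3
heads <- lapply(1:K, function(k)
  gat_head(X, A,
           W = matrix(rnorm(F_in * F_out), F_in, F_out),
           a = rnorm(2 * F_out)))
Z_concat <- do.call(cbind, heads)    # N x (K * F'), concat_heads = TRUE
Z_avg    <- Reduce(`+`, heads) / K   # N x F',       concat_heads = FALSE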
Input

- Node features of shape ([batch], N, F);
- Binary adjacency matrix of shape ([batch], N, N).

Output

- Node features with the same shape as the input, but with the last dimension changed to channels;
- if return_attn_coef = TRUE, a list with the attention coefficients for each attention head. Each attention coefficient matrix has shape ([batch], N, N).
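For example, in batch mode the shapes fit together as in the following sketch. It assumes the layer composes in the usual keras-for-R functional style and that the node features and adjacency matrix are passed together as a list (mirroring the underlying Python implementation); treat the exact call pattern as an assumption rather than a guarantee.

library(keras)

n_nodes <- 10   # N, nodes per graph
n_feat  <- 16   # F, input node feature size

x_in <- layer_input(shape = c(n_nodes, n_feat))   # node features, ([batch], N, F)
a_in <- layer_input(shape = c(n_nodes, n_nodes))  # binary adjacency, ([batch], N, N)

# Assumed call pattern: features and adjacency passed together as a list.
out <- layer_graph_attention(
  list(x_in, a_in),
  channels = 8,
  attn_heads = 4,
  concat_heads = TRUE
)
# With concat_heads = TRUE the output shape is ([batch], N, 4 * 8);
# with concat_heads = FALSE it would be ([batch], N, 8).

model <- keras_model(inputs = list(x_in, a_in), outputs = out)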
layer_graph_attention(
  object,
  channels,
  attn_heads = 1,
  concat_heads = TRUE,
  dropout_rate = 0.5,
  return_attn_coef = FALSE,
  activation = NULL,
  use_bias = TRUE,
  kernel_initializer = "glorot_uniform",
  bias_initializer = "zeros",
  attn_kernel_initializer = "glorot_uniform",
  kernel_regularizer = NULL,
  bias_regularizer = NULL,
  attn_kernel_regularizer = NULL,
  activity_regularizer = NULL,
  kernel_constraint = NULL,
  bias_constraint = NULL,
  attn_kernel_constraint = NULL,
  ...
)
channels: number of output channels

attn_heads: number of attention heads to use

concat_heads: bool, whether to concatenate the output of the attention heads instead of averaging

dropout_rate: internal dropout rate for the attention coefficients

return_attn_coef: if TRUE, return the attention coefficients for the given input (one N x N matrix for each head)

activation: activation function to use

use_bias: bool, whether to add a bias vector to the output

kernel_initializer: initializer for the weights

bias_initializer: initializer for the bias vector

attn_kernel_initializer: initializer for the attention weights

kernel_regularizer: regularization applied to the weights

bias_regularizer: regularization applied to the bias vector

attn_kernel_regularizer: regularization applied to the attention kernels

activity_regularizer: regularization applied to the output

kernel_constraint: constraint applied to the weights

bias_constraint: constraint applied to the bias vector

attn_kernel_constraint: constraint applied to the attention kernels
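As a sketch of how these optional arguments might be customized, assuming the wrapper follows the usual keras-for-R convention that omitting object returns a standalone layer; regularizer_l2() and initializer_glorot_normal() are standard keras helpers, and their use here is illustrative.

library(keras)

gat <- layer_graph_attention(
  channels = 32,
  attn_heads = 8,
  concat_heads = FALSE,                 # average the 8 heads -> 32 output features
  dropout_rate = 0.3,                   # dropout on the attention coefficients
  activation = "elu",
  kernel_regularizer = regularizer_l2(1e-4),
  attn_kernel_regularizer = regularizer_l2(1e-4),
  attn_kernel_initializer = initializer_glorot_normal()
)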