attention_bert (R Documentation)
Description

Takes in an input tensor (e.g., a sequence of token embeddings), applies an attention layer, and layer-normalizes the result. Returns both the attention weights and the output embeddings.
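A conceptual sketch of the computation the description names: scaled dot-product attention over the input, followed by layer normalization. This is a single-head illustration that omits the learned query/key/value projections, dropout, and multi-head splitting; it is an assumption about the general technique, not the package's actual implementation.

library(torch)

emb_size <- 4L
x <- torch_randn(2, 3, emb_size)                 # (batch, seq, emb)

# Attention weights: softmax over scaled dot products between positions.
scores <- torch_matmul(x, x$transpose(2, 3)) / sqrt(emb_size)
weights <- nnf_softmax(scores, dim = 3)          # (batch, seq, seq)

# Attend, then layer-normalize the result.
attended <- torch_matmul(weights, x)             # (batch, seq, emb)
norm <- nn_layer_norm(emb_size)
embeddings <- norm(attended)                     # (batch, seq, emb)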
Usage

attention_bert(embedding_size, n_head, attention_dropout = 0.1)
Arguments

embedding_size: Integer; the dimension of the embedding vectors.
n_head: Integer; the number of attention heads per layer.
attention_dropout: Numeric; the dropout probability to apply in attention (default 0.1).
Shape

Inputs:
  input: (*, sequence_length, embedding_size)
  mask (optional): (*, sequence_length); see the sketch after this list

Outputs:
  embeddings: (*, sequence_length, embedding_size)
  weights: (*, n_head, sequence_length, sequence_length)
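A minimal sketch of supplying the optional mask alongside the input. The positional mask argument and the convention that 1 marks positions to attend to while 0 marks positions to ignore are assumptions based on common BERT implementations, not confirmed by this page.

library(torch)

model <- attention_bert(embedding_size = 4L, n_head = 2L)
input <- torch_randn(1, 3, 4)                       # (batch, seq, emb)
mask <- torch_tensor(matrix(c(1, 1, 0), nrow = 1))  # assumed: ignore last position
out <- model(input, mask)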
Examples

emb_size <- 4L
seq_len <- 3L
n_head <- 2L
batch_size <- 2L

model <- attention_bert(
  embedding_size = emb_size,
  n_head = n_head
)

# Random input values in [-1, 1] with shape (batch_size, seq_len, emb_size).
input <- array(
  sample(-10:10, size = batch_size * seq_len * emb_size, replace = TRUE) / 10,
  dim = c(batch_size, seq_len, emb_size)
)
input <- torch::torch_tensor(input)

# Apply the attention layer.
model(input)
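The call returns both outputs described under Shape. A short sketch of inspecting them, assuming the result is a named list whose names match that section:

out <- model(input)
out$embeddings$shape  # expected: (2, 3, 4)   = (batch_size, seq_len, emb_size)
out$weights$shape     # expected: (2, 2, 3, 3) = (batch_size, n_head, seq_len, seq_len)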