transformer_encoder_bert    R Documentation

Description:

Build a BERT-style multi-layer attention-based transformer.

Usage:

transformer_encoder_bert(
embedding_size,
intermediate_size = 4 * embedding_size,
n_layer,
n_head,
hidden_dropout = 0.1,
attention_dropout = 0.1
)

Arguments:

embedding_size: Integer; the dimension of the embedding vectors.

intermediate_size: Integer; the size of the dense layers applied after the attention mechanism.

n_layer: Integer; the number of attention layers.

n_head: Integer; the number of attention heads per layer.

hidden_dropout: Numeric; the dropout probability to apply to the dense layers.

attention_dropout: Numeric; the dropout probability to apply in attention.
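As an illustrative sketch only (the values below are arbitrary and not taken from the package documentation), the arguments above can be combined to build a deliberately small encoder with non-default dropout and intermediate sizes:

encoder <- transformer_encoder_bert(
  embedding_size = 8L,
  intermediate_size = 16L,
  n_layer = 2L,
  n_head = 4L,
  hidden_dropout = 0.2,
  attention_dropout = 0.05
)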

Details:

Inputs:
For an input token sequence of length sequence_length:
input: (*, sequence_length, embedding_size)
optional mask: (*, sequence_length)
Output:
embeddings: list of (*, sequence_length, embedding_size) for each
transformer layer.
weights: list of (*, n_head, sequence_length, sequence_length) for
each transformer layer.
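As a sketch of the shapes listed above (the tensor values here are illustrative; this page does not show how the optional mask is passed to the module, so the mask is only constructed, not used):

batch_size <- 2L
seq_len <- 3L
emb_size <- 4L
x <- torch::torch_randn(batch_size, seq_len, emb_size)                       # (*, sequence_length, embedding_size)
mask <- torch::torch_ones(batch_size, seq_len, dtype = torch::torch_bool())  # (*, sequence_length)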

Examples:

emb_size <- 4L
seq_len <- 3L
n_head <- 2L
n_layer <- 5L
batch_size <- 2L
# Build the encoder; dropout and intermediate_size keep their defaults
model <- transformer_encoder_bert(
  embedding_size = emb_size,
  n_head = n_head,
  n_layer = n_layer
)
# Get random values for the input tensor
input <- array(
  sample(-10:10, size = batch_size * seq_len * emb_size, replace = TRUE) / 10,
  dim = c(batch_size, seq_len, emb_size)
)
# Convert to a torch tensor and run the forward pass
input <- torch::torch_tensor(input)
model(input)
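# Hedged follow-up, not part of the original example: assuming the returned
# value is a list with elements "embeddings" and "weights" as described under
# Output above, the per-layer shapes can be inspected like this.
output <- model(input)
length(output$embeddings)      # one tensor per layer (n_layer = 5)
output$embeddings[[1]]$shape   # (batch_size, seq_len, emb_size)
output$weights[[1]]$shape      # (batch_size, n_head, seq_len, seq_len)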