transformer    R Documentation
Description

The Transformer is a non-recurrent architecture built from a series of attention-based blocks. Each block consists of a multi-head attention layer and a position-wise feedforward layer, with an add-and-normalize (residual connection plus layer normalization) step in between. Because there is no recurrence, all positions of an input sequence are processed in parallel, independently of their order.
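For illustration, the sketch below assembles one such encoder block from standard keras layers. It assumes keras >= 2.6 (for layer_multi_head_attention()); the helper name transformer_block_sketch() is hypothetical and does not show this package's internal implementation.

library(keras)

# Illustrative sketch of a single transformer encoder block.
transformer_block_sketch <- function(x, embed_dim, num_heads, ff_dim) {
  # multi-head self-attention: query and value are both the input sequence
  attn <- layer_multi_head_attention(list(x, x),
                                     num_heads = num_heads,
                                     key_dim = embed_dim)
  # add & normalize around the attention sub-layer
  x <- layer_layer_normalization(layer_add(list(x, attn)))
  # position-wise feedforward network
  ff <- x %>%
    layer_dense(units = ff_dim, activation = "relu") %>%
    layer_dense(units = embed_dim)
  # add & normalize around the feedforward sub-layer
  layer_layer_normalization(layer_add(list(x, ff)))
}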
Usage

layer_embedding_token_position(x, maxlen, vocab_size, embed_dim)

layer_transformer_encoder(x, embed_dim, num_heads, ff_dim, num_transformer_blocks)
Arguments

x                        layer object
maxlen                   maximum length of the input sequences
vocab_size               vocabulary size
embed_dim                embedding size for each token
num_heads                number of attention heads
ff_dim                   hidden layer size of the feedforward network inside the transformer block
num_transformer_blocks   number of transformer blocks
Value

layer object
Author(s)

Dongmin Jung
References

Lappin, S. (2021). Deep learning and linguistic representation. CRC Press.

Liu, Z., Lin, Y., & Sun, M. (2020). Representation learning for natural language processing. Springer.
Examples

library(keras)

# toy protein-like setting: 20 amino acid tokens, sequences of length 10
num_AA <- 20
length_seq <- 10
embedding_dim <- 16
num_heads <- 2
ff_dim <- 16
num_transformer_blocks <- 2

inputs <- layer_input(shape = length_seq)

x <- inputs %>%
  # token embedding plus positional embedding
  layer_embedding_token_position(maxlen = length_seq,
                                 vocab_size = num_AA,
                                 embed_dim = embedding_dim) %>%
  # stack of transformer encoder blocks
  layer_transformer_encoder(embed_dim = embedding_dim,
                            num_heads = num_heads,
                            ff_dim = ff_dim,
                            num_transformer_blocks = num_transformer_blocks) %>%
  # pool over the sequence dimension to get a fixed-size representation
  layer_global_average_pooling_1d()
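As a possible continuation (the output head and compile settings below are hypothetical and not part of the package's example), the pooled representation can be turned into a trainable keras model:

# Hypothetical continuation: attach a prediction head and build the model.
outputs <- x %>%
  layer_dense(units = 1, activation = "sigmoid")

model <- keras_model(inputs = inputs, outputs = outputs)
model %>% compile(optimizer = "adam", loss = "binary_crossentropy")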