transformer    R Documentation
Description

The Transformer is a nonrecurrent architecture built from a stack of attention-based blocks. Each block consists of a multi-head self-attention layer and a position-wise feedforward layer, each followed by an add-and-normalize step (a residual connection plus layer normalization). Because these layers attend to all positions at once, the input sequence is processed in parallel rather than in sequential order.
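To make the block structure concrete, below is a minimal sketch of a single encoder block written with the keras R package (a recent version providing layer_multi_head_attention() is assumed); the helper name transformer_encoder_block and the choice key_dim = embed_dim / num_heads are illustrative only, not part of this package.

library(keras)

transformer_encoder_block <- function(x, embed_dim, num_heads, ff_dim) {
  # multi-head self-attention: the input sequence serves as query, key and value
  attn <- layer_multi_head_attention(list(x, x),
                                     num_heads = num_heads,
                                     key_dim = embed_dim %/% num_heads)
  # add & normalize: residual connection followed by layer normalization
  x1 <- layer_layer_normalization(layer_add(list(x, attn)), epsilon = 1e-6)
  # position-wise feedforward network, applied to each position independently
  ffn <- x1 %>%
    layer_dense(units = ff_dim, activation = "relu") %>%
    layer_dense(units = embed_dim)
  # second add & normalize
  layer_layer_normalization(layer_add(list(x1, ffn)), epsilon = 1e-6)
}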
Usage

layer_embedding_token_position(x, maxlen, vocab_size, embed_dim)

layer_transformer_encoder(x, embed_dim, num_heads, ff_dim,
                          num_transformer_blocks)
Arguments

x                        layer object

maxlen                   maximum length of the input sequences

vocab_size               vocabulary size

embed_dim                embedding size for each token

num_heads                number of attention heads

ff_dim                   hidden layer size of the feedforward network inside the transformer block

num_transformer_blocks   number of transformer blocks
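As a quick illustration of how these arguments relate (assuming the keras R package is attached along with the package providing layer_embedding_token_position()), the sketch below embeds integer-encoded sequences of length maxlen over a vocabulary of size vocab_size; the resulting tensor has shape (batch, maxlen, embed_dim). The specific numbers are arbitrary.

library(keras)

inp <- layer_input(shape = 10)
emb <- layer_embedding_token_position(inp, maxlen = 10, vocab_size = 20,
                                      embed_dim = 16)
emb$shape  # expected: (None, 10, 16), i.e. (batch, maxlen, embed_dim)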
Value

layer object
Author(s)

Dongmin Jung
References

Lappin, S. (2021). Deep learning and linguistic representation. CRC Press.
Liu, Z., Lin, Y., & Sun, M. (2020). Representation learning for natural language processing. Springer.
Examples

library(keras)

num_AA <- 20
length_seq <- 10
embedding_dim <- 16
num_heads <- 2
ff_dim <- 16
num_transformer_blocks <- 2

# integer-encoded sequences of length 10 over an alphabet of 20 amino acids
inputs <- layer_input(shape = length_seq)

# token and position embedding, stacked transformer encoder blocks,
# then global average pooling over the sequence dimension
x <- inputs %>%
  layer_embedding_token_position(maxlen = length_seq,
                                 vocab_size = num_AA,
                                 embed_dim = embedding_dim) %>%
  layer_transformer_encoder(embed_dim = embedding_dim,
                            num_heads = num_heads,
                            ff_dim = ff_dim,
                            num_transformer_blocks = num_transformer_blocks) %>%
  layer_global_average_pooling_1d()
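A possible continuation of this example (illustrative only, assuming a binary outcome) attaches a small dense head to the pooled features and compiles the model:

outputs <- x %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

model <- keras_model(inputs = inputs, outputs = outputs)
model %>% compile(optimizer = "adam",
                  loss = "binary_crossentropy",
                  metrics = "accuracy")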