nn_ft_transformer_block
Description

A transformer block consisting of a multi-head self-attention mechanism followed by a feed-forward network. This is used in LearnerTorchFTTransformer.
Usage

nn_ft_transformer_block(
  d_token,
  attention_n_heads,
  attention_dropout,
  attention_initialization,
  ffn_d_hidden = NULL,
  ffn_d_hidden_multiplier = NULL,
  ffn_dropout,
  ffn_activation,
  residual_dropout,
  prenormalization,
  is_first_layer,
  attention_normalization,
  ffn_normalization,
  query_idx = NULL,
  attention_bias,
  ffn_bias_first,
  ffn_bias_second
)
Arguments

d_token
(integer(1)) Dimension of the token embeddings.

attention_n_heads
(integer(1)) Number of attention heads.

attention_dropout
(numeric(1)) Dropout probability in the self-attention mechanism.

attention_initialization
(character(1)) Initialization scheme for the attention weights, e.g. "kaiming" or "xavier".

ffn_d_hidden
(integer(1)) Hidden dimension of the feed-forward network.

ffn_d_hidden_multiplier
(numeric(1)) Alternative to ffn_d_hidden: the hidden dimension is set to d_token * ffn_d_hidden_multiplier.

ffn_dropout
(numeric(1)) Dropout probability in the feed-forward network.

ffn_activation
Activation used in the feed-forward network.

residual_dropout
(numeric(1)) Dropout probability applied to the residual connections.

prenormalization
(logical(1)) Whether normalization is applied before (TRUE) or after (FALSE) the attention and feed-forward sublayers.

is_first_layer
(logical(1)) Whether this block is the first layer of the transformer.

attention_normalization
Normalization applied in the attention sublayer.

ffn_normalization
Normalization applied in the feed-forward sublayer.

query_idx
(integer() or NULL) Indices of the tokens used as queries in the attention mechanism; if NULL, all tokens are used.

attention_bias
(logical(1)) Whether the attention's linear layers use a bias term.

ffn_bias_first
(logical(1)) Whether the first linear layer of the feed-forward network uses a bias term.

ffn_bias_second
(logical(1)) Whether the second linear layer of the feed-forward network uses a bias term.
References

Devlin J, Chang M, Lee K, Toutanova K (2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." arXiv preprint arXiv:1810.04805.

Gorishniy Y, Rubachev I, Khrulkov V, Babenko A (2021). "Revisiting Deep Learning Models for Tabular Data." arXiv preprint arXiv:2106.11959.
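The help page itself ships no example, so the following is a minimal, illustrative sketch rather than canonical usage. It assumes the torch and mlr3torch packages are installed and attached, that the activation and normalization arguments accept torch module generators such as nn_relu and nn_layer_norm, and that the block's forward pass expects a tensor of shape (batch, n_tokens, d_token); all argument values below are placeholders chosen for illustration.

library(torch)
library(mlr3torch)

# construct a single FT-Transformer block (values are illustrative only)
block <- nn_ft_transformer_block(
  d_token                  = 32,
  attention_n_heads        = 4,
  attention_dropout        = 0.1,
  attention_initialization = "kaiming",
  ffn_d_hidden             = 64,
  ffn_dropout              = 0.1,
  ffn_activation           = nn_relu,       # assumption: a module generator is accepted here
  residual_dropout         = 0.0,
  prenormalization         = TRUE,
  is_first_layer           = TRUE,
  attention_normalization  = nn_layer_norm, # assumption: a module generator is accepted here
  ffn_normalization        = nn_layer_norm,
  query_idx                = NULL,
  attention_bias           = TRUE,
  ffn_bias_first           = TRUE,
  ffn_bias_second          = TRUE
)

# assumed input shape: (batch, n_tokens, d_token)
x <- torch_randn(16, 10, 32)
out <- block(x)
out$shape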