Description
A torch nn_module using multi-headed self-attention (MHSA) for tabular datasets.
Additionally, an intersample attention (between rows) layer can be added via the attention argument (attention = "both" or attention = "intersample").
Usage

tabtransformer(
categories,
num_continuous,
dim_out = 1,
final_layer = NULL,
attention = "both",
attention_type = "softmax",
is_first = FALSE,
dim = 16,
depth = 4,
heads_selfattn = 8,
heads_intersample = 8,
dim_heads_selfattn = 8,
dim_heads_intersample = 8,
attn_dropout = 0.1,
ff_dropout = 0.8,
embedding_dropout = 0.1,
mlp_dropout = 0.1,
mlp_hidden_mult = c(4, 2),
softmax_mod = 1,
is_softmax_mod = 1,
skip = TRUE,
device = "cuda"
)
Arguments

categories
(int vector) a vector giving the number of levels of each categorical predictor (in the correct order); see the sketch after this list for deriving it from a data frame of factors

num_continuous
(int) the number of continuous predictors

dim_out
(int) dimension of the output (default is 1, matching the default binary task)

final_layer
(nn_module) the final layer of the model (e.g. nn_relu(), as in the example below)

attention
(str) which type(s) of attention to use: "both", "mhsa" or "intersample". Default: "both"

attention_type
(str) either traditional softmax attention ("softmax"), sparsemax attention ("sparsemax"), signed attention ("signed"), or fast attention ("fast"). Default: "softmax"

is_first
(bool) whether intersample attention comes before MHSA. Default: FALSE

dim
(int) embedding dimension for categorical and continuous data

depth
(int) number of transformer layers

heads_selfattn
(int) number of self-attention heads

heads_intersample
(int) number of intersample attention heads

dim_heads_selfattn
(int) dimension of the self-attention heads

dim_heads_intersample
(int) dimension of the intersample attention heads

attn_dropout
(float) dropout rate for attention layers. Default: 0.1

ff_dropout
(float) dropout rate for the feed-forward layers between attention layers. Default: 0.8

embedding_dropout
(float) dropout after the embedding layer. Default: 0.1

mlp_dropout
(float) dropout between MLP layers. Default: 0.1

mlp_hidden_mult
(int vector) multipliers determining the hidden layer dimensions of the final MLP. Default: c(4, 2)

softmax_mod
(float) multiplier for the MHSA softmax function

is_softmax_mod
(float) multiplier for the intersample attention softmax function

skip
(bool) whether to include skip connections after attention layers. Default: TRUE

device
(str) 'cpu' or 'cuda'
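As a rough sketch of how the first two arguments might be assembled, assuming the raw data live in a data frame whose categorical predictors are stored as factors (the data frame df, its column names, and the split into factor and numeric columns below are illustrative assumptions, not part of the package):

# toy data: three factor predictors and six numeric predictors (illustrative)
df <- data.frame(
  colour = factor(sample(c("red", "green", "blue", "grey"), 100, replace = TRUE),
                  levels = c("red", "green", "blue", "grey")),
  sex    = factor(sample(c("F", "M"), 100, replace = TRUE), levels = c("F", "M")),
  state  = factor(sample(LETTERS[1:13], 100, replace = TRUE), levels = LETTERS[1:13]),
  matrix(rnorm(600), nrow = 100)
)

is_cat <- vapply(df, is.factor, logical(1))

# per-column cardinalities of the categorical predictors, in column order
categories     <- vapply(df[is_cat], nlevels, integer(1))   # c(4, 2, 13)
# everything that is not a factor is treated as continuous here
num_continuous <- sum(!is_cat)                              # 6

These values can then be passed to tabtransformer() as in the example below.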
Details

Huang et al. introduce MHSA for tabular datasets; Somepalli et al. introduce the concept of intersample attention.
Value

A tabtransformer model (a torch nn_module).
Examples

tabtransformer(
categories = c(4, 2, 13),
num_continuous = 6,
final_layer = nn_relu(),
depth = 1,
dim = 32
)
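The help page does not show a forward pass. If the returned nn_module follows the original TabTransformer/SAINT convention of taking a batch of integer-coded categorical predictors and a batch of continuous predictors as two separate tensors, calling it might look like the sketch below; the two-tensor call model(x_categ, x_cont), the tensor shapes, and device = "cpu" are assumptions, so check the package source for the actual interface.

library(torch)

model <- tabtransformer(
  categories     = c(4, 2, 13),
  num_continuous = 6,
  final_layer    = nn_relu(),
  depth          = 1,
  dim            = 32,
  device         = "cpu"   # default is "cuda"; "cpu" assumed here for portability
)

# hypothetical batch of 8 rows: integer codes for the 3 categorical predictors
x_categ <- torch_tensor(
  cbind(sample(1:4, 8, replace = TRUE),
        sample(1:2, 8, replace = TRUE),
        sample(1:13, 8, replace = TRUE)),
  dtype = torch_long()
)
# and the 6 continuous predictors as a float tensor
x_cont <- torch_randn(8, 6)

# assumed forward signature (categorical tensor first, continuous second)
out <- model(x_categ, x_cont)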