tabtransformer: Tabtransformer


Description

A torch nn_module using multi-headed self-attention (MHSA) for tabular datasets. Additionally, an intersample attention (between rows) layer can be added by setting attention = "intersample" or "both" (see the Examples section).

Usage

tabtransformer(
  categories,
  num_continuous,
  dim_out = 1,
  final_layer = NULL,
  attention = "both",
  attention_type = "softmax",
  is_first = FALSE,
  dim = 16,
  depth = 4,
  heads_selfattn = 8,
  heads_intersample = 8,
  dim_heads_selfattn = 8,
  dim_heads_intersample = 8,
  attn_dropout = 0.1,
  ff_dropout = 0.8,
  embedding_dropout = 0.1,
  mlp_dropout = 0.1,
  mlp_hidden_mult = c(4, 2),
  softmax_mod = 1,
  is_softmax_mod = 1,
  skip = TRUE,
  device = "cuda"
)

Arguments

categories

(int vector) a vector giving the number of categories (levels) of each categorical predictor, in the order the predictors appear in the data

num_continuous

(int) the number of continuous predictors

dim_out

(int) dimensions of the output (default is 1, matching the default binary task)

final_layer

(nn_module) the final layer of the model (e.g. nn_relu() to constrain the output to values >= 0). Default is NULL, which results in an nn_identity() layer.

attention

(str) string value indicating which type(s) of attention to use, either "both", "mhsa" or "intersample". Default: "both"

attention_type

(str) string value indicating either traditional softmax attention ("softmax"), sparsemax attention ("sparsemax"), signed attention ("signed"), or fast attention ("fast"). Default: "softmax".

is_first

(bool) designates whether intersample attention comes before MHSA

dim

(int) embedding dimension for categorical and continuous data

depth

(int) number of transformer layers

heads_selfattn

(int) number of self-attention heads

heads_intersample

(int) number of intersample attention heads

dim_heads_selfattn

(int) dimensions of the self-attention heads

dim_heads_intersample

(int) dimension of the intersample attention heads

attn_dropout

(float) dropout percentage for attention layers. Default: 0.1.

ff_dropout

(float) dropout percentage for the feed-forward layers between attention layers. Default: 0.8.

embedding_dropout

(float) dropout after the embedding layer. Default: 0.1.

mlp_dropout

(float) dropout between MLP layers. Default: 0.1.

mlp_hidden_mult

(int vector) multipliers used to set the hidden dimensions of the final MLP. Default: c(4, 2). See the sketch after this argument list.

softmax_mod

(float) multiplier for the MHSA softmax function

is_softmax_mod

(float) multiplier for the intersample attention softmax function

skip

(bool) Whether to include skip connections after attention layers. Default: TRUE.

device

(str) 'cpu' or 'cuda'. Default: 'cuda'.
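
The arguments above can be combined to shape the attention blocks and the final MLP head. The following is a minimal, illustrative sketch (the category sizes, the three-class output, and the c(8, 4) multipliers are examples, not package defaults):

library(torchtabular)

# Three-output head with a wider final MLP, built on the CPU.
tabtransformer(
  categories = c(10, 5),      # two categorical predictors with 10 and 5 levels
  num_continuous = 4,
  dim_out = 3,                # e.g. three-class logits
  mlp_hidden_mult = c(8, 4),  # multipliers for the final MLP's hidden layers
  device = "cpu"
)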

Details

Huang et al. introduce MHSA for tabular datasets; Somepalli et al. introduce the concept of intersample attention.

Value

A tabtransformer model (a torch nn_module).

Examples

library(torch)
library(torchtabular)

# Note: device defaults to "cuda"; pass device = "cpu" on machines without a GPU.
tabtransformer(
  categories = c(4, 2, 13),
  num_continuous = 6,
  final_layer = nn_relu(),
  depth = 1,
  dim = 32
)
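
As noted in the Description, the attention and attention_type arguments select which attention blocks are built and which attention function they use. A second, illustrative sketch (argument values chosen for demonstration, not package defaults):

tabtransformer(
  categories = c(4, 2, 13),
  num_continuous = 6,
  attention = "intersample",     # intersample (row-wise) attention only
  attention_type = "sparsemax",  # sparsemax instead of softmax attention
  depth = 1,
  dim = 32,
  device = "cpu"
)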
