View source: R/create_model_transformer.R
create_model_transformer | R Documentation

Description

Creates a transformer network for classification. The model can consist of several stacked attention blocks.
Usage

create_model_transformer(
  maxlen,
  vocabulary_size = 4,
  embed_dim = 64,
  pos_encoding = "embedding",
  head_size = 4L,
  num_heads = 5L,
  ff_dim = 8,
  dropout = 0,
  n = 10000,
  layer_dense = 2,
  dropout_dense = NULL,
  flatten_method = "flatten",
  last_layer_activation = "softmax",
  loss_fn = "categorical_crossentropy",
  solver = "adam",
  learning_rate = 0.01,
  label_noise_matrix = NULL,
  bal_acc = FALSE,
  f1_metric = FALSE,
  auc_metric = FALSE,
  label_smoothing = 0,
  verbose = TRUE,
  model_seed = NULL,
  mixed_precision = FALSE,
  mirrored_strategy = NULL
)
Arguments

maxlen
    Length of predictor sequence.

vocabulary_size
    Number of unique characters in vocabulary.
embed_dim
    Dimension for token embedding. No embedding if set to 0. Should be used when input is not one-hot encoded (integer sequence).
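For illustration, the two input formats can be sketched in base R. The shapes here are assumptions based on the description above: integer sequences when embed_dim > 0, one-hot matrices of shape maxlen x vocabulary_size when embed_dim = 0.

```r
# Hypothetical encodings of a length-4 sequence with vocabulary_size = 4.
# Integer input (used with embed_dim > 0):
seq_int <- c(1L, 2L, 3L, 4L)

# One-hot input (used with embed_dim = 0): maxlen x vocabulary_size matrix,
# with a single 1 per row marking the token index.
seq_onehot <- diag(4)[seq_int, ]
dim(seq_onehot)  # 4 x 4
```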
pos_encoding
    Either "sinusoid" or "embedding".
head_size
    Dimensions of attention key.

num_heads
    Number of attention heads.

ff_dim
    Units of first dense layer after attention blocks.

dropout
    Vector of dropout rates after attention block(s).
n
    Frequency of sine waves for positional encoding. Only applied if pos_encoding = "sinusoid".
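As a sketch, assuming the "sinusoid" option follows the standard transformer positional encoding with n as the wavelength base, the encoding matrix could be computed in base R as follows (positional_encoding is an illustrative helper, not part of the package):

```r
# Illustrative sinusoidal positional encoding (assumed formulation):
# even embedding dimensions get sin(pos / n^(i / embed_dim)),
# odd dimensions the matching cosine.
positional_encoding <- function(maxlen, embed_dim, n = 10000) {
  pe <- matrix(0, nrow = maxlen, ncol = embed_dim)
  pos <- 0:(maxlen - 1)
  for (i in seq(0, embed_dim - 2, by = 2)) {
    angle <- pos / n^(i / embed_dim)
    pe[, i + 1] <- sin(angle)
    pe[, i + 2] <- cos(angle)
  }
  pe
}

pe <- positional_encoding(maxlen = 50, embed_dim = 64)
```

Larger n spreads the wavelengths over a wider range, so distant positions remain distinguishable.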
layer_dense
    Vector specifying number of neurons per dense layer after the last attention block.

dropout_dense
    Dropout rates for dense layers.
flatten_method
    How to process output of last attention block. Can be "flatten" or a global pooling method.
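The difference between flattening and global pooling can be illustrated in base R on a toy attention output of shape maxlen x embed_dim; the pooling variants shown are assumptions about what the non-flatten options compute.

```r
x <- matrix(1:12, nrow = 3, ncol = 4)  # 3 positions, 4 features

as.vector(t(x))   # flatten: one long vector, length 3 * 4 = 12
apply(x, 2, max)  # global max pooling over positions: length 4
colMeans(x)       # global average pooling over positions: length 4
```

Flattening keeps all positional detail but scales with maxlen; pooling yields a fixed-size vector regardless of sequence length.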
last_layer_activation
    Activation function of output layer(s). For example "sigmoid" or "softmax".
loss_fn
    Either "categorical_crossentropy" or "binary_crossentropy".
solver
    Optimization method; options are "adam", "adagrad", "rmsprop" or "sgd".
learning_rate
    Learning rate for optimizer.
label_noise_matrix
    Matrix of label noise. Every row stands for one true class and the columns give the percentage of labels assigned to each class. If the first class contains 5 percent wrong labels and the second class no noise, then label_noise_matrix <- matrix(c(0.95, 0.05, 0, 1), nrow = 2, byrow = TRUE).
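The two-class case described above can be written out explicitly; the matrix below is a sketch of that description, with each row a true class and each entry the fraction of labels observed as the corresponding class.

```r
# 5% of class-1 labels flip to class 2; class 2 is noise-free.
label_noise_matrix <- matrix(c(0.95, 0.05,
                               0.00, 1.00),
                             nrow = 2, byrow = TRUE)
rowSums(label_noise_matrix)  # each row sums to 1
```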
bal_acc
    Whether to add balanced accuracy.

f1_metric
    Whether to add F1 metric.

auc_metric
    Whether to add AUC metric.
label_smoothing
    Float in [0, 1]. If 0, no smoothing is applied. If > 0, the loss is computed between the predicted labels and a smoothed version of the true labels, where the smoothing squeezes the labels towards 0.5. The closer the argument is to 1, the more the labels get smoothed.
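A minimal base-R sketch of this smoothing rule; smooth_labels is an illustrative helper, assuming the squeeze-towards-0.5 behaviour described above.

```r
smooth_labels <- function(y, smoothing) {
  # interpolate between the hard labels and the uniform value 0.5
  y * (1 - smoothing) + 0.5 * smoothing
}

smooth_labels(c(0, 1), smoothing = 0.1)  # 0 -> 0.05, 1 -> 0.95
```

At smoothing = 1 every label collapses to 0.5, so intermediate values trade confidence in the hard labels against robustness to label noise.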
verbose
    Boolean.
model_seed
    Set seed for model parameters in TensorFlow if not NULL.
mixed_precision
    Whether to use mixed precision (https://www.tensorflow.org/guide/mixed_precision).
mirrored_strategy
    Whether to use distributed mirrored strategy. If NULL, will use distributed mirrored strategy only if more than one GPU is available.
Value

A keras model implementing transformer architecture.
Examples

library(keras)
maxlen <- 50
# length-2 vectors create two stacked attention blocks,
# one value per block
model <- create_model_transformer(
  maxlen = maxlen,
  head_size = c(10, 12),
  num_heads = c(7, 8),
  ff_dim = c(5, 9),
  dropout = c(0.3, 0.5)
)