View source: R/weight_decay_optimizers.R

Description
This is an implementation of the AdamW optimizer described in "Decoupled Weight Decay Regularization" by Loshchilov & Hutter (https://arxiv.org/abs/1711.05101, pdf: https://arxiv.org/pdf/1711.05101.pdf). It computes the update step of tf.keras.optimizers.Adam and additionally decays the variable. Note that this is different from adding L2 regularization on the variables to the loss: it regularizes variables with large gradients more than L2 regularization would, which was shown to yield better training loss and generalization error in the paper above.
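To make the distinction concrete, below is a minimal sketch of one AdamW-style update for a single weight vector, written in plain R. This is illustrative pseudocode, not the package's internal implementation, and formulations differ on whether the decay term is additionally scaled by the learning rate; here it is applied directly.

# Illustrative sketch only, not the library's implementation.
adamw_step <- function(w, grad, m, v, t,
                       learning_rate = 0.001, weight_decay = 1e-4,
                       beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-7) {
  m <- beta_1 * m + (1 - beta_1) * grad        # 1st moment estimate
  v <- beta_2 * v + (1 - beta_2) * grad^2      # 2nd moment estimate
  m_hat <- m / (1 - beta_1^t)                  # bias corrections
  v_hat <- v / (1 - beta_2^t)
  # Plain Adam step ...
  w <- w - learning_rate * m_hat / (sqrt(v_hat) + epsilon)
  # ... followed by decoupled weight decay: the weights are shrunk directly,
  # instead of adding weight_decay * w to the gradient as L2 regularization would.
  w <- w - weight_decay * w
  list(w = w, m = m, v = v)
}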
Usage
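The usage block itself did not survive extraction. The signature below is only a sketch reconstructed from the arguments documented underneath; the constructor name optimizer_decay_adamw() and the default values shown are assumptions, not confirmed by this page.

optimizer_decay_adamw(        # assumed constructor name
  weight_decay,
  learning_rate = 0.001,      # assumed default
  beta_1 = 0.9,               # assumed default
  beta_2 = 0.999,             # assumed default
  epsilon = 1e-07,            # assumed default
  amsgrad = FALSE,            # assumed default
  name = "AdamW",             # assumed default
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)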
Arguments

weight_decay
A Tensor or a floating point value. The weight decay.

learning_rate
A Tensor or a floating point value. The learning rate.

beta_1
A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.

beta_2
A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.

epsilon
A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.

amsgrad
Boolean. Whether to apply the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond".

name
Optional name for the operations created when applying gradients.

clipnorm
If set, gradients are clipped by norm (see the sketch below).

clipvalue
If set, gradients are clipped by value.

decay
Included for backward compatibility, to allow time-inverse decay of the learning rate.

lr
Included for backward compatibility; use learning_rate instead.
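As an illustration of the clipping arguments (again assuming the constructor name optimizer_decay_adamw()), gradient clipping can be requested at construction time:

# Sketch: clip gradients by norm at 1 while also applying decoupled weight decay.
opt <- optimizer_decay_adamw(
  weight_decay = 1e-4,
  learning_rate = 1e-3,
  clipnorm = 1.0
)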
Value

Optimizer for use with 'keras::compile()'.
Examples
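The example code from the original page was lost in extraction. The snippet below is only a sketch of typical use with keras, again assuming the constructor name optimizer_decay_adamw() and the tfaddons package as its home.

library(keras)
library(tfaddons)   # assumed package providing the optimizer

# A small toy model; layer sizes are arbitrary.
model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(10)) %>%
  layer_dense(units = 1)

# Pass the AdamW-style optimizer to compile().
model %>% compile(
  loss = "mse",
  optimizer = optimizer_decay_adamw(weight_decay = 1e-4, learning_rate = 1e-3)
)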