View source: R/kerasOptimizer.R
optimizer_nadam | R Documentation |
Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.
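To make the relationship to Adam concrete, the following base R sketch performs one Nadam update in a commonly cited simplified form with constant momentum. It is an illustration only, not the code this function runs: the names nadam_step and state are hypothetical, the epsilon default of 1e-7 is an assumption, and the momentum schedule controlled by schedule_decay is deliberately left out.

```r
# One Nadam update in a simplified, constant-momentum form.
# nadam_step and the state list are hypothetical names used for illustration;
# schedule_decay is ignored and epsilon = 1e-7 is an assumed default.
nadam_step <- function(param, grad, state, learning_rate = 0.002,
                       beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-7) {
  t <- state$t + 1
  m <- beta_1 * state$m + (1 - beta_1) * grad     # 1st moment estimate
  v <- beta_2 * state$v + (1 - beta_2) * grad^2   # 2nd moment estimate
  m_hat <- m / (1 - beta_1^t)                     # bias-corrected 1st moment
  v_hat <- v / (1 - beta_2^t)                     # bias-corrected 2nd moment
  # Nesterov look-ahead: blend the corrected momentum with the current gradient
  m_bar <- beta_1 * m_hat + (1 - beta_1) * grad / (1 - beta_1^t)
  param <- param - learning_rate * m_bar / (sqrt(v_hat) + epsilon)
  list(param = param, state = list(m = m, v = v, t = t))
}

# Run a few updates on f(x) = x^2, whose gradient is 2 * x.
x <- 1
state <- list(m = 0, v = 0, t = 0)
for (i in 1:200) {
  res <- nadam_step(x, 2 * x, state, learning_rate = 0.05)
  x <- res$param
  state <- res$state
}
```

The only difference from a plain Adam step in this form is the m_bar line, which mixes the bias-corrected momentum with the current gradient instead of using the momentum alone.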
optimizer_nadam( learning_rate = 0.002, beta_1 = 0.9, beta_2 = 0.999, epsilon = NULL, schedule_decay = 0.004, clipnorm = NULL, clipvalue = NULL, ... )
learning_rate |
float >= 0. Learning rate. |
beta_1 |
The exponential decay rate for the 1st moment estimates. float, 0 < beta < 1. Generally close to 1. |
beta_2 |
The exponential decay rate for the 2nd moment estimates. float, 0 < beta < 1. Generally close to 1. |
epsilon |
float >= 0. Fuzz factor. If 'NULL', defaults to 'k_epsilon()'. |
schedule_decay |
Schedule decay. |
clipnorm |
Gradients will be clipped when their L2 norm exceeds this value. |
clipvalue |
Gradients will be clipped when their absolute value exceeds this value. |
... |
Unused, present only for backwards compatibility. |
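The two clipping arguments behave differently, which the following base R sketch illustrates. It assumes the usual Keras semantics (clipnorm rescales the whole gradient, clipvalue clamps each element); clip_by_norm and clip_by_value are hypothetical helper names, not functions exported by this package.

```r
# Scale the whole gradient vector down when its L2 norm exceeds clipnorm.
clip_by_norm <- function(grad, clipnorm) {
  norm <- sqrt(sum(grad^2))
  if (norm > clipnorm) grad * clipnorm / norm else grad
}

# Clamp each gradient element to the range [-clipvalue, clipvalue].
clip_by_value <- function(grad, clipvalue) {
  pmax(pmin(grad, clipvalue), -clipvalue)
}

g <- c(3, -4)                    # L2 norm is 5
clip_by_norm(g, clipnorm = 1)    # direction kept, norm rescaled to 1
clip_by_value(g, clipvalue = 2)  # each element clamped independently
```

Clipping by norm preserves the direction of the gradient, whereas clipping by value can change it because each element is limited on its own.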
Default parameters follow those provided in the paper.
To enable compatibility with the ranges of the learning rates of the other optimizers, the learning rate learning_rate is internally mapped to 2 * learning_rate. That is, a learning rate of 0.001 will be mapped to 0.002 (which is the default).
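The learning rate mapping described above amounts to a single multiplication. The sketch below spells it out in base R; map_nadam_learning_rate is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper mirroring the documented mapping: the value passed as
# learning_rate is doubled before it reaches the underlying optimizer.
map_nadam_learning_rate <- function(learning_rate) {
  stopifnot(is.numeric(learning_rate), all(learning_rate >= 0))
  2 * learning_rate
}

# A learning rate of 0.001, typical for the other optimizers, becomes 0.002.
map_nadam_learning_rate(0.001)
```

When tuning several optimizers over a shared learning rate range, this means the same candidate values can be reused for Nadam without rescaling them by hand.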
[On the importance of initialization and momentum in deep learning](https://www.cs.toronto.edu/~fritz/absps/momentum.pdf).
Other optimizers: optimizer_adadelta(), optimizer_adagrad(), optimizer_adamax(), optimizer_adam(), optimizer_rmsprop(), optimizer_sgd()