An updater with adaptive step sizes. Adam gives each weight its own effective learning rate, which depends on how much that weight has moved so far and on how consistently it has moved in one direction recently.
a_0: initial step size; default is 0.01
annealing_rate: controls the step size at time t. Step size is
    a[t] = a_0 / sqrt(1 - annealing_rate + t*annealing_rate).
    Default is 0.001.
b1: exponential decay rate for first moment estimate; default is 0.9
b2: exponential decay rate for second moment estimate; default is 0.999
e: epsilon (prevents divide-by-zero errors); default is 1E-8
m: first moment estimates; all zero by default at initialization
v: second moment estimates; all zero by default at initialization
t: timestep; zero by default at initialization
delta: the delta matrix (see updater)
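
To show how these arguments might fit together, here is a minimal sketch in plain Python. It is not this package's implementation. It combines the step-size formula given above with the standard bias-corrected Adam update, and it makes several assumptions: the function names `annealed_step_size` and `adam_step` are hypothetical, `t` is incremented before it is used, the annealed step size a[t] simply replaces the fixed learning rate, and the returned delta is meant to be added to the weights.

```python
import math


def annealed_step_size(a_0, annealing_rate, t):
    """Step size a[t] = a_0 / sqrt(1 - annealing_rate + t*annealing_rate)."""
    return a_0 / math.sqrt(1.0 - annealing_rate + t * annealing_rate)


def adam_step(grad, m, v, t, a_0=0.01, annealing_rate=0.001,
              b1=0.9, b2=0.999, e=1e-8):
    """One Adam update over a flat list of gradients (illustrative only).

    Returns (delta, m, v, t), where delta is assumed to be added to the weights.
    """
    t = t + 1  # assumption: the timestep is advanced before it is used
    a_t = annealed_step_size(a_0, annealing_rate, t)

    delta, new_m, new_v = [], [], []
    for g, m_i, v_i in zip(grad, m, v):
        # Exponentially decayed first and second moment estimates.
        m_i = b1 * m_i + (1.0 - b1) * g
        v_i = b2 * v_i + (1.0 - b2) * g * g

        # Bias correction for the zero initialization of m and v.
        m_hat = m_i / (1.0 - b1 ** t)
        v_hat = v_i / (1.0 - b2 ** t)

        # e keeps the denominator away from zero.
        delta.append(-a_t * m_hat / (math.sqrt(v_hat) + e))
        new_m.append(m_i)
        new_v.append(v_i)

    return delta, new_m, new_v, t


# Usage: first step from the documented zero initialization.
grad = [0.5, -2.0]
delta, m, v, t = adam_step(grad, m=[0.0, 0.0], v=[0.0, 0.0], t=0)
```

Under these assumptions, the first step moves every weight by roughly a_0 against the sign of its gradient, regardless of the gradient's magnitude, because the bias-corrected ratio m_hat / sqrt(v_hat) is close to 1 in absolute value at t = 1.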