An updater with adaptive step sizes. Adam gives each weight its own effective learning rate, which depends on how much that weight has moved so far and on how consistently it has moved in one direction recently.
a_0
initial step size; default is 0.01
annealing_rate
controls the step size at time t. Step size is a[t] = a_0 / sqrt(1 - annealing_rate + t*annealing_rate). Default is 0.001.
b1
exponential decay rate for first moment estimate; default is 0.9
b2
exponential decay rate for second moment estimate; default is 0.999
e
epsilon (prevents divide-by-zero errors); default is 1E-8
m
first moment estimates; all zero by default at initialization
v
second moment estimates; all zero by default at initialization
t
timestep; zero by default at initialization
delta
the delta matrix (see updater)
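To show how the arguments above fit together, here is a minimal Python sketch of one update step. It is not this package's implementation: it assumes the textbook Adam rule with bias correction, assumes delta holds the gradient of the loss with respect to each weight, and works on flat lists rather than matrices. The names step_size, adam_step and init_state are hypothetical helpers introduced only for this illustration; the annealing formula and the default values are taken from the argument descriptions.

```python
import math


def step_size(a_0, annealing_rate, t):
    """Annealed step size: a[t] = a_0 / sqrt(1 - annealing_rate + t*annealing_rate)."""
    return a_0 / math.sqrt(1.0 - annealing_rate + t * annealing_rate)


def init_state(n):
    """Hypothetical helper: m and v start at zero, and so does the timestep t."""
    return {"m": [0.0] * n, "v": [0.0] * n, "t": 0}


def adam_step(weights, delta, state, a_0=0.01, annealing_rate=0.001,
              b1=0.9, b2=0.999, e=1e-8):
    """One textbook Adam update on flat lists of floats.

    Assumption: delta[i] is the gradient for weights[i]. The state dict is
    updated in place and the new weights are returned as a fresh list.
    """
    state["t"] += 1
    t = state["t"]
    a_t = step_size(a_0, annealing_rate, t)
    new_weights = []
    for i, (w, g) in enumerate(zip(weights, delta)):
        # Exponentially decayed first and second moment estimates.
        state["m"][i] = b1 * state["m"][i] + (1.0 - b1) * g
        state["v"][i] = b2 * state["v"][i] + (1.0 - b2) * g * g
        # Bias correction compensates for the zero initialization of m and v.
        m_hat = state["m"][i] / (1.0 - b1 ** t)
        v_hat = state["v"][i] / (1.0 - b2 ** t)
        # e keeps the denominator away from zero.
        new_weights.append(w - a_t * m_hat / (math.sqrt(v_hat) + e))
    return new_weights
```

With these assumptions, the first step moves every weight by roughly a_0 in the direction opposite to its gradient, regardless of the gradient's magnitude, and later steps shrink as t grows because of the annealing term.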