adadelta.updater-class: adadelta updater


Description

An updater with adaptive step sizes, like adagrad. Adadelta modifies adagrad (see adagrad.updater) by decaying the squared gradients and multiplying by an extra term that keeps the units of the update consistent with the units of the parameters. Some evidence indicates that adadelta is more robust than adagrad.

Details

See Zeiler (2012), "ADADELTA: An Adaptive Learning Rate Method": http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf
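For reference, the update rule from that paper can be written as follows, with rho as the decay rate and epsilon as the stabilizing constant (matching the fields documented below):

```latex
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2 \\
\Delta x_t &= -\frac{\sqrt{E[\Delta x^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\; g_t \\
E[\Delta x^2]_t &= \rho\, E[\Delta x^2]_{t-1} + (1-\rho)\, \Delta x_t^2
\end{aligned}
```

The ratio of root-mean-square terms is the "extra term" mentioned above: because the numerator carries the units of the parameters and the denominator the units of the gradient, the resulting step has the same units as the parameters being updated.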

Fields

rho

a decay rate (e.g. 0.95) that controls how long the updater "remembers" the squared magnitudes of previous gradients and updates. Larger rho (closer to 1) allows the model to retain information from more steps in the past.

epsilon

a small constant (e.g. 1E-6) added to prevent numerical instability when dividing by small numbers.

squared.grad

a matrix accumulating the squared gradients over all previous updates, decayed at each step according to rho.

delta

the delta matrix (see updater)

squared.delta

a matrix accumulating the squared deltas over all previous updates, decayed at each step according to rho.
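To make the roles of these fields concrete, here is a minimal NumPy sketch of the adadelta rule from Zeiler (2012). This is not mistnet's R implementation; the class and method names are hypothetical, but the state variables mirror the fields above (rho, epsilon, squared.grad, delta, squared.delta):

```python
import numpy as np

class AdadeltaUpdater:
    """Sketch of the adadelta update rule (Zeiler 2012).

    State mirrors the documented fields: rho, epsilon,
    squared.grad, delta, and squared.delta.
    """

    def __init__(self, shape, rho=0.95, epsilon=1e-6):
        self.rho = rho
        self.epsilon = epsilon
        self.squared_grad = np.zeros(shape)   # decayed sum of squared gradients
        self.squared_delta = np.zeros(shape)  # decayed sum of squared deltas
        self.delta = np.zeros(shape)          # most recent update

    def compute_delta(self, grad):
        # Decay the accumulated squared gradients, then add the new one.
        self.squared_grad = (self.rho * self.squared_grad
                             + (1 - self.rho) * grad ** 2)
        # The RMS ratio keeps the update in the same units as the
        # parameters; epsilon guards against division by zero.
        self.delta = -(np.sqrt(self.squared_delta + self.epsilon)
                       / np.sqrt(self.squared_grad + self.epsilon)) * grad
        # Decay the accumulated squared deltas using the new delta.
        self.squared_delta = (self.rho * self.squared_delta
                              + (1 - self.rho) * self.delta ** 2)
        return self.delta
```

In use, the returned delta would simply be added to the weight matrix at each step (`weights += updater.compute_delta(grad)`). Note that no global learning rate appears: the effective step size adapts per coordinate from the two decayed accumulators.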


davharris/mistnet documentation built on May 14, 2019, 9:28 p.m.