An updater with adaptive step sizes. Adagrad gives each weight its own effective learning rate, depending on how much that parameter has moved so far.
Following Senior et al. ("An empirical study of learning rates in deep neural networks for speech recognition"), the squared gradients are initialized at K instead of 0. By default, K == 0.1.
learning.rate
the learning rate (set to one in the original paper)
squared.grad
a matrix summing the squared gradients over all previous updates
delta
the delta matrix (see updater)
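
A minimal sketch of a single Adagrad step under the conventions above. The adagrad_step function and its arguments are illustrative only, not part of the package's API:

adagrad_step <- function(gradient, squared.grad = NULL, learning.rate = 1, K = 0.1) {
  # Initialize the squared-gradient accumulator at K rather than 0 (Senior et al.)
  if (is.null(squared.grad)) {
    squared.grad <- matrix(K, nrow = nrow(gradient), ncol = ncol(gradient))
  }
  # Sum the squared gradients over all previous updates
  squared.grad <- squared.grad + gradient^2
  # Per-parameter step: the global learning rate scaled down by the root of the accumulator
  delta <- -learning.rate * gradient / sqrt(squared.grad)
  list(delta = delta, squared.grad = squared.grad)
}

# Parameters that have accumulated larger squared gradients receive smaller steps:
g <- matrix(c(0.5, -0.2, 0.1, 0.3), nrow = 2)
step <- adagrad_step(g)
step$delta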