adagrad.updater-class: Adagrad updater

Description

An updater with adaptive step sizes. Adagrad gives each weight its own effective learning rate, which shrinks as that parameter accumulates squared gradients over the course of training.

Details

Following Senior et al. ("An empirical study of learning rates in deep neural networks for speech recognition"), the squared gradients are initialized at K instead of 0. By default, K = 0.1.
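
As a minimal sketch of the rule described above (not mistnet's internal code; the function name adagrad_step and its arguments are hypothetical, and the sign is written for gradient descent):

    # Hypothetical sketch of the Adagrad step described above.
    # squared.grad starts at K (0.1 by default) rather than 0, per Senior et al.
    adagrad_step <- function(grad, squared.grad, learning.rate = 1) {
      squared.grad <- squared.grad + grad^2                # accumulate squared gradients
      delta <- -learning.rate * grad / sqrt(squared.grad)  # per-parameter step size
      list(delta = delta, squared.grad = squared.grad)
    }

Because squared.grad only grows, the effective step size learning.rate / sqrt(squared.grad) for each parameter decreases monotonically over training.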

Fields

learning.rate

the learning rate (set to one in the original paper)

squared.grad

a matrix holding the running sum of squared gradients over all previous updates (initialized at K; see Details)

delta

the delta matrix (see updater)
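
For illustration only, the fields above could be mirrored by a reference class along these lines; the class name, method name, and sign convention here are hypothetical, not mistnet's actual definition:

    # Hypothetical reference-class sketch mirroring the documented fields;
    # not the actual mistnet class.
    adagrad.sketch <- setRefClass(
      "adagrad.sketch",
      fields = list(
        learning.rate = "numeric",  # the learning rate
        squared.grad  = "matrix",   # running sum of squared gradients (starts at K)
        delta         = "matrix"    # the most recent update to the weights
      ),
      methods = list(
        computeDelta = function(gradient) {
          squared.grad <<- squared.grad + gradient^2
          # sign depends on whether the model ascends or descends the gradient
          delta <<- learning.rate * gradient / sqrt(squared.grad)
        }
      )
    )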

