Description
From Kingma and Ba (2015): "We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm."
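The update rule described in the abstract can be sketched in a few lines of R. This is a minimal illustration of one Adam parameter update, not this package's implementation; the helper name `adam_step` is hypothetical, while the hyper-parameter names follow the arguments documented below.

```r
# One Adam update (Kingma and Ba, 2015), sketched for illustration.
# m and v carry the running first and second moment estimates; t is
# the iteration counter used for bias correction.
adam_step <- function(p, grad, m, v, t,
                      alpha = 0.001, beta1 = 0.9,
                      beta2 = 0.999, epsilon = 1e-8) {
  m <- beta1 * m + (1 - beta1) * grad       # biased first moment estimate
  v <- beta2 * v + (1 - beta2) * grad^2     # biased second raw moment estimate
  m_hat <- m / (1 - beta1^t)                # bias-corrected first moment
  v_hat <- v / (1 - beta2^t)                # bias-corrected second moment
  p <- p - alpha * m_hat / (sqrt(v_hat) + epsilon)
  list(p = p, m = m, v = v)
}

# Minimize f(p) = p^2 (gradient 2 * p) from a starting value of 5
p <- 5; m <- 0; v <- 0
for (t in 1:2000) {
  st <- adam_step(p, 2 * p, m, v, t, alpha = 0.05)
  p <- st$p; m <- st$m; v <- st$v
}
```

Note how the effective step size is bounded by `alpha` regardless of the gradient's scale, which is what makes Adam "invariant to diagonal rescaling of the gradients".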
Usage

adam(f, p, x, y, w, tau, ..., iterlim, iterbreak, alpha, minibatch,
     beta1, beta2, epsilon, print.level)

Arguments
f 
the function to be minimized, including gradient information
contained in the gradient attribute of the returned value. 
p 
the starting parameters for the minimization. 
x 
covariate matrix with number of rows equal to the number of samples and number of columns equal to the number of variables. 
y 
response column matrix with number of rows equal to the number of samples. 
w 
vector of weights with length equal to the number of samples. 
tau 
vector of desired tau-quantile(s) with length equal to the number of samples. 
... 
additional parameters passed to the function f. 
iterlim 
the maximum number of iterations before the optimization is stopped. 
iterbreak 
the maximum number of iterations without progress before the optimization is stopped. 
alpha 
size of the learning rate. 
minibatch 
number of samples in each minibatch. 
beta1 
controls the exponential decay rate used to scale the biased first moment estimate. 
beta2 
controls the exponential decay rate used to scale the biased second raw moment estimate. 
epsilon 
smoothing term to avoid division by zero. 
print.level 
the level of printing which is done during optimization.
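Taken together, the f, x, y, w, and tau arguments suggest an nlm-style objective that returns the weighted tau-quantile (pinball) loss with its gradient attached as an attribute. The sketch below is an assumption about that calling convention, not confirmed by this page; the name `pinball` is hypothetical.

```r
# Hedged sketch of a weighted tau-quantile (pinball) objective in the
# style the f argument suggests: the loss is returned as a scalar with
# the gradient supplied via a "gradient" attribute.
pinball <- function(p, x, y, w, tau) {
  r <- y - x %*% p                         # residuals
  loss <- sum(w * (tau - (r < 0)) * r)     # tilted absolute-value loss
  grad <- -t(x) %*% (w * (tau - (r < 0)))  # gradient w.r.t. p
  attr(loss, "gradient") <- grad
  loss
}
```

With tau = 0.5 this reduces to half the weighted absolute-error loss, i.e. median regression.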
Value

A list with elements:
estimate 
The best set of parameters found. 
minimum 
The value of f at the best set of parameters found. 
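The estimate and minimum elements follow the same convention as stats::nlm, which also accepts an objective carrying gradient information in a "gradient" attribute. As a hedged stand-in (using nlm rather than this function), the return value can be used like this:

```r
# Illustration of the estimate/minimum convention using stats::nlm,
# whose gradient-attribute interface matches the f argument above.
f <- function(p) {
  val <- sum((p - c(1, 2))^2)              # quadratic with minimum at (1, 2)
  attr(val, "gradient") <- 2 * (p - c(1, 2))
  val
}
res <- nlm(f, p = c(0, 0))
res$estimate  # best set of parameters found, close to c(1, 2)
res$minimum   # value of the objective at res$estimate, close to 0
```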
References

Kingma, D.P. and Ba, J., 2015. Adam: A method for stochastic optimization. The International Conference on Learning Representations (ICLR) 2015. http://arxiv.org/abs/1412.6980