optim_nadam: Nadam optimizer

optim_nadam {torchopt}    R Documentation

Nadam optimizer

Description

R implementation of the Nadam optimizer proposed by Dozat (2016).

From the abstract of the paper by Dozat (2016): This work aims to improve upon the recently proposed and rapidly popularized optimization algorithm Adam (Kingma & Ba, 2014). Adam has two main components: a momentum component and an adaptive learning rate component. However, regular momentum can be shown conceptually and empirically to be inferior to a similar algorithm known as Nesterov's accelerated gradient (NAG).
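
In rough terms (a sketch based on Dozat (2016), not on this package's source code), Nadam keeps Adam's running averages of the gradient and its square but applies a Nesterov-style lookahead to the momentum term. Writing g_t for the gradient at step t, and ignoring the momentum_decay schedule applied to beta_1, the update is approximately:

m_t = beta_1 * m_{t-1} + (1 - beta_1) * g_t
v_t = beta_2 * v_{t-1} + (1 - beta_2) * g_t^2
m_hat_t = m_t / (1 - beta_1^t),   v_hat_t = v_t / (1 - beta_2^t)
theta_t = theta_{t-1} - lr * (beta_1 * m_hat_t + (1 - beta_1) * g_t / (1 - beta_1^t)) / (sqrt(v_hat_t) + eps)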

Usage

optim_nadam(
  params,
  lr = 0.002,
  betas = c(0.9, 0.999),
  eps = 1e-08,
  weight_decay = 0,
  momentum_decay = 0.004
)

Arguments

params

List of parameters to optimize.

lr

Learning rate (default: 2e-3).

betas

Coefficients used for computing running averages of the gradient and its square (default: c(0.9, 0.999)).

eps

Term added to the denominator to improve numerical stability (default: 1e-8).

weight_decay

Weight decay (L2 penalty) (default: 0).

momentum_decay

Momentum decay (default: 4e-3).

Value

A torch optimizer object implementing the step method.
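
The returned optimizer is used like any other torch optimizer: zero the gradients, compute a loss, call backward(), then call step(). Below is a minimal sketch of that pattern with a torch module; nn_linear, torch_randn, and nnf_mse_loss belong to the torch package and are used here only for illustration.

if (torch::torch_is_installed()) {
  # illustrative model; any list of torch parameters can be optimized
  model <- torch::nn_linear(10, 1)
  opt <- torchopt::optim_nadam(model$parameters, lr = 0.002)
  # a single optimization step on synthetic data
  x <- torch::torch_randn(16, 10)
  y <- torch::torch_randn(16, 1)
  opt$zero_grad()
  loss <- torch::nnf_mse_loss(model(x), y)
  loss$backward()
  opt$step()
}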

Author(s)

Gilberto Camara, gilberto.camara@inpe.br

Rolf Simoes, rolf.simoes@inpe.br

Felipe Souza, lipecaso@gmail.com

Alber Sanchez, alber.ipia@inpe.br

References

Timothy Dozat, "Incorporating Nesterov Momentum into Adam", International Conference on Learning Representations (ICLR) 2016. https://openreview.net/pdf/OM0jvwB8jIp57ZJjtNEZ.pdf

Examples

if (torch::torch_is_installed()) {
  # function to demonstrate optimization
  beale <- function(x, y) {
    log((1.5 - x + x * y)^2 + (2.25 - x - x * y^2)^2 + (2.625 - x + x * y^3)^2)
  }
  # define optimizer
  optim <- torchopt::optim_nadam
  # define hyperparams
  opt_hparams <- list(lr = 0.01)

  # starting point
  x0 <- 3
  y0 <- 3
  # create tensors
  x <- torch::torch_tensor(x0, requires_grad = TRUE)
  y <- torch::torch_tensor(y0, requires_grad = TRUE)
  # instantiate optimizer
  optim <- do.call(optim, c(list(params = list(x, y)), opt_hparams))
  # run optimizer
  steps <- 400
  x_steps <- numeric(steps)
  y_steps <- numeric(steps)
  for (i in seq_len(steps)) {
    x_steps[i] <- as.numeric(x)
    y_steps[i] <- as.numeric(y)
    optim$zero_grad()
    z <- beale(x, y)
    z$backward()
    optim$step()
  }
  print(paste0("starting value = ", beale(x0, y0)))
  print(paste0("final value = ", beale(x_steps[steps], y_steps[steps])))
}
