optim_qhadam: QHAdam optimization algorithm

optim_qhadam {torchopt}    R Documentation

QHAdam optimization algorithm

Description

R implementation of the QHAdam optimizer proposed by Ma and Yarats (2019). We used the implementation available at https://github.com/jettify/pytorch-optimizer/blob/master/torch_optimizer/qhadam.py. Thanks to Nikolay Novik for providing the PyTorch code.

The original implementation was developed by Facebook AI and is licensed under the MIT license.

From the paper by Ma and Yarats (2019): QHAdam is a QH-augmented version of Adam, where we replace both of Adam's moment estimators with quasi-hyperbolic terms. QHAdam decouples the momentum term from the current gradient when updating the weights, and decouples the mean squared gradients term from the current squared gradient when updating the weights.
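
To make the update concrete, here is a minimal sketch of a single QHAdam step for one parameter, written in plain R following the update rule in the paper. The function qhadam_step and its state-passing convention are illustrative only, not the torchopt internals:

# One QHAdam step, following Ma and Yarats (2019).
# g_avg / s_avg: exponential moving averages of the gradient and its square,
# both initialized to zero; t: iteration count, starting at 1 (used for
# bias correction). All names here are illustrative.
qhadam_step <- function(theta, grad, g_avg, s_avg, t,
                        lr = 0.01, betas = c(0.9, 0.999),
                        nus = c(1, 1), eps = 0.001) {
    # update the moment estimates
    g_avg <- betas[1] * g_avg + (1 - betas[1]) * grad
    s_avg <- betas[2] * s_avg + (1 - betas[2]) * grad^2
    # bias-corrected estimates
    g_hat <- g_avg / (1 - betas[1]^t)
    s_hat <- s_avg / (1 - betas[2]^t)
    # quasi-hyperbolic terms: nus interpolates between the current
    # (squared) gradient and its bias-corrected moment estimate
    update <- ((1 - nus[1]) * grad + nus[1] * g_hat) /
        (sqrt((1 - nus[2]) * grad^2 + nus[2] * s_hat) + eps)
    list(theta = theta - lr * update, g_avg = g_avg, s_avg = s_avg)
}

With nus = c(1, 1), the quasi-hyperbolic terms reduce to the bias-corrected moment estimates and the step is exactly Adam's.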

Usage

optim_qhadam(
  params,
  lr = 0.01,
  betas = c(0.9, 0.999),
  eps = 0.001,
  nus = c(1, 1),
  weight_decay = 0,
  decouple_weight_decay = FALSE
)

Arguments

params

List of parameters to optimize.

lr

Learning rate (default: 1e-2)

betas

Coefficients used for computing running averages of the gradient and its square (default: c(0.9, 0.999))

eps

Term added to the denominator to improve numerical stability (default: 1e-3)

nus

Immediate discount factors used to estimate the gradient and its square (default: c(1.0, 1.0))

weight_decay

Weight decay (L2 penalty) (default: 0)

decouple_weight_decay

Whether to decouple the weight decay from the gradient-based optimization step, applying it directly to the weights instead (default: FALSE); see the sketch below.
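
When decouple_weight_decay is TRUE, the decay multiplies the weights directly rather than being added to the gradient (the AdamW-style scheme). A minimal sketch of a call that enables it; the tensor w and the decay value are illustrative:

if (torch::torch_is_installed()) {
    w <- torch::torch_tensor(1.0, requires_grad = TRUE)
    opt <- torchopt::optim_qhadam(
        params = list(w),
        weight_decay = 1e-4,          # illustrative value
        decouple_weight_decay = TRUE  # shrink weights directly, AdamW-style
    )
}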

Value

A torch optimizer object implementing the step method.

Author(s)

Gilberto Camara, gilberto.camara@inpe.br

Daniel Falbel, daniel.falbel@gmail.com

Rolf Simoes, rolf.simoes@inpe.br

Felipe Souza, lipecaso@gmail.com

Alber Sanchez, alber.ipia@inpe.br

References

Jerry Ma and Denis Yarats, "Quasi-hyperbolic momentum and Adam for deep learning", ICLR 2019. https://arxiv.org/abs/1810.06801

Examples

if (torch::torch_is_installed()) {
    # function to demonstrate optimization
    beale <- function(x, y) {
        log((1.5 - x + x * y)^2 + (2.25 - x - x * y^2)^2 + (2.625 - x + x * y^3)^2)
    }
    # define optimizer
    optim <- torchopt::optim_qhadam
    # define hyperparameters
    opt_hparams <- list(lr = 0.01)

    # starting point
    x0 <- 3
    y0 <- 3
    # create tensors with gradient tracking
    x <- torch::torch_tensor(x0, requires_grad = TRUE)
    y <- torch::torch_tensor(y0, requires_grad = TRUE)
    # instantiate optimizer
    optim <- do.call(optim, c(list(params = list(x, y)), opt_hparams))
    # run optimizer
    steps <- 400
    x_steps <- numeric(steps)
    y_steps <- numeric(steps)
    for (i in seq_len(steps)) {
        x_steps[i] <- as.numeric(x)
        y_steps[i] <- as.numeric(y)
        optim$zero_grad()
        z <- beale(x, y)
        z$backward()
        optim$step()
    }
    print(paste0("starting value = ", beale(x0, y0)))
    print(paste0("final value = ", beale(x_steps[steps], y_steps[steps])))
}
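
With the default nus = c(1, 1), the example above behaves like plain Adam; Ma and Yarats (2019) suggest trying nu1 = 0.7 (keeping nu2 = 1) as a starting point for the quasi-hyperbolic variant. A hedged variant of the setup above, changing only the hyperparameters:

if (torch::torch_is_installed()) {
    x <- torch::torch_tensor(3, requires_grad = TRUE)
    y <- torch::torch_tensor(3, requires_grad = TRUE)
    opt <- torchopt::optim_qhadam(
        params = list(x, y),
        lr = 0.01,
        nus = c(0.7, 1.0)  # paper-suggested nu1; nu2 kept at 1
    )
}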

