optimizer_radam: Rectified Adam (a.k.a. RAdam)
In tfaddons: Interface to 'TensorFlow SIG Addons'

Rectified Adam (a.k.a. RAdam)

optimizer_radam(
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = 1e-07,
  weight_decay = 0,
  amsgrad = FALSE,
  sma_threshold = 5,
  total_steps = 0,
  warmup_proportion = 0.1,
  min_lr = 0,
  name = "RectifiedAdam",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)
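
As an illustration of the signature above, here is a minimal sketch of compiling a Keras model with this optimizer. It assumes the 'keras' and 'tfaddons' packages and a working TensorFlow installation. The wrapper `build_radam_model()`, the layer sizes, and the hyperparameter values are hypothetical choices for illustration, not recommendations.

```r
# Hypothetical wrapper for illustration; it is not part of tfaddons.
# Package functions are called with `::`, so nothing is loaded until the
# function is actually invoked.
build_radam_model <- function(n_features, n_classes) {
  model <- keras::keras_model_sequential()
  model <- keras::layer_dense(model, units = 16, activation = "relu",
                              input_shape = n_features)
  model <- keras::layer_dense(model, units = n_classes, activation = "softmax")

  # A positive total_steps enables warmup (see the argument descriptions).
  optimizer <- tfaddons::optimizer_radam(
    learning_rate = 0.001,
    total_steps = 10000L,
    warmup_proportion = 0.1,
    min_lr = 1e-5
  )

  # compile() configures the model in place.
  keras::compile(
    model,
    optimizer = optimizer,
    loss = "categorical_crossentropy",
    metrics = "accuracy"
  )
  model
}
```

Under those assumptions, `build_radam_model(n_features = 4, n_classes = 3)` would return a compiled model that can be passed to `fit()`.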

`learning_rate`	The learning rate. A 'Tensor', a floating point value, or a schedule that is a 'tf$keras$optimizers$schedules$LearningRateSchedule'.
`beta_1`	A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
`beta_2`	A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
`epsilon`	A small constant for numerical stability.
`weight_decay`	A floating point value. Weight decay applied to each parameter.
`amsgrad`	A boolean. Whether to apply the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond".
`sma_threshold`	A float value. The threshold for the simple moving average (SMA).
`total_steps`	An integer. Total number of training steps. Enable warmup by setting a positive value.
`warmup_proportion`	A floating point value. The proportion of `total_steps` during which the learning rate increases (warmup).
`min_lr`	A floating point value. Minimum learning rate after warmup.
`name`	Optional name for the operations created when applying gradients. Defaults to "RectifiedAdam".
`clipnorm`	Clips gradients by norm.
`clipvalue`	Clips gradients by value.
`decay`	Included for backward compatibility to allow inverse time decay of the learning rate.
`lr`	Included for backward compatibility. Use `learning_rate` instead.
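
To make `total_steps`, `warmup_proportion`, `min_lr`, and `sma_threshold` more concrete, the base R sketch below approximates how they might interact. It assumes behavior taken from the upstream TensorFlow Addons optimizer rather than from this page. The learning rate is assumed to rise linearly from 0 to `learning_rate` over `total_steps * warmup_proportion` steps and then fall linearly to `min_lr`. The rectified update is assumed to apply once the approximated SMA length reaches `sma_threshold`. The helper names `radam_lr_at_step()` and `radam_sma_length()` are hypothetical and are not part of tfaddons.

```r
# Hypothetical helpers for illustration; they are not part of tfaddons.

# Learning rate at a given step, assuming linear warmup then linear decay.
radam_lr_at_step <- function(step, learning_rate = 0.001, total_steps = 0,
                             warmup_proportion = 0.1, min_lr = 0) {
  # Warmup is disabled unless total_steps is positive.
  if (total_steps <= 0) return(rep(learning_rate, length(step)))
  warmup_steps <- total_steps * warmup_proportion
  decay_steps <- max(total_steps - warmup_steps, 1)
  decay_rate <- (min_lr - learning_rate) / decay_steps
  ifelse(
    step <= warmup_steps,
    learning_rate * step / warmup_steps,
    learning_rate + decay_rate * pmin(step - warmup_steps, decay_steps)
  )
}

# Approximated SMA length at a given step (rho_t in the RAdam paper).
radam_sma_length <- function(step, beta_2 = 0.999) {
  sma_inf <- 2 / (1 - beta_2) - 1
  beta_2_power <- beta_2^step
  sma_inf - 2 * step * beta_2_power / (1 - beta_2_power)
}

# Learning rate during warmup, at the end of warmup, and at the final step.
lrs <- radam_lr_at_step(c(500, 1000, 10000), learning_rate = 0.001,
                        total_steps = 10000, warmup_proportion = 0.1,
                        min_lr = 1e-5)

# SMA length over the first ten steps, compared with the default threshold.
early_sma <- radam_sma_length(1:10)
uses_rectified_update <- early_sma >= 5
```

Under these assumptions, the SMA length grows with the step count, so the first few updates fall below `sma_threshold` and later ones use the rectified adaptive update.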