optimizer_swa: Stochastic Weight Averaging

Description Usage Arguments Details Value Examples

View source: R/optimizers_.R

Description

Stochastic Weight Averaging

Usage

optimizer_swa(
  optimizer,
  start_averaging = 0,
  average_period = 10,
  name = "SWA",
  sequential_update = TRUE,
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

optimizer

The original optimizer that will be used to compute and apply the gradients.

start_averaging

An integer. The iteration at which SWA starts averaging; must be >= 0. If start_averaging = m, the first snapshot of the weights is taken after the m-th application of gradients (where the first iteration is iteration 0).

average_period

An integer. The synchronization period of SWA: the running average is updated every average_period steps. Must be >= 1.

name

Optional name for the operations created when applying gradients. Defaults to 'SWA'.

sequential_update

Bool. If FALSE, the moving average is computed at the same time as the model is updated, potentially allowing benign data races. If TRUE, the moving average is updated after the gradient updates.

clipnorm

Float. If set, gradients are clipped by norm.

clipvalue

Float. If set, gradients are clipped by value.

decay

Included for backward compatibility, to allow time-based inverse decay of the learning rate.

lr

Included for backward compatibility; it is recommended to use learning_rate instead.

Details

The Stochastic Weight Averaging mechanism was proposed by Pavel Izmailov et al. in the paper [Averaging Weights Leads to Wider Optima and Better Generalization](https://arxiv.org/abs/1803.05407). The optimizer implements averaging of multiple points along the trajectory of SGD. It expects an inner optimizer, which it uses to apply the gradients to the variables, and itself computes a running average of the variables every k steps (which generally corresponds to the end of a cycle when a cyclic learning rate is employed). The number of steps before averaging first begins can also be specified: if averaging should happen every k steps after the first m steps, a snapshot of the variables is taken after step m, and the weights are then averaged at steps m + k, m + 2k, and so on. The assign_average_vars function can be called at the end of training to obtain the averaged weights from the optimizer.
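
A minimal sketch of the wrapping described above (the learning rate and the m = 100, k = 10 schedule are illustrative values chosen for this example, not defaults):

library(tensorflow)
library(tfaddons)

# Inner optimizer that actually applies the gradients.
sgd <- tf$keras$optimizers$SGD(learning_rate = 0.01)

# Wrap it with SWA: the first weight snapshot is taken after the
# 100th application of gradients (m = 100), and the running average
# is then refreshed every 10 steps (k = 10): at 110, 120, and so on.
swa <- optimizer_swa(sgd, start_averaging = 100, average_period = 10)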

Value

Optimizer for use with 'keras::compile()'
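
For example, the returned optimizer can be passed directly to compile() from the keras package; the model object below is a hypothetical placeholder used only to show the call shape:

library(keras)

# 'model' is assumed to be an existing keras model.
model %>% compile(
  optimizer = optimizer_swa(tf$keras$optimizers$SGD(0.01)),
  loss = "mse",
  metrics = "mae"
)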

Examples

## Not run: 
learning_rate = 0.01; m = 100; k = 10  # illustrative values
opt = tf$keras$optimizers$SGD(learning_rate)
opt = optimizer_swa(opt, start_averaging = m, average_period = k)

## End(Not run)
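
A slightly fuller end-to-end sketch, assuming a keras model named model with training data x_train and y_train (all hypothetical names), and assuming assign_average_vars is reachable through the underlying optimizer object as shown:

## Not run: 
opt = optimizer_swa(tf$keras$optimizers$SGD(0.01),
                    start_averaging = 100, average_period = 10)
model %>% compile(optimizer = opt, loss = "mse")
model %>% fit(x_train, y_train, epochs = 5)
# Copy the running SWA average into the model's variables
# before evaluation, as described in Details.
opt$assign_average_vars(model$variables)

## End(Not run)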
