descent_generalized_fista: Descent generalized FISTA (fast iterative shrinkage thresholding algorithm)

View source: R/function_FISTA.R

descent_generalized_fista          R Documentation

Descent generalized FISTA
(fast iterative shrinkage thresholding algorithm)

Description

descent_generalized_fista takes as input a vector, a gradient function and an evaluation function (plus some additional parameters/functions). It then iteratively minimizes the tweak vector via FISTA (Beck and Teboulle 2009). In essence, the following steps are performed (see also the R sketch after the pseudocode):
# preliminary initialization step
for k = 2,..., maxit:

  • y_vec = NORM.FUN(y_vec)

  • grad = F.GRAD.FUN(y_vec)

  • # find best step size between 0 and learning.rate
    for step.size = 0,..., learning.rate:

    • u = ST.FUN(y_vec - step.size * grad, step.size * lambda)

    • eval = EVAL.FUN(u)

  • # only keep u with minimal eval

  • tweak.old = tweak.vec

  • # if it was a descent step:
    tweak.vec = u

  • # if it was not a descent step:
    tweak.vec = tweak.vec
    #(and restart as suggested in O’Donoghue & Candes (2012))

  • # Nesterov extrapolation:

  • nesterov.direction = tweak.vec - tweak.old

  • # find best extrapolation size between 0 and FACTOR.FUN(k):
    for ne = 0, ..., FACTOR.FUN(k):

    • y_vec = u + ne * nesterov.direction

    • eval = EVAL.FUN(y_vec)

  • # only keep y_vec with minimal eval

  • stop criterion: if the descent in either the gradient or the Nesterov step was below a critical value, then stop.
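
The sketch below restates the loop above as plain R, to make the control flow concrete. It is a simplified, stand-alone illustration rather than the package implementation: the name fista_sketch is hypothetical, ST.FUN is assumed to be l1 soft thresholding, NORM.FUN an l2 scaling, FACTOR.FUN the common (k - 1)/(k + 2) Nesterov schedule, and the adaptive line search is replaced by a fixed grid of 'cycles' candidate step sizes.

fista_sketch <- function(tweak.vec, F.GRAD.FUN, EVAL.FUN,
                         lambda = 0, learning.rate = 0.1, maxit = 500,
                         cycles = 5, stop.crit.threshold = 1e-13) {
  # assumed helper functions (hypothetical defaults, see above)
  ST.FUN <- function(x, thresh) sign(x) * pmax(abs(x) - thresh, 0)
  NORM.FUN <- function(x) x / sqrt(sum(x^2))
  FACTOR.FUN <- function(k) (k - 1) / (k + 2)

  y.vec <- tweak.vec
  eval.old <- EVAL.FUN(tweak.vec)
  for (k in 2:maxit) {
    y.vec <- NORM.FUN(y.vec)
    grad <- F.GRAD.FUN(y.vec)

    # gradient/proximal step: try several step sizes, keep the u with minimal eval
    step.sizes <- seq(0, learning.rate, length.out = cycles + 1)[-1]
    cands <- lapply(step.sizes, function(s) ST.FUN(y.vec - s * grad, s * lambda))
    evals <- vapply(cands, EVAL.FUN, numeric(1))
    u <- cands[[which.min(evals)]]

    tweak.old <- tweak.vec
    if (min(evals) <= EVAL.FUN(tweak.vec)) {
      tweak.vec <- u            # descent step: accept u
    }                           # otherwise keep tweak.vec (restart)

    # Nesterov extrapolation: try several extrapolation factors, keep the best y_vec
    nesterov.direction <- tweak.vec - tweak.old
    ne.grid <- seq(0, FACTOR.FUN(k), length.out = cycles + 1)
    y.cands <- lapply(ne.grid, function(ne) u + ne * nesterov.direction)
    y.evals <- vapply(y.cands, EVAL.FUN, numeric(1))
    y.vec <- y.cands[[which.min(y.evals)]]

    # stop if neither step decreased the loss by more than the threshold
    if (eval.old - min(y.evals) < stop.crit.threshold) break
    eval.old <- min(y.evals)
  }
  tweak.vec
}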

Usage

descent_generalized_fista(
  tweak.vec,
  lambda = 0,
  maxit = 500,
  learning.rate = NA,
  F.GRAD.FUN,
  ST.FUN = "softmax",
  FACTOR.FUN = "nesterov_factor",
  EVAL.FUN,
  NORM.FUN = "norm2",
  line.search.speed = 2,
  cycles = 5,
  save.all.tweaks = FALSE,
  use.restart = TRUE,
  verbose = FALSE,
  NESTEROV.FUN = "positive",
  stop.crit.threshold = 1e-13
)

Arguments

tweak.vec

numeric vector, starting vector for the DTD algorithm

lambda

non-negative float, regularization factor for ST.FUN function.

maxit

integer, maximum number of iterations for the iterative minimization.

learning.rate

float, step size during optimization. If it is NA, the learning rate is estimated as published by Barzilai and Borwein (1988). Note that the algorithm adjusts the learning rate during optimization.

F.GRAD.FUN

function with one parameter (a vector with the same length as tweak.vec) and one return value (a vector with the same length as tweak.vec). For a given vector, it returns the gradient of the loss function (see the example functions after this argument list).

ST.FUN

function with two parameters, a vector with the same length as tweak.vec and a non-negative threshold, and one return value (a vector with the same length as tweak.vec). Implementation of the proximal operator for the chosen loss function, here named the soft thresholding function.

FACTOR.FUN

function with an integer as input and a float as return value. It returns the factor by which the Nesterov correction extrapolates.

EVAL.FUN

function with one parameter (a vector with the same length as tweak.vec) and a single float as return value. It evaluates the loss function.

NORM.FUN

function with one parameter. After each step, the tweak vector is normalized/scaled by this function. If no scaling should be done, set NORM.FUN to identity.

line.search.speed

numeric, factor by which the learning rate changes during optimization. If 'line.search.speed' is, e.g., 2, the learning rate is doubled if the highest 'cycle' led to the best eval score, and halved if 'cycle = 0' led to the best eval score.

cycles

integer, in each iteration one gradient is calculated. To find the best step size, 'cycles' candidate steps are taken, and each of them is evaluated.

save.all.tweaks

logical, should all tweak vectors of all iterations be stored?

use.restart

logical, restart the algorithm if the update was not a descent step.

verbose

logical, should information be printed to console?

NESTEROV.FUN

function applied to the Nesterov extrapolation.

stop.crit.threshold

numeric value. The change in either the gradient or the Nesterov step must be at least 'stop.crit.threshold'; otherwise the algorithm stops.
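
To make the expected signatures concrete, the following toy definitions show one possible set of argument functions. They assume a simple least-squares loss 0.5 * ||A g - b||^2 on made-up data and are purely illustrative; they are not taken from the DTD package.

# toy data for illustration only
A <- matrix(rnorm(20), nrow = 4)
b <- rnorm(4)

F.GRAD.FUN <- function(g) as.vector(t(A) %*% (A %*% g - b))          # gradient of the loss
EVAL.FUN   <- function(g) 0.5 * sum((A %*% g - b)^2)                  # loss value (single float)
ST.FUN     <- function(g, thresh) sign(g) * pmax(abs(g) - thresh, 0)  # l1 proximal operator
NORM.FUN   <- function(g) g / sqrt(sum(g^2))                          # l2 scaling

# a call might then look like (schematic):
# res <- descent_generalized_fista(tweak.vec = rep(1, 5), lambda = 0.01,
#                                  F.GRAD.FUN = F.GRAD.FUN, EVAL.FUN = EVAL.FUN,
#                                  ST.FUN = ST.FUN, NORM.FUN = NORM.FUN)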

Details

The gradient and evaluation functions both take only one argument, the soft thresholding function two. If your gradient, evaluation, or soft thresholding function requires more arguments, please write a wrapper, as sketched below.
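
For example (a hypothetical sketch; my_grad, X and y are placeholders, not part of the package), a gradient function that needs extra data can be turned into the required one-argument form via a closure:

# gradient that needs additional arguments beyond the tweak vector
my_grad <- function(g, X, y) as.vector(t(X) %*% (X %*% g - y))

# closure that fixes X and y, returning a one-argument function
make_grad_wrapper <- function(X, y) {
  function(g) my_grad(g, X, y)
}

X <- matrix(rnorm(20), nrow = 4)
y <- rnorm(4)
F.GRAD.FUN <- make_grad_wrapper(X, y)   # now matches the required signature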

Value

list, including

  • "Converge", numeric vector, EVAL.FUN in every step

  • "Tweak" numeric vector, the best distinguishing vector

  • "Lambda", numeric value, used λ

  • depending on "save.all.tweaks": "History", numeric matrix, "Tweak" vector of every step
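
As an illustration of how the returned list might be inspected (the values below are mock numbers, not package output; the element names follow the list above):

res <- list(Converge = c(10, 5, 2, 1.5, 1.4),   # mock EVAL.FUN values per step
            Tweak = rep(1, 3),
            Lambda = 0)
plot(res$Converge, type = "l", xlab = "iteration", ylab = "EVAL.FUN")
best.tweak <- res$Tweak                          # best distinguishing vector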

