descent_generalized_fista: Descent generalized FISTA (fast iterative shrinkage thresholding algorithm)

View source: R/function_FISTA.R

descent_generalized_fista          R Documentation

Descent generalized FISTA
(fast iterative shrinkage thresholding algorithm)

Description

descent_generalized_fista takes as input a vector, a gradient function and an evaluation function (plus some additional parameters/functions). It then iteratively minimizes the tweak vector via FISTA (Beck and Teboulle 2009). In essence, the following steps are performed (see also the R sketch after the pseudocode):
# preliminary initialization step
for k = 2,..., maxit:

  • y_vec = NORM.FUN(y_vec)

  • grad = F.GRAD.FUN(y_vec)

  • # find best step size between 0 and learning.rate
    for step.size = 0,..., learning.rate:

    • u = ST.FUN(y_vec - step.size * grad, step.size * lambda)

    • eval = EVAL.FUN(u)

  • # only keep u with minimal eval

  • tweak.old = tweak.vec

  • # if it was a descent step:
    tweak.vec = u

  • # if it was not a descent step:
    tweak.vec = tweak.vec
    #(and restart as suggested in O’Donoghue & Candes (2012))

  • # Nesterov extrapolation:

  • nesterov.direction = tweak.vec - tweak.old

  • # find best extrapolation size between 0 and FACTOR.FUN(k):
    for ne = 0, ..., FACTOR.FUN(k):

    • y_vec = u + ne * nesterov.direction

    • eval = EVAL.FUN(y_vec)

  • # only keep y_vec with minimal eval

  • stop criterion: if the descent in either the gradient or the Nesterov step was below a critical value, then stop.
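
The sketch below restates the loop above as plain R, to make the control flow concrete. It is a simplified, stand-alone illustration rather than the package implementation: the name fista_sketch is hypothetical, ST.FUN is assumed to be l1 soft thresholding, NORM.FUN an l2 scaling, FACTOR.FUN the common (k - 1)/(k + 2) Nesterov schedule, and the adaptive line search is replaced by a fixed grid of 'cycles' candidate step sizes.

fista_sketch <- function(tweak.vec, F.GRAD.FUN, EVAL.FUN,
                         lambda = 0, learning.rate = 0.1, maxit = 500,
                         cycles = 5, stop.crit.threshold = 1e-13) {
  # assumed helper functions (hypothetical defaults, see above)
  ST.FUN <- function(x, thresh) sign(x) * pmax(abs(x) - thresh, 0)
  NORM.FUN <- function(x) x / sqrt(sum(x^2))
  FACTOR.FUN <- function(k) (k - 1) / (k + 2)

  y.vec <- tweak.vec
  eval.old <- EVAL.FUN(tweak.vec)
  for (k in 2:maxit) {
    y.vec <- NORM.FUN(y.vec)
    grad <- F.GRAD.FUN(y.vec)

    # gradient/proximal step: try several step sizes, keep the u with minimal eval
    step.sizes <- seq(0, learning.rate, length.out = cycles + 1)[-1]
    cands <- lapply(step.sizes, function(s) ST.FUN(y.vec - s * grad, s * lambda))
    evals <- vapply(cands, EVAL.FUN, numeric(1))
    u <- cands[[which.min(evals)]]

    tweak.old <- tweak.vec
    if (min(evals) <= EVAL.FUN(tweak.vec)) {
      tweak.vec <- u            # descent step: accept u
    }                           # otherwise keep tweak.vec (restart)

    # Nesterov extrapolation: try several extrapolation factors, keep the best y_vec
    nesterov.direction <- tweak.vec - tweak.old
    ne.grid <- seq(0, FACTOR.FUN(k), length.out = cycles + 1)
    y.cands <- lapply(ne.grid, function(ne) u + ne * nesterov.direction)
    y.evals <- vapply(y.cands, EVAL.FUN, numeric(1))
    y.vec <- y.cands[[which.min(y.evals)]]

    # stop if neither step decreased the loss by more than the threshold
    if (eval.old - min(y.evals) < stop.crit.threshold) break
    eval.old <- min(y.evals)
  }
  tweak.vec
}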

Usage

descent_generalized_fista(
  tweak.vec,
  lambda = 0,
  maxit = 500,
  learning.rate = NA,
  F.GRAD.FUN,
  ST.FUN = "softmax",
  FACTOR.FUN = "nesterov_factor",
  EVAL.FUN,
  NORM.FUN = "norm2",
  line.search.speed = 2,
  cycles = 5,
  save.all.tweaks = FALSE,
  use.restart = TRUE,
  verbose = FALSE,
  NESTEROV.FUN = "positive",
  stop.crit.threshold = 1e-13
)

Arguments

tweak.vec

numeric vector, starting vector for the DTD algorithm

lambda

non-negative float, regularization factor for ST.FUN function.

maxit

integer, maximum number of iterations for the iterative minimization.

learning.rate

float, step size during optimization. If it is NA, the learning rate is estimated as published by Barzilai and Borwein (1988). Note that the algorithm adjusts the learning rate during optimization.

F.GRAD.FUN

function with one parameter (a vector with the same length as tweak.vec) and one return value (a vector with the same length as tweak.vec). For a given vector, it returns the gradient of the loss function (see the example functions after this argument list).

ST.FUN

function with two parameters, a vector with the same length as tweak.vec and a non-negative threshold, and one return value (a vector with the same length as tweak.vec). Implementation of the proximal operator for the chosen loss function, here named the soft thresholding function.

FACTOR.FUN

function with an integer as input and a float as return value. It returns the factor by which the Nesterov correction extrapolates.

EVAL.FUN

function with one parameter (a vector with the same length as tweak.vec) and a single float as return value. It evaluates the loss function.

NORM.FUN

function with one parameter. After each step, the tweak vector is normalized/scaled by this function. If no scaling should be done, set NORM.FUN to identity.

line.search.speed

numeric, factor by which the learning rate changes during optimization. If 'line.search.speed' is, e.g., 2, the learning rate is doubled if the highest 'cycle' led to the best eval score, and halved if 'cycle = 0' led to the best eval score.

cycles

integer, in each iteration one gradient is calculated. To find the best step size, 'cycles' candidate steps are taken, and each of them is evaluated.

save.all.tweaks

logical, should all tweak vectors of all iterations be stored?

use.restart

logical, restart the algorithm if the update was not a descent step.

verbose

logical, should information be printed to console?

NESTEROV.FUN

function applied to the Nesterov extrapolation.

stop.crit.threshold

numeric value. The change in either the gradient or the Nesterov step must be at least 'stop.crit.threshold'; otherwise the algorithm stops.
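
To make the expected signatures concrete, the following toy definitions show one possible set of argument functions. They assume a simple least-squares loss 0.5 * ||A g - b||^2 on made-up data and are purely illustrative; they are not taken from the DTD package.

# toy data for illustration only
A <- matrix(rnorm(20), nrow = 4)
b <- rnorm(4)

F.GRAD.FUN <- function(g) as.vector(t(A) %*% (A %*% g - b))          # gradient of the loss
EVAL.FUN   <- function(g) 0.5 * sum((A %*% g - b)^2)                  # loss value (single float)
ST.FUN     <- function(g, thresh) sign(g) * pmax(abs(g) - thresh, 0)  # l1 proximal operator
NORM.FUN   <- function(g) g / sqrt(sum(g^2))                          # l2 scaling

# a call might then look like (schematic):
# res <- descent_generalized_fista(tweak.vec = rep(1, 5), lambda = 0.01,
#                                  F.GRAD.FUN = F.GRAD.FUN, EVAL.FUN = EVAL.FUN,
#                                  ST.FUN = ST.FUN, NORM.FUN = NORM.FUN)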

Details

The gradient and evaluation functions both take only one argument, the soft thresholding function two. If your gradient, evaluation, or soft thresholding function requires more arguments, please write a wrapper, as sketched below.
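
For example (a hypothetical sketch; my_grad, X and y are placeholders, not part of the package), a gradient function that needs extra data can be turned into the required one-argument form via a closure:

# gradient that needs additional arguments beyond the tweak vector
my_grad <- function(g, X, y) as.vector(t(X) %*% (X %*% g - y))

# closure that fixes X and y, returning a one-argument function
make_grad_wrapper <- function(X, y) {
  function(g) my_grad(g, X, y)
}

X <- matrix(rnorm(20), nrow = 4)
y <- rnorm(4)
F.GRAD.FUN <- make_grad_wrapper(X, y)   # now matches the required signature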

Value

list, including

  • "Converge", numeric vector, EVAL.FUN in every step

  • "Tweak" numeric vector, the best distinguishing vector

  • "Lambda", numeric value, used λ

  • depending on "save.all.tweaks": "History", numeric matrix, "Tweak" vector of every step
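
As an illustration of how the returned list might be inspected (the values below are mock numbers, not package output; the element names follow the list above):

res <- list(Converge = c(10, 5, 2, 1.5, 1.4),   # mock EVAL.FUN values per step
            Tweak = rep(1, 3),
            Lambda = 0)
plot(res$Converge, type = "l", xlab = "iteration", ylab = "EVAL.FUN")
best.tweak <- res$Tweak                          # best distinguishing vector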

