Line search enforcing weak Wolfe conditions, suitable for minimizing both smooth and nonsmooth functions. Intended to be called from Hanso, but can be used by itself also.

1 2 | ```
linesch_ww(fn, gr, x0, d, fn0 = fn(x0),
gr0 = gr(x0), c1 = 0, c2 = 0.5, fvalquit = -Inf, prtlevel = 0)
``` |

`fn` |
A function to be minimized. fn(x) takes input as a vector of parameters over which minimization is to take place. fn() returns a scaler. |

`gr` |
A function to return the gradient for fn(x). |

`x0` |
initial point |

`d` |
search direction |

`fn0` |
fn(x0) |

`gr0` |
gr(x0) |

`c1` |
Wolfe parameter for the sufficient decrease condition |

`c2` |
c2: Wolfe parameter for the WEAK condition on directional derivative |

`fvalquit` |
quit if f gets below this value. |

`prtlevel` |
prints messages if this is 1 |

The weak Wolfe line search is far less complicated that the standard strong Wolfe line search that is discussed in many texts. It appears to have no disadvantages compared to strong Wolfe when used with Newton or BFGS methods on smooth functions, and it is essential for the application of BFGS or bundle to nonsmooth functions as done in HANSO. However, it is NOT recommended for use with conjugate gradient methods, which require a strong Wolfe line search for convergence guarantees. Weak Wolfe requires two conditions to be satisfied: sufficient decrease in the objective, and sufficient increase in the directional derivative (not reduction in its absolute value, as required by strong Wolfe).

There are some subtleties for nonsmooth functions. In the typical case that the directional derivative changes sign somewhere along d, it is no problem to satisfy the 2nd condition, but descent may not be possible if the change of sign takes place even when the step is tiny. In this case it is important to return the gradient corresponding to the positive directional derivative even though descent was not obtained. On the other hand, for some nonsmooth functions the function decrease is steady along the line until at some point it jumps to infinity, because an implicit constraint is violated. In this case, the first condition is satisfied but the second is not. All cases are covered by returning the end points of an interval [alpha, beta] and returning the function value at alpha, but the gradients at both alpha and beta.

The assertion that [alpha,beta] brackets a point satisfying the weak Wolfe conditions depends on an assumption that the function f(x + td) is a continuous and piecewise continuously differentiable function of t, and that in the unlikely event that f is evaluated at a point of discontinuity of the derivative, g'*d, where g is the computed gradient, is either the left or right derivative at the point of discontinuity, or something in between these two values.

For functions that are known to be nonsmooth, setting the second Wolfe parameter to zero makes sense, especially for a bundle method, and for the Shor R-algorithm, for which it is essential. However, it's not a good idea for BFGS, as for smooth functions this may prevent superlinear convergence, and it can even make trouble for BFGS on, e.g., f(x) = x_1^2 + eps |x_2|, when eps is small.

returns a list containing:

`alpha` |
steplength satisfying weak Wolfe conditions if one was found, otherwise left end point of interval bracketing such a point (possibly 0) |

`xalpha` |
x0 + alpha*d |

`falpha` |
f(x0 + alpha d) |

`gradalpha` |
(grad f)(x0 + alpha d) |

`fail` |
0 if both Wolfe conditions satisfied, or falpha < fvalquit 1 if one or both Wolfe conditions not satisfied but an interval was found bracketing a point where both satisfied -1 if no such interval was found, function may be unbounded below |

`beta` |
same as alpha if it satisfies weak Wolfe conditions, otherwise right end point of interval bracketing such a point (inf if no such finite interval found) |

`gradbeta` |
(grad f)(x0 + beta d) (this is important for bundle methods) (vector of nans if beta is inf) |

`fevalrec` |
record of function evaluations |

Copyright (c) 2010 Michael Overton for Matlab code and documentation, with permission converted to R by Abhirup Mallik (and Hans W Borchers).

1 2 3 4 5 6 7 8 9 10 11 12 13 |

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.

Please suggest features or report bugs with the GitHub issue tracker.

All documentation is copyright its authors; we didn't write any of that.