nlcg: Nonlinear Conjugate Gradient minimization
In rHanso: An R Implementation of Hybrid Algorithm for Non-Smooth Optimization (HANSO)

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/nlcg.R

NLCG (Nonlinear Conjugate gradient) is intended for minimizing smooth, not necessarily convex, functions.

nlcg(fn, gr, nvar = 0, nstart = 10,
      x0 = NULL,upper = 1, lower = 0,  H0 = NULL, maxit = 1000, fvalquit = -Inf,
      prtlevel = 0, version = "C", normtol = 1e-06,
      xnormquit = Inf, evaldist = 1e-04, ngrad = 0, scale = 1,
      wolfe1 = 1e-04, wolfe2 = 0.5, quitLSfail = TRUE, strongwolfe = 1)

`fn`	A function to be minimized. fn(x) takes input as a vector of parameters over which minimization is to take place. fn() returns a scaler.
`gr`	A function to return the gradient for fn(x).
`nvar`	Number of parameters that fn() takes.
`nstart`	Number of initial guesses. Default is 10.
`x0`	Matrix, with dimension (nvar x nstart), each column represent an initial guess.
`upper`	upper bound for the initial value
`lower`	lower bound for the initial value
`H0`	Initial inverse hessian approximation. Default is NULL
`scale`	1 to scale H0 at first iteration, 0 otherwise.
`maxit`	maximum number of iterations.
`fvalquit`	quit if f drops below this value.
`prtlevel`	prints messages if this is 1
`version`	'P' for Polak-Ribiere-Polyak (not recommended: fails on hard problems) 'F' for Fletcher-Reeves (not recommended: often stagnates) 'C' for Polak-Ribiere-Polyak Constrained by Fletcher-Reeves (recommended, combines advantages of 'P' and 'F'; default) 'S' for Hestenes-Stiefel (not recommended) 'Y' for Dai-Yuan (allows weak Wolfe line search, see below) 'Z' for Hager-Zhang '-' for Steepest Descent (for comparison)
`normtol`	termination tolerance on d: smallest vector in convex hull of up to options.ngrad gradients (default: 1e-6)
`xnormquit`	quit if norm(x) drops below this.
`evaldist`	the gradients used in the termination test qualify only if they are evaluated at points approximately within distance options.evaldist of x
`ngrad`	number of gradients willing to save and use in solving QP to check optimality tolerance on smallest vector in their convex hull; see also next two options
`wolfe1`	wolfe line search parameter 1.
`wolfe2`	wolfe line search parameter 2.
`quitLSfail`	if 1, quits when line search is failed. If 0, continues.
`strongwolfe`	if 1, strong wolfe line search is used.

NLCG (Nonlinear Conjugate gradient) is intended for minimizing smooth, not necessarily convex, functions. The Fletcher-Reeves version (version='F') is globally convergent in theory but often stagnates in practice. The Polak-Ribiere version (version='P') works better in practice but its search direction may not even be a descent direction and it may not converge. The 'C' version combines the best of both. It is Polak-Ribiere constrained by Fletcher-Reeves, typically behaving as well as or better than PR in practice but with the same global convergence guarantee as FR. The Hestenes-Stiefel version (version='S') also often fails. The Dai-Yuan (version='Y') is the first to allow use of a weak Wolfe line search. The Hager-Zhang (version='Z') is the newest and is also promising.

Returns a list containing the following fields:

`x`	a matrix with k'th column containing final value of x obtained from k'th column of x0.
`f`	a vector of final obtained minimum values of fn() at the initial points.
`g`	each column is the gradient at the corresponding column of x
`frec`	recorded f values for all iterates
`alpharec`	record of steps taken in line search

Reference: second edition of Nocedal and Wright, Chapter 5, plus papers by Dai-Yuan and Hager-Zhang.

hanso, gradsamp, bfgs, shor

fr <- function(x) {   ## Rosenbrock Banana function
  x1 <- x[1]
  x2 <- x[2]
  100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x) { ## Gradient of 'fr'
  x1 <- x[1]
  x2 <- x[2]
  c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
    200 *      (x2 - x1 * x1))
}
res=nlcg(fr,grr,nvar=2)
## Following examples are not run, please uncomment them to run.
#x0=matrix(c(0.2,1.3,0.9,.24),2,2)
#res=nlcg(fr,grr,nvar=2,nstart=2,x0=x0,strongwolfe=0) #using weak wolfe