Nonlinear Conjugate Gradient minimization

Share:

Description

NLCG (Nonlinear Conjugate gradient) is intended for minimizing smooth, not necessarily convex, functions.

Usage

1
2
3
4
5
nlcg(fn, gr, nvar = 0, nstart = 10,
      x0 = NULL,upper = 1, lower = 0,  H0 = NULL, maxit = 1000, fvalquit = -Inf,
      prtlevel = 0, version = "C", normtol = 1e-06,
      xnormquit = Inf, evaldist = 1e-04, ngrad = 0, scale = 1,
      wolfe1 = 1e-04, wolfe2 = 0.5, quitLSfail = TRUE, strongwolfe = 1)

Arguments

fn

A function to be minimized. fn(x) takes input as a vector of parameters over which minimization is to take place. fn() returns a scaler.

gr

A function to return the gradient for fn(x).

nvar

Number of parameters that fn() takes.

nstart

Number of initial guesses. Default is 10.

x0

Matrix, with dimension (nvar x nstart), each column represent an initial guess.

upper

upper bound for the initial value

lower

lower bound for the initial value

H0

Initial inverse hessian approximation. Default is NULL

scale

1 to scale H0 at first iteration, 0 otherwise.

maxit

maximum number of iterations.

fvalquit

quit if f drops below this value.

prtlevel

prints messages if this is 1

version

'P' for Polak-Ribiere-Polyak (not recommended: fails on hard problems) 'F' for Fletcher-Reeves (not recommended: often stagnates) 'C' for Polak-Ribiere-Polyak Constrained by Fletcher-Reeves (recommended, combines advantages of 'P' and 'F'; default) 'S' for Hestenes-Stiefel (not recommended) 'Y' for Dai-Yuan (allows weak Wolfe line search, see below) 'Z' for Hager-Zhang '-' for Steepest Descent (for comparison)

normtol

termination tolerance on d: smallest vector in convex hull of up to options.ngrad gradients (default: 1e-6)

xnormquit

quit if norm(x) drops below this.

evaldist

the gradients used in the termination test qualify only if they are evaluated at points approximately within distance options.evaldist of x

ngrad

number of gradients willing to save and use in solving QP to check optimality tolerance on smallest vector in their convex hull; see also next two options

wolfe1

wolfe line search parameter 1.

wolfe2

wolfe line search parameter 2.

quitLSfail

if 1, quits when line search is failed. If 0, continues.

strongwolfe

if 1, strong wolfe line search is used.

Details

NLCG (Nonlinear Conjugate gradient) is intended for minimizing smooth, not necessarily convex, functions. The Fletcher-Reeves version (version='F') is globally convergent in theory but often stagnates in practice. The Polak-Ribiere version (version='P') works better in practice but its search direction may not even be a descent direction and it may not converge. The 'C' version combines the best of both. It is Polak-Ribiere constrained by Fletcher-Reeves, typically behaving as well as or better than PR in practice but with the same global convergence guarantee as FR. The Hestenes-Stiefel version (version='S') also often fails. The Dai-Yuan (version='Y') is the first to allow use of a weak Wolfe line search. The Hager-Zhang (version='Z') is the newest and is also promising.

Value

Returns a list containing the following fields:

x

a matrix with k'th column containing final value of x obtained from k'th column of x0.

f

a vector of final obtained minimum values of fn() at the initial points.

g

each column is the gradient at the corresponding column of x

frec

recorded f values for all iterates

alpharec

record of steps taken in line search

Author(s)

Copyright (c) 2010 Michael Overton for Matlab code and documentation, with permission converted to R by Abhirup Mallik (and Hans W Borchers).

References

Reference: second edition of Nocedal and Wright, Chapter 5, plus papers by Dai-Yuan and Hager-Zhang.

See Also

hanso, gradsamp, bfgs, shor

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
fr <- function(x) {   ## Rosenbrock Banana function
  x1 <- x[1]
  x2 <- x[2]
  100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x) { ## Gradient of 'fr'
  x1 <- x[1]
  x2 <- x[2]
  c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
    200 *      (x2 - x1 * x1))
}
res=nlcg(fr,grr,nvar=2)
## Following examples are not run, please uncomment them to run.
#x0=matrix(c(0.2,1.3,0.9,.24),2,2)
#res=nlcg(fr,grr,nvar=2,nstart=2,x0=x0,strongwolfe=0) #using weak wolfe