Spectral projected gradient method for large-scale optimization with simple constraints.

1 2 3 4 |

`par` |
A real vector argument to |

`fn` |
Nonlinear objective function that is to be optimized. A scalar function that takes a real vector as argument and returns a scalar that is the value of the function at that point (see details). |

`gr` |
The gradient of the objective function |

`method` |
An integer (1, 2, or 3) specifying which Barzilai-Borwein steplength to use. The default is 3. See *Details*. |

`upper` |
An upper bound for box constraints. |

`lower` |
An lower bound for box constraints. |

`project` |
A projection
function or character string indicating its name. The projection
function takes a point in |

`projectArgs` |
A list with arguments to the |

`control` |
A list of control parameters. See *Details*. |

`quiet` |
A logical variable (TRUE/FALSE). If |

`alertConvergence` |
A logical variable. With the default |

`...` |
Additional arguments passed to |

R adaptation, with significant modifications, by Ravi Varadhan, Johns Hopkins University (March 25, 2008), from the original FORTRAN code of Birgin, Martinez, and Raydan (2001). The original is available at the TANGO project http://www.ime.usp.br/~egbirgin/tango/downloads.php

A major modification in our R adaptation of the original FORTRAN code is the
availability of 3 different options for Barzilai-Borwein (BB)
steplengths: `method = 1`

is the BB
steplength used in Birgin, Martinez and Raydan (2000); `method = 2`

is
the other steplength proposed in Barzilai and Borwein's (1988) original paper.
Finally, `method = 3`

, is a new steplength, which was first proposed in
Varadhan and Roland (2008) for accelerating the EM algorithm.
In fact, Varadhan and Roland (2008) considered 3 similar steplength schemes in
their EM acceleration work. Here, we have chosen `method = 3`

as the "default" method. This method may be slightly slower than the
other 2 BB steplength schemes, but it generally exhibited more reliable
convergence to a better optimum in our experiments.
We recommend that the user try the other steplength schemes if the default
method does not perform well in their problem.

Box constraints can be imposed by vectors `lower`

and `upper`

.
Scalar values for `lower`

and `upper`

are expanded to apply to
all parameters. The default `lower`

is `-Inf`

and `upper`

is `+Inf`

, which imply no constraints.

The `project`

argument provides a way to implement more general constraints
to be imposed on the parameters in `spg`

. `projectArgs`

is passed
to the `project`

function if one is specified. The first argument of any `project`

function should be `par`

and any other arguments should be passed using its argument `projectArgs`

.
To avoid confusion it is suggested that user defined `project`

functions should not use arguments `lower`

and `upper`

.

The function `projectLinear`

incorporates linear equalities and
inequalities. This function also provides an example of how other projections
might be implemented.

Argument `control`

is a list specifing any changes to default values of
algorithm control parameters. Note that the names of these must be
specified completely. Partial matching will not work.
The list items are as follows:

- M
A positive integer, typically between 5-20, that controls the monotonicity of the algorithm.

`M=1`

would enforce strict monotonicity in the reduction of L2-norm of`fn`

, whereas larger values allow for more non-monotonicity. Global convergence under non-monotonicity is ensured by enforcing the Grippo-Lampariello-Lucidi condition (Grippo et al. 1986) in a non-monotone line-search algorithm. Values of`M`

between 5 to 20 are generally good. The default is`M = 10`

.- maxit
The maximum number of iterations. The default is

`maxit = 1500`

.- ftol
Convergence tolerance on the absolute change in objective function between successive iterations. Convergence is declared when the change is less than

`ftol`

. Default is`ftol = 1.e-10`

.- gtol
Convergence tolerance on the infinity-norm of projected gradient

`gr`

evaluated at the current parameter. Convergence is declared when the infinity-norm of projected gradient is less than`gtol`

. Default is`gtol = 1.e-05`

.- maxfeval
Maximum limit on the number of function evaluations. Default is

`maxfeval = 10000`

.- maximize
A logical variable indicating whether the objective function is to be maximized. Default is

`maximize = FALSE`

indicating minimization. For maximization (e.g. log-likelihood maximization in statistical modeling), this may be set to`TRUE`

.- trace
A logical variable (TRUE/FALSE). If

`TRUE`

, information on the progress of optimization is printed. Default is`trace = TRUE`

.- triter
An integer that controls the frequency of tracing when

`trace=TRUE`

. Default is`triter=10`

, which means that the objective`fn`

and the infinity-norm of its projected gradient are printed at every 10-th iteration.- eps
A small positive increment used in the finite-difference approximation of gradient. Default is 1.e-07.

- checkGrad
`NULL`

or a logical variable`TRUE/FALSE`

indicating whether to check the provided analytical gradient against a numerical approximation. With the default`NULL`

the gradient is checked if it is estimated to take less than about ten seconds. A warning will be issued in the case it takes longer. The default can be overridden by specifying`TRUE`

or`FALSE`

. It is recommended that this be set to FALSE for high-dimensional problems, after making sure that the gradient is correctly specified, possibly by running once with`TRUE`

specified.- checkGrad.tol
A small positive value use to compare the maximum relative difference between a user supplied gradient gr and the numerical approximation calculated by grad from package numDeriv. The default is 1.e-06. If this value is exceeded then an error message is issued, as it is a reasonable indication of a problem with the user supplied gr. The user can either fix the gr function, remove it so the finite-difference approximation is used, or increase the tolerance so the check passes.

A list with the following components:

`par` |
Parameters that optimize the nonlinear objective function, if convergence is successful. |

`value` |
The value of the objective function at termination. |

`gradient` |
L-infinity norm of the projected gradient of the objective function at termination. If convergence is successful, this should be less than |

`fn.reduction` |
Reduction in the value of the function from its initial value. This is negative in maximization. |

`iter` |
Number of iterations taken by the algorithm. The gradient is evaluated once each iteration, so the number of gradient evaluations will also be equal to |

`feval` |
Number of times the objective |

`convergence` |
An integer code indicating type of convergence. |

`message` |
A text message explaining which termination criterion was used. |

Birgin EG, Martinez JM, and Raydan M (2000): Nonmonotone spectral projected gradient methods on convex sets, *SIAM J Optimization*, 10, 1196-1211.

Birgin EG, Martinez JM, and Raydan M (2001): SPG: software for convex-constrained optimization, *ACM Transactions on Mathematical Software*.

L Grippo, F Lampariello, and S Lucidi (1986), A nonmonotone line search technique for Newton's method, *SIAM J on Numerical Analysis*, 23, 707-716.

M Raydan (1997), Barzilai-Borwein gradient method for large-scale unconstrained minimization problem, *SIAM J of Optimization*, 7, 26-33.

R Varadhan and C Roland (2008), Simple and globally-convergent methods for accelerating the convergence of any EM algorithm, *Scandinavian J Statistics*, doi: 10.1111/j.1467-9469.2007.00585.x.

R Varadhan and PD Gilbert (2009), BB: An R Package for Solving a Large System of Nonlinear Equations and for Optimizing a High-Dimensional Nonlinear Objective Function, *J. Statistical Software*, 32:4, http://www.jstatsoft.org/v32/i04/

`projectLinear`

,
`BBoptim`

,
`optim`

,
`nlm`

,
`sane`

,
`dfsane`

,
`grad`

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | ```
sc2.f <- function(x){ sum((1:length(x)) * (exp(x) - x)) / 10}
sc2.g <- function(x){ (1:length(x)) * (exp(x) - 1) / 10}
p0 <- rnorm(50)
ans.spg1 <- spg(par=p0, fn=sc2.f) # Default is method=3
ans.spg2 <- spg(par=p0, fn=sc2.f, method=1)
ans.spg3 <- spg(par=p0, fn=sc2.f, method=2)
ans.cg <- optim(par=p0, fn=sc2.f, method="CG") #Uses conjugate-gradient method in "optim"
ans.lbfgs <- optim(par=p0, fn=sc2.f, method="L-BFGS-B") #Uses low-memory BFGS method in "optim"
# Now we use exact gradient.
# Computation is much faster compared to when using numerical gradient.
ans.spg1 <- spg(par=p0, fn=sc2.f, gr=sc2.g)
############
# Another example illustrating use of additional parameters to objective function
valley.f <- function(x, cons) {
n <- length(x)
f <- rep(NA, n)
j <- 3 * (1:(n/3))
jm2 <- j - 2
jm1 <- j - 1
f[jm2] <- (cons[2] * x[jm2]^3 + cons[1] * x[jm2]) * exp(-(x[jm2]^2)/100) - 1
f[jm1] <- 10 * (sin(x[jm2]) - x[jm1])
f[j] <- 10 * (cos(x[jm2]) - x[j])
sum(f*f)
}
k <- c(1.003344481605351, -3.344481605351171e-03)
p0 <- rnorm(30) # number of parameters should be a multiple of 3 for this example
ans.spg2 <- spg(par=p0, fn=valley.f, cons=k, method=2)
ans.cg <- optim(par=p0, fn=valley.f, cons=k, method="CG")
ans.lbfgs <- optim(par=p0, fn=valley.f, cons=k, method="L-BFGS-B")
####################################################################
# Here is a statistical example illustrating log-likelihood maximization.
poissmix.loglik <- function(p,y) {
# Log-likelihood for a binary Poisson mixture
i <- 0:(length(y)-1)
loglik <- y*log(p[1]*exp(-p[2])*p[2]^i/exp(lgamma(i+1)) +
(1 - p[1])*exp(-p[3])*p[3]^i/exp(lgamma(i+1)))
return (sum(loglik) )
}
# Data from Hasselblad (JASA 1969)
poissmix.dat <- data.frame(death=0:9, freq=c(162,267,271,185,111,61,27,8,3,1))
y <- poissmix.dat$freq
# Lower and upper bounds on parameters
lo <- c(0.001,0,0)
hi <- c(0.999, Inf, Inf)
p0 <- runif(3,c(0.2,1,1),c(0.8,5,8)) # randomly generated starting values
ans.spg <- spg(par=p0, fn=poissmix.loglik, y=y, lower=lo, upper=hi,
control=list(maximize=TRUE))
# how to compute hessian at the MLE
require(numDeriv)
hess <- hessian(x=ans.spg$par, poissmix.loglik, y=y)
se <- sqrt(-diag(solve(hess))) # approximate standard errors
``` |

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.

All documentation is copyright its authors; we didn't write any of that.