A Quick Demo of trustOptim

knitr::opts_chunk$set(collapse = TRUE, comment = "#", message=FALSE)
options(digits=4, scipen=0)

This is a quick demo of how to use the trustOptim package. For this example, the objective function is the Rosenbrock function, summed over $N$ pairs of variables: $$ f(x_{1:N},y_{1:N})=\sum_{i=1}^N \left[100\left(x^2_i-y_i\right)^2+\left(x_i-1\right)^2\right] $$

The parameter vector contains $2N$ variables ordered as $x_1, y_1, x_2, y_2, \ldots, x_N, y_N$. The function is minimized at the vector of ones, where its value is zero.
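The interleaved ordering can be checked directly in base R. In this sketch, `pack_xy` and `rosen` are hypothetical helpers written only for illustration: `pack_xy` interleaves two vectors into the $x_1, y_1, x_2, y_2, \ldots$ layout, and `rosen` computes the same sum as the formula above, using logical recycling (`c(TRUE, FALSE)`) to pick out the odd and even elements.

```r
## Hypothetical helpers for illustration (not part of trustOptim)
pack_xy <- function(x, y) as.vector(rbind(x, y))  # x_1, y_1, x_2, y_2, ...

rosen <- function(V) {
    x <- V[c(TRUE, FALSE)]  # odd positions: x_1, ..., x_N
    y <- V[c(FALSE, TRUE)]  # even positions: y_1, ..., y_N
    sum(100 * (x^2 - y)^2 + (x - 1)^2)
}

V <- pack_xy(x = c(1, 2, 3), y = c(4, 5, 6))
V                  # 1 4 2 5 3 6
rosen(rep(1, 6))   # 0 at the vector of ones
rosen(V)           # 1905, positive away from the optimum
```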

The following functions return the objective, gradient, and Hessian (in sparse format) of this function.

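library(trustOptim)
library(Matrix)

## Objective: sum of N two-dimensional Rosenbrock terms
f <- function(V) {
    N <- length(V)/2
    x <- V[seq(1,2*N-1,by=2)]  # odd elements: x_1, ..., x_N
    y <- V[seq(2,2*N,by=2)]    # even elements: y_1, ..., y_N
    return(sum(100*(x^2-y)^2+(x-1)^2))
}

## Gradient, returned in the same interleaved order as V
df <- function(V) {
    N <- length(V)/2
    x <- V[seq(1,2*N-1,by=2)]
    y <- V[seq(2,2*N,by=2)]

    t <- x^2-y
    dxi <- 400*t*x+2*(x-1)             # partial derivatives with respect to x_i
    dyi <- -200*t                      # partial derivatives with respect to y_i
    return(as.vector(rbind(dxi,dyi)))  # interleave: dx_1, dy_1, dx_2, dy_2, ...
}

## Hessian: block diagonal with one 2x2 block per (x_i, y_i) pair,
## stored as a compressed sparse column matrix
hess <- function(V) {
    N <- length(V)/2
    x <- V[seq(1,2*N-1,by=2)]
    y <- V[seq(2,2*N,by=2)]
    d0 <- rep(200,N*2)                           # main diagonal: 200 for each y_i
    d0[seq(1,(2*N-1),by=2)] <- 1200*x^2-400*y+2  # main diagonal for each x_i
    d1 <- rep(0,2*N-1)                           # first off-diagonal
    d1[seq(1,(2*N-1),by=2)] <- -400*x            # cross term for each (x_i, y_i)

    H <- bandSparse(2*N,
                    k=c(-1,0,1),
                    diagonals=list(d1,d0,d1),
                    symmetric=FALSE,
                    repr='C')
    return(drop0(H))  # remove the explicit zeros between blocks
}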
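Before optimizing, it can be worth checking the analytic gradient and Hessian against finite differences. The sketch below is one way to do that in base R plus Matrix; it repeats equivalent versions of `f`, `df`, and `hess` so it runs on its own, and `num_grad`, `num_hess`, and the step sizes `h` are hypothetical choices for illustration, not part of trustOptim.

```r
library(Matrix)

## Equivalent to f, df, and hess above, repeated so this block runs on its own
f <- function(V) {
    x <- V[c(TRUE, FALSE)]; y <- V[c(FALSE, TRUE)]
    sum(100 * (x^2 - y)^2 + (x - 1)^2)
}
df <- function(V) {
    x <- V[c(TRUE, FALSE)]; y <- V[c(FALSE, TRUE)]
    tt <- x^2 - y
    as.vector(rbind(400 * tt * x + 2 * (x - 1), -200 * tt))
}
hess <- function(V) {
    N <- length(V) / 2
    x <- V[c(TRUE, FALSE)]; y <- V[c(FALSE, TRUE)]
    d0 <- rep(200, 2 * N)
    d0[seq(1, 2 * N - 1, by = 2)] <- 1200 * x^2 - 400 * y + 2
    d1 <- rep(0, 2 * N - 1)
    d1[seq(1, 2 * N - 1, by = 2)] <- -400 * x
    drop0(bandSparse(2 * N, k = c(-1, 0, 1), diagonals = list(d1, d0, d1)))
}

## Hypothetical helpers: central differences, one coordinate at a time
num_grad <- function(fn, V, h = 1e-6) {
    sapply(seq_along(V), function(j) {
        e <- replace(numeric(length(V)), j, h)
        (fn(V + e) - fn(V - e)) / (2 * h)
    })
}
num_hess <- function(gr, V, h = 1e-5) {
    ## column j approximates the derivative of the gradient with respect to V[j]
    sapply(seq_along(V), function(j) {
        e <- replace(numeric(length(V)), j, h)
        (gr(V + e) - gr(V - e)) / (2 * h)
    })
}

set.seed(1)
V0 <- rnorm(6)
err <- c(gradient = max(abs(df(V0) - num_grad(f, V0))),
         hessian  = max(abs(as.matrix(hess(V0)) - num_hess(df, V0))))
print(err)  # both should be tiny (finite-difference error only)
```

A large discrepancy in either entry would point to a bug in the analytic derivative rather than in the optimizer.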

For this demo, we start at a random vector.

set.seed(1234)
N <- 3
start <- as.vector(rnorm(2*N, -1, 3))

Next, we call trust.optim with method="Sparse" and the default control parameters.

opt <- trust.optim(start, fn=f, gr=df, hs=hess, method="Sparse")

In the printed status output, f is the value of the objective function and nrm_gr is the norm of the gradient. These status messages show how the underlying trust region algorithm is progressing, and are useful mainly for debugging. Note that the objective value never increases from one iteration to the next, but the norm of the gradient can. The algorithm stops when the norm of the gradient falls below the control parameter prec, when the trust region radius falls below stop.trust.radius, or when the iteration count exceeds maxit. See the package manual for details of the control parameters. We use the default control parameters for this demo (hence, there is no control list here).
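The three stopping rules just described can be sketched as a small R function. This is an illustration only, not trustOptim's internal code: the helper name `stop_reason`, the use of the Euclidean norm, and the strict comparisons are assumptions, and the numeric values in the example calls are made up, not the package defaults.

```r
## Hypothetical helper (not part of trustOptim): report which of the three
## stopping rules, if any, is met. Assumes the Euclidean norm of the gradient.
stop_reason <- function(gr, radius, iter, prec, stop.trust.radius, maxit) {
    if (sqrt(sum(gr^2)) < prec) return("gradient norm below prec")
    if (radius < stop.trust.radius) return("trust radius below stop.trust.radius")
    if (iter > maxit) return("iteration count exceeds maxit")
    NA_character_  # no rule met: keep iterating
}

## Illustrative values only (not the package defaults)
stop_reason(gr = c(1e-9, -2e-9), radius = 0.5, iter = 12,
            prec = 1e-6, stop.trust.radius = 1e-10, maxit = 500)
## "gradient norm below prec"
stop_reason(gr = c(3, -4), radius = 1e-12, iter = 12,
            prec = 1e-6, stop.trust.radius = 1e-10, maxit = 500)
## "trust radius below stop.trust.radius"
```

The second call shows why the radius rule matters: the optimizer can stop with a large gradient if the trust region has collapsed, so it is worth checking which rule triggered.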

The result contains the objective value, the location of the minimum, the gradient at the minimum (which should be numerically zero in every element), and the Hessian at the minimum.

opt

Note that opt$fval and all elements of opt$gradient are zero to within numerical precision. The solution is correct, and the Hessian is returned as a compressed sparse Matrix object (see the Matrix package for details).
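For a closer look at that sparse Hessian, the sketch below (base R plus Matrix, with N = 3) rebuilds it at the known optimum V = (1, ..., 1) using the same construction as hess(), then checks its sparsity and eigenvalues. All object names are local to this sketch.

```r
library(Matrix)

## Hessian at the optimum x_i = y_i = 1; each (x_i, y_i) pair contributes the
## 2x2 block [[1200 x^2 - 400 y + 2, -400 x], [-400 x, 200]] = [[802, -400], [-400, 200]]
N <- 3
x <- rep(1, N); y <- rep(1, N)
d0 <- rep(200, 2 * N)
d0[seq(1, 2 * N - 1, by = 2)] <- 1200 * x^2 - 400 * y + 2
d1 <- rep(0, 2 * N - 1)
d1[seq(1, 2 * N - 1, by = 2)] <- -400 * x
H <- drop0(bandSparse(2 * N, k = c(-1, 0, 1), diagonals = list(d1, d0, d1)))

print(class(H))   # a compressed sparse column matrix
print(nnzero(H))  # 3 blocks x 4 nonzeros = 12
ev <- eigen(as.matrix(H), symmetric = TRUE, only.values = TRUE)$values
print(range(ev))          # all positive: positive definite at the optimum
print(max(ev) / min(ev))  # condition number, roughly 2500
```

Every eigenvalue is positive, which is consistent with a strict local minimum. The eigenvalues of each block are roughly 1001.6 and 0.4, so the Hessian is poorly conditioned even at the solution; ill-conditioning of this kind is one situation where a preconditioner may help.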

One way to potentially reduce the number of iterations (though not necessarily the compute time) is to apply a preconditioner to the algorithm. Other than the identity matrix (the default), the package currently supports only a modified Cholesky preconditioner, which is selected with the control parameter preconditioner=1. To save space, we report the optimizer status only every 10 iterations.

opt1 <- trust.optim(start, fn=f, gr=df, hs=hess, method="Sparse",
      control=list(preconditioner=1, report.freq=10))

Here, we see that adding the preconditioner actually increases the number of iterations. Sometimes preconditioners help a lot, and sometimes not at all.
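The package's preconditioner is described only as a modified Cholesky preconditioner. The sketch below shows one common way to modify a Cholesky factorization, the "Cholesky with added multiple of the identity" strategy described by Nocedal and Wright (2006); it is not necessarily the modification trustOptim uses. The names `mod_chol`, `beta`, and `max_tries` are hypothetical. The example applies it to the 2x2 Hessian block of one (x_i, y_i) pair at x_i = 0.5, y_i = 1, where that block is indefinite, so a plain Cholesky factorization would fail.

```r
## Hypothetical helper (not trustOptim's code): Cholesky with an added multiple
## of the identity. Returns upper-triangular R with t(R) %*% R == H + tau * I.
mod_chol <- function(H, beta = 1e-3, max_tries = 60) {
    min_diag <- min(diag(H))
    tau <- if (min_diag > 0) 0 else beta - min_diag
    for (k in seq_len(max_tries)) {
        R <- tryCatch(chol(H + tau * diag(nrow(H))), error = function(e) NULL)
        if (!is.null(R)) return(list(R = R, tau = tau))
        tau <- max(2 * tau, beta)  # factorization failed: double the shift
    }
    stop("no positive definite shift found")
}

## 2x2 Hessian block for one (x, y) pair, as in hess(); indefinite at x = 0.5, y = 1
x <- 0.5; y <- 1
H <- matrix(c(1200 * x^2 - 400 * y + 2, -400 * x,
              -400 * x,                 200), nrow = 2)
mc <- mod_chol(H)
mc$tau                # shift that was needed (about 392 here)
t(mc$R) %*% mc$R - H  # equals tau times the identity
```

A preconditioned solver would then use M = t(R) %*% R, applying its inverse to a vector r with two triangular solves, backsolve(mc$R, forwardsolve(t(mc$R), r)) in base R.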

Quasi-Newton Methods

The trust.optim function also supports quasi-Newton approximations to the Hessian. The two options are BFGS and SR1 updates; see Nocedal and Wright (2006) for details. You do not need to provide the Hessian for these methods, and they are often preferred when the Hessian is dense. However, they may need more iterations to converge, which is why we increase the maxit control parameter. As before, we report the status of the optimizer only every 10 iterations.

opt.bfgs <- trust.optim(start, fn=f, gr=df, method="BFGS", control=list(maxit=5000, report.freq=10))
opt.bfgs

And we can do the same thing with SR1 updates.

opt.sr1 <- trust.optim(start, fn=f, gr=df, method="SR1", control=list(maxit=5000, report.freq=10))
opt.sr1

Note that the quasi-Newton methods do not return a Hessian. We do not think that the final approximations from the BFGS or SR1 updates are particularly reliable. If you need the Hessian, you can compute it with the sparseHessianFD package.
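The BFGS and SR1 updates can be sketched with their standard textbook formulas. This is not trustOptim's implementation; the helper names `bfgs_update`, `sr1_update`, and `gr2`, the SR1 skip threshold `r = 1e-8`, and the example points are assumptions for illustration. The sketch shows that each update is built only from a step s and the matching change in the gradient y, and is constructed to satisfy the secant condition B s = y. The approximation therefore matches the true Hessian only along directions the iterates have explored, which is one reason a final approximation need not be a reliable Hessian.

```r
## Textbook quasi-Newton updates (hypothetical helpers, not trustOptim's code)
bfgs_update <- function(B, s, y) {
    Bs <- B %*% s
    B - (Bs %*% t(Bs)) / drop(t(s) %*% Bs) + (y %*% t(y)) / drop(t(y) %*% s)
}

sr1_update <- function(B, s, y, r = 1e-8) {
    v <- y - B %*% s
    denom <- drop(t(v) %*% s)
    ## common safeguard: skip the update when the denominator is too small
    if (abs(denom) < r * sqrt(sum(s^2)) * sqrt(sum(v^2))) return(B)
    B + (v %*% t(v)) / denom
}

## Gradient of a single 2-D Rosenbrock term, used to form the pair (s, y)
gr2 <- function(v) {
    tt <- v[1]^2 - v[2]
    c(400 * tt * v[1] + 2 * (v[1] - 1), -200 * tt)
}

B0 <- diag(2)                          # start from the identity
x_old <- c(-1.2, 1); x_new <- c(-1.0, 0.9)
s <- matrix(x_new - x_old)             # step (column vector)
y <- matrix(gr2(x_new) - gr2(x_old))   # change in gradient

B_bfgs <- bfgs_update(B0, s, y)
B_sr1  <- sr1_update(B0, s, y)

## Both updated matrices reproduce y along s (the secant condition)
print(cbind(bfgs = drop(B_bfgs %*% s), sr1 = drop(B_sr1 %*% s), y = drop(y)))
print(B_bfgs)  # compare with the true Hessian block at x_new: [[842, 400], [400, 200]]
```

After a single update from the identity, both matrices agree with the true Hessian along s but differ from it elsewhere.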

References

Nocedal, Jorge, and Stephen J. Wright. 2006. Numerical Optimization. 2nd ed. New York: Springer.



trustOptim documentation built on Oct. 11, 2021, 9:07 a.m.