These are notes describing a proposal for "optimization objects", a format to unify the inputs and outputs from optimization functions.
Annotations: JN: means John Nash has commented. JN??: he has a query or wants comments. Similarly for other commentators.
Currently base R has many different optimizers (optim()
, nls()
, nlm()
, etc.)
each with different strengths, but also each with different requirements for
inputs. The optimx
and optimz
packages are an attempt to unify these
(and others) into a common interface, to make comparison of the results easier.
This proposal carries that unification further.
The optimx()
(JN: suggest optimobj()
or other different name
to avoid confusion for current packages)
function should accept as input an optimization object. The
object should contain sufficient information about the optimization problem
to allow choice of an optimizer and construction of defaults for all necessary
arguments for that optimizer.
Thus
ans <- optimobj(myoptopject) (JN:)
The result ans
should also be an optimization object. In addition to the inputs,
the result should contain information about the outcome of the optimization
attempt, and perhaps a history of previous attempts. (There needs to be
some control over the level of detail here, but it should at least be possible
to contain (JN??: obtain. I think npar=25000 with full trace will blow most R
installations. It gives a very interesting demo, as I discovered to my embarrassment,
of the swap disk going berserk.) full trace information from previous attempts.)
I am thinking about this as if the objects are implemented in S3, but this proposal is preliminary, so all of the suggestions are tentative, including the form of the objects.
The current idea is that there should be two kinds of objects:
objectiveFn
objects which represent an objective function to optimize,
and optimization
objects which represent an optimization problem, including
the objective function, strategy for optimizing, and a record of what
has been attempted so far.
JN: Thus,
anoptobj <- list(objectiveFn=myobjfn, optInputs=myoptinputs, optOut=NULL)
where the NULL gets replaced with "stuff". And maybe the name is unfortunate.
We distinguish between two views of the objective function: the user's view, and the optimizer's view. For example, the user may have a function of 5 variables, but wish to fix two of them, so the optimizer's view is of a function of 3 variables. Or the user may have a partially linear model, so the nonlinear optimizer only needs to modify the nonlinear parameters, and the linear parameters can then by estimated by linear regression.
JN: I'm very much in agreement with the spirit of this, and very much worried how difficult it will be to do nicely. Still, we've got to try.
Fields in the optimization object. These are modelled on the arguments
to the stats::optim
function; other optimizers may need others, so
these can change.
JN: I've reordered these
objectiveFn
fn
: An objective function object.optInput
par
: A vector of parameter values, used to initialize the optimizer.method
: A vector of names of optimization methods to try. (JN: I use this singular for
historical reasons -- i.e., optim(), but we could change it.)lower
, upper
: Bounds on the allowable range of the parameters.control
: A list of control parameters. (JN: This can be tricky to set up "nicely".)dots
: A list of other parameters to pass to the objective function.optOut
results
: A list of results of previous attempts. JN: It would be good to figure out
the structure of this, especially how we deal with detail that a full trajectory would
entail.optStatus
various informational items that report whether
Post solution checks on proposed solutions
The lower
, upper
, control
, and hessian
arguments can be lists,
with entries corresponding to the matching entry in method
. Currently
these are named with matching names to methods, but perhaps this should
be done as a vector, so the same method can be included multiple times
with different control
values.
JN: The default control list is now about a page long. Many controls are common. However, it is important for "follow-on" cases as Duncan's example below suggests (optimx already has this option, but the code is not as clean as I'd like.) Because of the complexity the use of lists here involves, I believe we need to think carefully how we set up computations. The examples provide some suggestions, and we can use more to help us understand what would work well.
JN: I regard different bounds as defining different problems. So I think lower
and
upper
should only appear once.
JN: Computing the Hessian is VERY expensive when npar gets large. I'd suggest that it should only be calculated for the "best" result unless the user takes special steps. The user has the option of calling optimHess() in current optim(). It is actually documented there, but I think too succinctly, and I'd argue for a documentation patch, or at least an example. We may also want to prepare a function that computes it and puts the result in the right place in the optOut list.
JN: A confusion issue is that hessian is a LOGICAL input and a MATRIX output. This could be unimportant if we use the optInput and optOut categories.
JN: Duncan's example, as I note below is a follow-on polyalgorithm. Sometimes this is what we want. Other times we want to compare methods. So we are trying to do two different things with the list of methods, and this gets us into trouble, especially when we need separate controls for each of the components of a polyalgorithm.
This object contains the following functions. Typically they would share an environment where other data could be stored.
fn0
: A function(par, ...)
which is the
user's view of the function.fn1
: A function(par1, ...)
which is the
optimizer's view of the function.gr0
and gr1
: Corresponding functions to compute the gradients.hess0
and hess1
JN??forward
: A function(par)
that converts par
to par1
.inverse
: A function(par1)
that converts par1
to par
.JN: My experience is that it's (a lot) easier to work with the original length parameter vector. Yes, the function has fewer true parameters, but the transformation is tedious and error prone. It is fairly straightforward for the software to deal with the loss of dimensionality, since a "mask" that has 0 in each place where the parameter is fixed and 1 elsewhere will project the change vector appropriately. This is used in the bounded methods as the parameters move and hit and leave bounds. The transformation on each iteration would be very onerous. Since we already have a structure for masking the constrained parameters, adding fixed parameters is relatively straightforward.
Issue: You DO need to know the dimensionality internally. ntrue = 0 is trivial -- no work. ntrue = 1 can mess up some algorithms e.g., conjugate gradients. I suspect Nelder-Mead will do silly things too. However, we use a hyperbolic transformation to do bounds in nmkb, but Rcgmin, Rvmmin and Rtnmin all use index vectors and seem to manage the dimensionality when they hit upper or lower bounds, as these reduce dimensionality too.
JN: A different "internal" function is one that minimizes to compute a maximum by multiplying user's objective by -1. There is also parameter scaling, and I believe that this could be made efficient by use of a scaling vector passed through a local environment. This already exists in Rtnmin, as my brother used Matlab globals to pass historical points and function values and gradients between internal routines. I believe it could be used more generally to hold useful information about the function.
optim()
likes fn and gr as separate entities. However, it is quite common
to find that the function and gradient code share a large proportion of their
calculations, so having them separate is wasteful. However, three logical
controls in a special environment for optimz / nls14 could be getf, getg,
and geth. In optimx and optimz there are conditions so that in the following
table, the top three possibilities are allowed and the bottom 5 are not.
getf getg geth T T T T T F T F F ================ T F T F T F F T T F F T F F F
Different optimization methods take different approaches. However, it is
more common that one needs a new evaluation of the function AND gradient at
a given set of parameters than just the gradient. Only in methods like
trust
do we need function, gradient and Hessian. On the other hand,
there are plenty of times when even a gradient method will ask only for
the function value. Less common is the need for gradient alone. Methods
such as Rtnmin
and L-BFGS-B
always calculate function and gradient
together.
nlm
can optionally compute the gradient and hessian
as well as the function value, but puts the evaluated gradient and
hessian into attributes of the returned object, whereas trust
returns them in a list.
JN: I think the idea of a single R function with multiple returns is attractive, but we need to figure out which method is best.
A simple example suggests a small difference in favour of the attribute approach:
source("~/R-optimtest/testfnsR/grosenbrock/grose.R") grose.h <- function(x, gs){ return(NULL) } lfn <- function(x, ...){ # list based function fv <- grose.f(x, ...) gv <- grose.g(x, ...) hv <- grose.h(x, ...) fgh <- list(value=fv, gradient=gv, hessia=hv) } afn <- function(x, ...){ # attribute based function fv <- grose.f(x, ...) gv <- grose.g(x, ...) hv <- grose.h(x, ...) attribute(fv,"gradient") <- gv attribute(fv, "hessian") <- hv fv } require(microbenchmark) xx<-rep(pi, 10) microbenchmark(lfn) microbenchmark(afn) afgh <- function(x) { gr <- function(x1, x2) c(-400*x1*(x2 - x1*x1) - 2*(1-x1), 200*(x2 - x1*x1)) h <- function(x1, x2) { a11 <- 2 - 400*x2 + 1200*x1*x1 a21 <- -400*x1 matrix(c(a11, a21, a21, 200), 2, 2) } x1 <- x[1]; x2 <- x[2] res<- 100*(x2 - x1*x1)^2 + (1-x1)^2 attr(res, "gradient") <- gr(x1, x2) attr(res, "hessian") <- h(x1, x2) return(res) } lfgh <- function(x) { gr <- function(x1, x2) c(-400*x1*(x2 - x1*x1) - 2*(1-x1), 200*(x2 - x1*x1)) h <- function(x1, x2) { a11 <- 2 - 400*x2 + 1200*x1*x1 a21 <- -400*x1 matrix(c(a11, a21, a21, 200), 2, 2) } x1 <- x[1]; x2 <- x[2] res<- 100*(x2 - x1*x1)^2 + (1-x1)^2 lfgh <- list(value=res, gradient=gr, hessian=h) } microbenchmark(lfgh) microbenchmark(afgh)
This is the kind of thing I am thinking of. This only supports optim
; the
real thing needs to be much more elaborate.
simpleObjective <- function(fn, gr = NULL) { identity <- function(par) par structure(list(fn0 = fn, fn1 = fn, gr0 = gr, gr1 = gr, forward = identity, inverse = identity), class = "objectiveFn") } optimization <- function(par, fn, gr = NULL, ..., methods = knownMethods, lower = -Inf, upper = Inf, control = list(), hessian = FALSE) { if (!inherits(fn, "objectiveFn")) fn <- simpleObjective(fn, gr) knownMethods <- c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", "Brent") result <- list(par = par, fn = fn, dots = list(...), lower = lower, upper = upper, control = control, hessian = hessian) if (!missing(methods)) result$methods <- match.arg(methods, knownMethods, several.ok = TRUE) else result$methods <- knownMethods[1] structure(result, class = "optimization") } print.optimization <- function(x, ...) { cat("Optimization object.") if (!is.null(x$value)) { cat(" fn(", paste(format(x$par), collapse=","), ") = ", format(x$value),"\n", sep="") } else cat(" Not run yet.\n") } run.optimization <- function (x, verbose = TRUE) { if (is.null(x$runs)) x$runs <- list() if (is.null(x$value)) x$value <- do.call(x$fn$fn0, c(list(x$par), x$dots)) for (method in x$methods) { args <- unclass(x) args$method <- method args$methods <- NULL args$runs <- NULL args$value <- NULL args$dots <- NULL args$par <- x$fn$forward(args$par) args$gr <- x$fn$gr1 args$fn <- x$fn$fn1 if (is.list(args$lower)) { args$lower <- x$lower[[method]] if (is.null(args$lower)) args$lower <- -Inf } if (is.list(args$upper)) { args$upper <- x$upper[[method]] if (is.null(args$upper)) args$upper <- Inf } if (is.list(args$control) && all(names(args$control) %in% x$methods)) { args$control <- x$control[[method]] if (is.null(args$control)) args$control <- list() } if (is.list(args$hessian)) { args$hessian <- x$hessian[[method]] if (is.null(args$hessian)) args$hessian <- FALSE } if (verbose) cat("Trying method = '", method, "'\n", sep = "") run <- do.call("optim", c(args, x$dots)) x$runs <- c(x$runs, list(run)) if (run$value < x$value) { x$value <- run$value x$par <- x$fn$inverse(run$par) if (verbose) { cat(" ") print(x) } } else if (verbose) cat(" No improvement.\n") } if (verbose) invisible(x) else x }
Here we try it to optimize the function $(x-1)^2 + (x-y-3)^4 + 0.0001*(y-4)^4 + 1$. I use methods SANN followed by Nelder-Mead with a limit of 20 iterations, just so that the problem doesn't get optimized in the very first try.
fn <- function(arg) { x <- arg[1] y <- arg[2] (x-1)^2 + (x-y-3)^4 + 0.0001*(y-4)^4 + 1 } opt <- optimization(par = c(x=0, y=10), fn = fn, methods=c("SANN", "Nelder"), control=list(maxit=20)) opt <- run.optimization(opt) opt <- run.optimization(opt) run.optimization(opt)
fn <- function(arg) { x <- arg[1] y <- arg[2] (x-1)^2 + (x-y-3)^4 + 0.0001*(y-4)^4 + 1 } start <- c(x=0, y=10) start set.seed(12345) # REALLY IMPORTANT! a1s <- optim(start, fn, method="SANN", control=list(maxit=20)) a1s a1n <- optim(start, fn, method="Nelder-Mead", control=list(maxit=20)) a1n a1sn <- optim(a1s$par, fn, method="Nelder-Mead", control=list(maxit=20)) a1sn
Here's the same example, with a transform that fixes $y$ at the value 10.
fixparms <- function(parms, fn, gr = NULL) { if (!inherits(fn, "objectiveFn")) fn <- simpleObjective(fn, gr) fn0 <- fn$fn0 gr0 <- fn$gr0 unfixed <- is.na(parms) forward <- function(par) { par[unfixed] } inverse <- function(par1) { result <- parms result[unfixed] <- par1 result } fn1 <- function(par1, ...) fn0(inverse(par1), ...) if (is.null(gr0)) gr1 <- NULL else gr1 <- function(par1, ...) gr0(inverse(par1), ...)[unfixed] structure(list(fn0 = fn0, gr0 = gr0, fn1 = fn1, gr1 = gr1, forward = forward, inverse = inverse), class = "objectiveFn") } opt <- optimization(par = c(x = -2, y = 10), fn = fixparms(c(NA, 10), fn), methods=c("SANN", "Brent", "BFGS"), lower = list(Brent = -5), upper = list(Brent = 100), control=list(maxit=20)) run.optimization(opt)
This example traces all calls to fn0
, fn1
and the gradients:
tracer <- function(fn, gr = NULL) { if (!inherits(fn, "objectiveFn")) fn <- simpleObjective(fn, gr) fn0 <- function(par, ...) { value <- fn$fn0(par, ...) cat("fn0(", paste(format(par), collapse=","), ") = ", format(value), "\n", sep="") value } fn1 <- function(par1, ...) { value <- fn$fn1(par1, ...) cat("fn1(", paste(format(par1), collapse=","), ") = ", format(value), "\n", sep="") value } if (is.null(fn$gr0)) gr0 <- NULL else gr0 <- function(par, ...) { value <- fn$gr0(par, ...) cat("gr0(", paste(format(par), collapse=","), ") = (", paste(format(value), collapse = ","), ")\n", sep="") value } if (is.null(fn$gr1)) gr1 <- NULL else gr1 <- function(par, ...) { value <- fn$gr1(par, ...) cat("gr1(", paste(format(par), collapse=","), ") = (", paste(format(value), collapse = ","), ")\n", sep="") value } structure(list(fn0 = fn0, gr0 = gr0, fn1 = fn1, gr1 = gr1, forward = fn$forward, inverse = fn$inverse), class = "objectiveFn") } opt <- optimization(par = c(x = -2, y = 10), fn = tracer(fixparms(c(NA, 10), fn)), methods=c("SANN", "Brent", "BFGS"), lower = list(Brent = -5), upper = list(Brent = 10), control=list(maxit=20)) # JN: Brent is solving a different problem from SANN and BFGS # as shown by forcing upper bound to 10 run.optimization(opt)
JN: In the above, it took me a while to realize that the lower and upper bounds only applied to x and only in the Brent problem. How would we specify bounds on parameters 3983 and 4587 in a 10000 parameter problem over spg, Rcgmin and Rtnmin where these parameters have different ranges? What happens if par=c(NA, 10) and fixparms(c(NA,8))? Which fixed value takes precedence. I'm being picky, but we want to make sure things are really clear. I actually like the fixparms() layout. It may be awkward to code, but something like fixparms(list(par[3214]=23, par[123]=.012)) could be easy to understand.
It is helpful to think about how the tools will be used.
The existing optimx()
function is designed to make it easy to run a suite of methods
on a single problem. optimx
also runs checks on the bounds, the nature of the input
parameters, and the results of computing the gradient by the user's function as well
as a numerical approximation. After finishing, the KKT tests are computed, and a table of
results is available for display in several ways.
optimx
also envisages running a single polyalgorithm by taking the result parameters
from the first listed method and starting the next method with these. It allows for
the number of "iterations" in each method to be specified separately. An issue is that
"iterations", "functions" and "gradients" are not entirely parallel measures of effort,
while methods vary by their particular controls.
A use-case that is not currently available is the comparison of several polyalgorithms. For example, we may have two methods, labelled A and B. It would be useful to be able to compare different orderings and different levels of effort within each method. Using "iterations" as our measure of effort, we might want to run 10 A, then 100 B and compare to 100 A, then 10 B, or 100 B then 10 A.
The last use-case above raises the question of how to instruct R to run the polyalgorithms and, especially, how to report and compare the results. It is not that it is particularly difficult -- the run.optimization() lets us do each stage of a polyalgorithm. What is awkward is providing a shorthand for a particular polyalgorithm so we can tabulate the results easily, and then the tedium of gathering these results and displaying them.
The following is intended to help drive the coding as at 2015-5-14.
These are the functions that are likely called by the user, though some others will be exported.
method=methname
, where methname
is a character string for a single method,
sets default controls for that method. (This is NOT yet active!!!)control=list(SANN:temp=34)
to set specific method controls. A small test using
pmatch()
and names(control)
shows this to be feasible, but some care may be needed to make it
work efficiently and in a way that can be easily maintained. I've found that getting controls
even a little wrong (and not checking them) can give bugs that are annoyingly difficult to track
down. maxit=20
for Nelder-Mead and maxit=1000
for SANN.Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.