```r
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, comment = "#>")
library(mize)
```
By "Stateful" I mean what if we could create an optimizer independently of the function it was operating on and be able to pass it around, store it, and get full control over when we pass it data to continue the optimization.
This vignette is about using `mize` to manually control an optimization externally. Instead of passing `mize` a function to be optimized from a starting point, then waiting for `mize` to finish and getting back the finished results, you might want to tell `mize` to optimize for a few steps, then do something with the intermediate results: log the cost, update some parameters, test for some specific convergence criterion, checkpoint the current results, or plot the current state of the result in some custom way. Then, if there's still more optimization to be done, pass the results back off to `mize` and get it to crank away for a few more iterations.
This was in fact the inspiration for creating `mize` in the first place: I wanted access to the sort of optimization routines that the `stats::optim` function provides, but the lack of control was a deal breaker. One way to try to get around the problem is to optimize for only a few iterations at a time:
```r
# Rosenbrock function and gradient
rb_fg <- list(
  fn = function(x) {
    100 * (x[2] - x[1] * x[1]) ^ 2 + (1 - x[1]) ^ 2
  },
  gr = function(x) {
    c(
      -400 * x[1] * (x[2] - x[1] * x[1]) - 2 * (1 - x[1]),
      200 * (x[2] - x[1] * x[1]))
  })
rb0 <- c(-1.2, 1)

par <- rb0
for (batch in 1:3) {
  optim_res <- stats::optim(par = par, fn = rb_fg$fn, gr = rb_fg$gr,
                            method = "BFGS", control = list(maxit = 10))
  par <- optim_res$par
  message("batch ", batch, " f = ", formatC(optim_res$value))
}
```
but even this unsatisfactory work-around causes problems, because you are reinitializing the optimization with each batch, losing all the internal state the optimizer has accumulated. For methods like BFGS and CG, that state is important for their efficiency. The more control you want, the fewer iterations per batch, but that leads to behavior that approaches steepest descent.
Instead, `mize` lets you create a stateful optimizer that you pass to a function, and an updated version of which is returned as part of the return value of the function. This gives you complete control over what to do in between iterations, without sacrificing any of the information the optimizer is using.
To create an optimizer, use the `make_mize` function:
```r
opt <- make_mize(method = "BFGS")
```
Before starting the optimization, the optimizer needs to be initialized with the function and the starting point. Mainly this is to allow the various methods to preallocate whatever storage they make use of (matrices and vectors) according to the size of the data, as specified by the starting location.
To continue the Rosenbrock example from above:
```r
opt <- mize_init(opt = opt, par = rb0, fg = rb_fg)
```
If you have both the starting point and the function to optimize to hand at the point when the optimizer is created, you can provide them to `make_mize` and it will do the initialization for you:
```r
opt <- make_mize(method = "BFGS", par = rb0, fg = rb_fg, max_iter = 30)
```
And there is no need to make a separate call to `mize_init`. However, it's normally more convenient to configure the optimizer earlier than when the data shows up.
Using the batches-of-ten-iterations approach we used with `optim` is very similar with `mize`:
```r
par <- rb0
for (batch in 1:3) {
  for (i in 1:10) {
    mize_res <- mize_step(opt = opt, par = par, fg = rb_fg)
    par <- mize_res$par
    opt <- mize_res$opt
  }
  message("batch ", batch, " f = ", formatC(mize_res$f))
}
```
The difference here is that you have to do the iterating in batches of 10 manually yourself. Plus, the optimizer needs to be updated with the version that was returned from `mize_step`.
## mize_step
As you can see, with the greater power of `mize_step` to control the iteration comes greater responsibility. You also need to decide when to stop iterating. Apart from `par` and `opt`, there are some other components of the returned result list which might help:
* `f` - The function value, if it was calculated at `par`. For the few methods which don't do this, you can of course generate it yourself via `rb_fg$fn(par)`.
* `g` - The gradient vector, if it was calculated at `par`. If it's not present, then obviously there's nothing to stop you calculating `rb_fg$gr(par)` yourself.
* `nf` - The number of function evaluations carried out so far (i.e. since initialization). `opt` is also keeping track of this, and coordinates with `mize_step`, so you don't need to manually update this yourself between steps.
* `ng` - The number of gradient evaluations carried out so far.

You should treat the optimizer, `opt`, as a black box and not examine its horrific innards, except to check whether `opt$error` is non-`NULL`. If it's anything other than `NULL`, then this means something really bad has happened during the optimization, most likely a `NaN` or `Inf` was calculated in the gradient. This can happen with a very poorly chosen starting point combined with a descent method and line search that don't guarantee descent, such as a very aggressive momentum scheme or, more likely, an adaptive learning rate technique like delta-bar-delta. Monitoring the function value or the size of the change in `par` between iterations can help spot an imminent divergence.
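Putting those pieces together, a single step's result might be inspected like this (a minimal sketch based on the fields described above; `f` and `g` may be absent depending on the method, and the explicit `rb_fg` calls below add evaluations that `opt` won't know about):

```r
mize_res <- mize_step(opt = opt, par = par, fg = rb_fg)
# Fall back to computing f and g ourselves if this method didn't return them
f <- if (is.null(mize_res$f)) rb_fg$fn(mize_res$par) else mize_res$f
g <- if (is.null(mize_res$g)) rb_fg$gr(mize_res$par) else mize_res$g
message("f = ", formatC(f), " |g| = ", formatC(sqrt(sum(g * g))),
        " nf = ", mize_res$nf, " ng = ", mize_res$ng)
# The one part of the optimizer's innards worth peeking at
if (!is.null(mize_res$opt$error)) {
  message("something bad happened during this step")
}
```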
Taking all that into account, here's a self-contained example that removes the now-unnecessary batching, does some minor error checking, and keeps track of the best parameters seen so far (although with this combination of optimizer and problem, you don't have to worry about it):
```r
# Create the optimizer
opt <- make_mize(method = "BFGS")

# Pretend we don't have access to the function or starting point until later
rb_fg <- list(
  fn = function(x) {
    100 * (x[2] - x[1] * x[1]) ^ 2 + (1 - x[1]) ^ 2
  },
  gr = function(x) {
    c(
      -400 * x[1] * (x[2] - x[1] * x[1]) - 2 * (1 - x[1]),
      200 * (x[2] - x[1] * x[1]))
  })
rb0 <- c(-1.2, 1)

# Initialize
opt <- mize_init(opt = opt, par = rb0, fg = rb_fg)

# Store the best seen parameters in case something goes wrong
par <- rb0
par_best <- par
f_best <- rb_fg$fn(par_best)

for (i in 1:30) {
  mize_res <- mize_step(opt = opt, par = par, fg = rb_fg)
  par <- mize_res$par
  opt <- mize_res$opt

  # Do whatever you want with the data at each iteration
  if (opt$is_terminated) {
    # Something bad happened
    break
  }
  if (mize_res$f < f_best) {
    f_best <- mize_res$f
    par_best <- par
  }
}

# optimized result is in par_best
par_best
f_best
```
The return value of `mize_step` provides the function and gradient values. If you would like access to more information, the `mize_step_summary` function can extract it conveniently:
```r
# Create optimizer and do one step of optimization as usual
opt <- make_mize(method = "BFGS", par = rb0, fg = rb_fg)
par <- rb0
mize_res <- mize_step(opt = opt, par = par, fg = rb_fg)

step_info <- mize_step_summary(mize_res$opt, mize_res$par, rb_fg, par_old = par)

# info that's already available in mize_res
step_info$f
step_info$ng
step_info$nf

# and some extra
step_info$step
step_info$alpha
```
`mize_step_summary` takes `opt`, `par` and `fg` like `mize_step` does, but also optionally wants a `par_old` argument. This is the value of `par` from the previous iteration, from which it calculates the size of the step taken in this iteration.
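If you wanted the step size by hand, it's presumably the Euclidean distance between successive values of `par` (an assumption based on the description above, not a statement about the implementation):

```r
# With par holding the previous value and mize_res$par the new one:
step_by_hand <- sqrt(sum((mize_res$par - par) ^ 2))
```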
Information available from the return value of `mize_step_summary` includes:

* `iter` - The iteration number.
* `f` - The function value, if it's available, or if you have set a convergence tolerance that requires its calculation (see below).
* `g2n` - The gradient l2 (Euclidean) norm, if `grad_tol` is non-`NULL` (see the Convergence section for more).
* `ginfn` - The gradient infinity norm, if `ginf_tol` is non-`NULL` (also see the Convergence section for more).
* `nf` - The number of function evaluations so far over the course of the optimization.
* `ng` - The number of gradient evaluations so far over the course of the optimization.
* `step` - The step size of this iteration.
* `alpha` - The size of the line search value found during the gradient descent stage. This won't be the same as `step`, even for optimizers that don't use an extra momentum stage, because the total step size is normally the value of `alpha` multiplied by the magnitude of the gradient.
* `mu` - If a momentum stage was used, the momentum coefficient.
* `opt` - The optimizer with updated function and gradient counts, if `f`, `g2n` or `ginfn` was calculated.

In many cases, `f`, `g2n` and `ginfn` do not require any recalculation (or aren't calculated), but to be on the safe side, always reassign `opt` from the return value of `mize_step_summary`.
Here's a modified version of the previous example, where we log information from `mize_step_summary`. We're only going to go for 10 iterations to avoid too much output.
```r
# Create the optimizer
opt <- make_mize(method = "BFGS", par = rb0, fg = rb_fg)
par <- rb0

for (i in 1:10) {
  par_old <- par
  mize_res <- mize_step(opt = opt, par = par, fg = rb_fg)
  par <- mize_res$par
  opt <- mize_res$opt

  # step info
  step_info <- mize_step_summary(opt, par, rb_fg, par_old)
  opt <- step_info$opt
  message(paste(
    Map(function(x) {
      paste0(x, " = ", formatC(step_info[[x]]))
    }, c("iter", "f", "nf", "ng", "step")),
    collapse = ", "))
}
```
In the examples up until now, we have manually looped over 30 iterations and then stopped. More sophisticated stopping criteria are available. Three changes are needed:
1. When passing `par` and `fg` to either `make_mize` or `mize_init`, also pass termination criteria:

    ```r
    opt <- make_mize(method = "BFGS", par = rb0, fg = rb_fg, max_iter = 30)
    # or
    opt <- make_mize(method = "BFGS")
    opt <- mize_init(opt = opt, par = rb0, fg = rb_fg, max_iter = 30)
    ```
2. After calling `mize_step_summary`, pass the return value to the function `check_mize_convergence`. This returns an updated version of `opt`, which will indicate whether optimization should stop by setting the `opt$is_terminated` boolean flag:

    ```r
    step_info <- mize_step_summary(opt, par, rb_fg, par_old)
    opt <- check_mize_convergence(step_info)
    ```
    Note that you don't need to manually assign `opt` to the value that comes from `mize_step_summary`, as `check_mize_convergence` handles that.
3. Instead of a `for` loop, you can use `while (!opt$is_terminated)`.

Once `opt$is_terminated` is `TRUE`, you can find out what caused the optimization to stop by looking at `opt$terminate$what`. We were using `opt$is_terminated` before now, where if it was set to `TRUE` it meant that something awful had occurred, like infinity or `NaN` in a gradient. `check_mize_convergence` also uses this flag, but now with an expanded meaning that just indicates optimization should cease, not necessarily because a catastrophe occurred. It's still worth checking whether `opt$is_terminated` was set by `mize_step` if anything that you do in the rest of the loop assumes that the gradient or function value is finite (e.g. comparing it to a real number in a boolean condition).
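For example, after the loop exits you might report why (a short sketch; as described above, `opt$terminate$what` records which criterion fired):

```r
if (opt$is_terminated) {
  # e.g. the maximum iteration count being reached
  message("optimization stopped due to: ", opt$terminate$what)
}
```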
Apart from just the maximum number of iterations, there are a variety of options that relate to convergence. There is a separate vignette which covers these convergence options, and all the parameters mentioned there can be passed to `make_mize` and `mize_init`. Whatever options you use, setting `max_iter` is a good idea to avoid an infinite loop.
Here's the example repeated again, this time using `check_mize_convergence` to control the number of iterations, rather than a `for` loop:
```r
# Create the optimizer
opt <- make_mize(method = "BFGS")

rb_fg <- list(
  fn = function(x) {
    100 * (x[2] - x[1] * x[1]) ^ 2 + (1 - x[1]) ^ 2
  },
  gr = function(x) {
    c(
      -400 * x[1] * (x[2] - x[1] * x[1]) - 2 * (1 - x[1]),
      200 * (x[2] - x[1] * x[1]))
  })
rb0 <- c(-1.2, 1)

# Initialize and set convergence criteria
opt <- mize_init(opt = opt, par = rb0, fg = rb_fg, max_iter = 30)

# Store the best seen parameters in case something goes wrong
par <- rb0
par_best <- par
f_best <- rb_fg$fn(par_best)

while (!opt$is_terminated) {
  # Keep the previous par around so mize_step_summary can get the step size
  par_old <- par
  mize_res <- mize_step(opt = opt, par = par, fg = rb_fg)
  par <- mize_res$par
  opt <- mize_res$opt

  # Do whatever you want with the data at each iteration
  if (opt$is_terminated) {
    # Something bad happened
    break
  }
  if (mize_res$f < f_best) {
    f_best <- mize_res$f
    par_best <- par
  }

  step_info <- mize_step_summary(opt, par, rb_fg, par_old)
  # Do something with the step info if you'd like

  # Check convergence
  opt <- check_mize_convergence(step_info)
}

# optimized result is in par_best
par_best
f_best
```