optimx
is a wrapper package to allow the multiple methods for function minimization
available in R and CRAN packages to be used through a consistent syntax. This vignette
is a significant update of an earlier one that describes some of the issues in building
and extending the package. However, it is intended mainly for developers, including
ourselves, as an aid to and partial record of work done. Thus it is an incomplete and
imperfect document not aimed at users.
NOTE: Some material also appears in the "Intro to optimx" vignette
The base R distribution includes three tools that allow for function minimization,
possibly with bounds constraints. These are optim()
, nlm()
and nlminb()
. There
is also the nonlinear least squares solver nls()
that will not be discussed
here except when there is overlap with particular matters at hand.
Because the syntax for optim()
is relatively convenient and already spans five
different multiparameter minimization methods (or solvers). There is also
method "Brent" in optim()
that is a one-parameter, or one-dimensional, minimizer.
Separately, base R has the one-parameter function optimize()
.
The optim()
syntax, from the manual, is
optim(par, fn, gr = NULL, ..., method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"), lower = -Inf, upper = Inf, control = list(), hessian = FALSE)
Here par
is the vector of starting parameters for the function fn
which has
an (optional) gradient function gr
. method
specifies the solver to be used.
It actually must be a single character string. An attempt to call two solvers
by, say,
t2 <- optim(par=c(2,2), fn=fr, gr=grr, method=c("BFGS", "Nelder-Mead"))
generates the message
Error in match.arg(method) : 'arg' must be of length 1
The benefits of a unified syntax are seen immediately if we look how nlm()
and
nlminb()
must be called.
nlm(f, p, ..., hessian = FALSE, typsize = rep(1, length(p)), fscale = 1, print.level = 0, ndigit = 12, gradtol = 1e-6, stepmax = max(1000 * sqrt(sum((p/typsize)^2)), 1000), steptol = 1e-6, iterlim = 100, check.analyticals = TRUE)
nlminb(start, objective, gradient = NULL, hessian = NULL, ..., scale = 1, control = list(), lower = -Inf, upper = Inf)
The reader can query the help for these functions (?nlm
or ?nlminb
)
to get the meaning of the arguments. Here I will note particularly
the names, order and structure of the parameters and objective function are inconsistent across the three tools;
the controls have different names and meanings and are sometimes arguments in the call, and sometimes elements of a control vector;
the "answer" returned by the three tools is similar but not quite interchangeable in structure.
When we extend our interest to CRAN packages, these issues multiply.
Moreover, there are different features of the R tools and CRAN packages that are NOT general. It would be helpful to users if they were consistently available across a wider range of solver functions. Some of these are:
control$parscale
or a similar (but consistent) mechanism;optim()
returns;opm()
;The updated package incorporates the earlier optimx()
function which inspired
the present work. That function uses a slightly different call and returns
a slightly different result structure from the more recent opm()
function.
A number of CRAN and other packages use optimx()
; see @jnrv2009user.
Because we do not wish to cause breakages of working packages and tools,
we have retained the optimx()
function in the current version.
WARNING: only updates to correct egregious errors in optimx()
are
envisaged going forward. Please use optimr()
and opm()
.
It is fully intended that both developers and users may wish to extend
the package, especially by adding solvers to the optimr()
function.
The file
ctrldefault.R
lists the optimizers currently available via this
function in the code optimr.R
.
Note that there are multiple
package lists in the ctrldefault()
function for unconstrained, bounded and masked parameters.
Users can add lists suitable to their needs. Indeed, this vignette
is intended to assist in the the addition of other solvers, as well
perhaps as extra functionality or features.
Ideally, such extensions should be possible for
any R user reasonably proficient in programming R scripts. Changes would, of
course, be made by downloading the source of the package and modifying the R code to
make a new, but only locally available, package. The author does, on the other hand,
welcome suggestions for inclusion in the distributed package, especially if these
have been well-documented and tested. I caution, however, that over 95% of the
effort in building and improving optimx
has been to try to ensure that errors and conflicts are purged, and that the code is
resistant to mistakes in user inputs.
The general nonliear optimization problem considers the minimization or maximization of an objective function subject to equality and inequality constraints that are both linear and nonlinear. It comprehends the field of mathematical programming, and borders on nonlinear equations.
In R, what is referred to as "optimization", including the original optim()
function and its replacement in this package, only allows for the minimization
or maximization of nonlinear functions of
multiple parameters subject to at most bounds constraints. Many methods
offer extra facilities, but apart from masks (fixed parameters) for the
hjn
, Rcgmin
and Rvmmin
methods, such features are
likely inaccessible via this package without some customization
of the code. Even masks are just bounds constraints where the upper and
lower bounds coincide.
For our purposes, we wish to find the
vector of (real valued) parameters bestpar
that minimize an objective
function specified by an
R function fn(par, ... )
where par
is the general
vector of parameters, initially provided as a vector that is either the
first argument to optimr()
or else specified by a par=
argument. The
dot arguments are additional information needed to compute the function.
Function minimization methods may require information on the gradient or
Hessian of the function, which we will assume to be furnished, if required,
by functions gr(par, ...)
and hess(par, ....)
. Bounds or box constraints,
if they are to be imposed, are given in the vectors lower
and upper
.
As far as I am aware, all the optimizers included in the optimx
package are
local minimizers, possibly apart from the "SANN" method of optim()
.
That is, the solvers attempt to find a local minimum of the
objective function. Global optimization is a much bigger problem. Even finding
a local minimum can often be difficult, and to that end, the optimx
package
offers the
function kktchk()
to test the
Kuhn-Karush-Tucker conditions. These essentially
require that a local minumum has a zero gradient and all nearby points on the
function surface have a greater function value. There are many details that
are ignored in this very brief explanation.
optimx()
functionUsers will note that there is much overlap between functions optimx()
and opm()
.
The reasons for the restructuring to optimr()
and opm()
are as follows:
optimx
has a large set of features, which increases the complexity of
maintenance;
opm()
is built upon optimr()
which is designed to call any one of a set
of solvers, so extensions reside largely in the optimr()
function;
there are small but possibly important differences in the result structures
from optimx()
and opm()
. For example, opm()
does not report the number
of "iterations" in the variable niter
because such a value is not returned
by either the original optim()
or newer optimr()
functions.
optimr()
is structured so it can use the original optim()
syntax and
result format, together with the parscale
control allowing parameter scaling
for all solvers. optimx()
only allows parscale
for some solvers.
optimx()
was intended to have features to allow for calling solvers
sequentially to create polyalgorithms, as well as for using multiple vectors
of starting parameters. The combination of such options may cause unexpected
behaviour.
polyopt()
(built on optimr()
) allows for running solvers sequentially
to create a polyalgorithm. For example, one may want to run a number of
cycles of method Nelder-Mead
, followed by the gradient method Rvmmin
.
multistart()
allows multiple starting vectors to be used in a single
call.
optimr()
is an aggregation of wrappers for a number of individual function
minimization ("optimization") tools available for R. The individual
wrappers are selected by a sequence of if()
statements using the argument
method
in the call to optimr()
.
To add a new optimizer, we need in general terms to carry out the following:
import
ed into optimr
;ctrldefault()
function;if()
statement to select the new "method";control
list elements of optimr()
into the corresponding control arguments (possibly not in a list of that name
but in one or more other structures, or even arguments or environment variables)
for the new "method". If necessary, update ctrldefault()
;optimr()
for the nlm()
function, for example);control$parscale
to be applied
if this functionality is not present in the methods. To my knowledge, only the
base optim()
function methods do not need such special scaling provisions.control
items.The Rvmmin
and Rcgmin
packages use the (integer) vector bdmsk
to indicate
the status of each parameter with respect to bounds and masks.
bdmsk[i] = 1
indicates parameter i
is freebdmsk[i] = 0
indicates parameter i
is fixed at its current value (masked)bdmsk[i] = -1
indicates parameter i
is at its upper bound (upper[i]
)bdmsk[i] = -3
indicates parameter i
is at its lower bound (lower[i]
)This particular structure was developed for use with the BASIC programs in
@jnmws87, primarily for active set methods, and returned in the bdmsk
list item
of the answer for
Rvmmin()
and Rcgmin()
. These need to know the direction
in which a step can be taken for each parameter. If the gradient component i
is positive, then the step needs to be a negative one, which it can only be if
there is no lower bound in the way. A free parameter is clearly fine, or one that
is on an upper bound. Adding 2 to bdmsk[i]
must be positive, but if it is negative,
we are at a lower bound and cannot move. Similarly we can use a negative gradient
component and step forward from a lower bound. The codes Rvmminb.R
and Rcgminb.R
provide details, which are fairly straightforward, but tedious.
I believe that the same structure could be ported to other gradient-based codes,
for example, snewton
or snewtonm
.
A number of methods support the inclusion of box (or bounds) constraints. This
includes the function nmkb()
from package dfoptim
. Unfortunately, this method
uses the transfinite transformation of the objective function to impose the bounds
(Chapter 11 of @nlpor14), which causes an error if any of the initial parameters
are on one of the bounds.
There are several improvements in the optimr
package relating to
bounds that would be especially nice to
see, but I do not have good ideas yet how to implement them all. Among these
unresolved improvements are:
The methods hjn
, Rcgmin
and Rvmmin
(and possibly others, but not obviously accessible
yet via this package) also permit fixed (masked) parameters. See @jnmws87. This is
useful when we want
to establish an objective function where one or more of the parameters is supplied
a value in most situations, or for which we want to fix a value while we optimize
the other parameters. At another time, we may want to allow such parameters to
be part of the optimization process.
In principle, we could fix parameters in methods that allow bounds constraints by
simply setting the lower and upper bounds equal for the parameters to be masked.
As a computational approach, this is generally a very bad idea, but in the present
optimr()
this is permitted as a way to signal that a parameter is fixed. Internally
a different representation is used.
The method hjn
is a Hooke and Jeeves axial search method that allows masks and bounds.
It is coded using explicit loops, so will generally be much slower than an implementation
(e.g., hjkb
from dfoptim
or the similar code in package pracma
) that try to
employ vectorized computations. hjn
was included in optimr
to provide
an example of a direct search method with masks. I do NOT recommend it for general use.
There is a possibility that masks could be implemented globally in optimr.
mskd <- which(lower >= upper)
idx <- 1:length(par)
, then `idx <- idx[which(lower<=upper)] are parameters that
take part in the optimizationidx
can be passed into the working function and gradient routines efn
and egr
in the same way scaling is performed.At the time of writing (2016-7-11) this has yet to be tried. However, we have got a test of the transformation.
## test masked transformation # set of 10 parameters par <- 3*(1:10) cat("par:") print(par) cat("Mask 3rd, 5th, 7th\n") bdmsk<-rep(1,10) # indicator of parameters that are free bdmsk[3] <- 0 bdmsk[5] <- 0 bdmsk[7] <- 0 cat("bdmsk:") print(bdmsk) # want to produce xpar which are the reduced parameters iactive <- which(bdmsk == 1) cat("iactive (length=",length(iactive),"):") print(iactive) xpar <- par[iactive] cat("xpar:") print(xpar) xpar <- - xpar print("altered xpar:") print(xpar) cat("expand back to newpar\n") newpar <- par newpar[iactive] <- xpar cat("newpar:") print(newpar) # Need to combine with scaling to get full setup for optimr() # Then also think of the transfinite approach for bounds on unconstrained ## Or even for bounds methods but using unconstrained part.
The method nlm()
provides a good example of a situation where the default
fn()
and gr()
are inappropriate to the method to be added to optimr()
.
We need a function that returns not only the function value at the parameters
but also the gradient and possibly the hessian. Don't forget the dot arguments
which are the exogenous data for the function!
nlmfn <- function(spar, ...){ f <- efn(spar, ...) g <- egr(spar, ...) attr(f,"gradient") <- g attr(f,"hessian") <- NULL # ? maybe change later f }
Note that we have defined nlmfn
using the scaled parameters spar
and the
scaled function efn
and gradient egr
. That is, we develop the unified
objective plus gradient AFTER the parameters are scaled.
In the present optimr()
, the definition of nlmfn
is put near the top of
optimr()
and it is always loaded. It is the author's understanding that such
functions will always be loaded/interpreted no matter where they are in the
code of a function. For ease of finding the code, and as a former Pascal programmer,
I have put it near the top, as
the structure can be then shared across several similar optimizers. There are
other methods that compute the objective function and gradient at the same set of
parameters. Though nlm()
can make use of Hessian information, we have chosen
here to omit the computation of the Hessian.
Parameter scaling is a feature of the original optim()
but generally not provided in
many other optimizers. It has been included (at times with some difficulty) in the
optimr()
function. The construct is to provide a vector of scaling factors via the
control
list in the element parscale
.
In the tests of the package, and as an example of the use and utility of scaling, we use the Hobbs weed infestation problem (./tests/hobbs15b.R). This is a nonlinear least squares problem to estimate a three-parameter logistic function using data for 12 periods. This problem has a solution near the parameters c(196, 49, 0.3). In the test, we try starting from c(300, 50, 0.3) and from the much less informed c(1,1,1). In both cases, the scaling lets us find the solution more reliably. The timings and number of function and gradient evaluations are, however, not necessarily improved for the methods that "work" (though these measures are all somewhat unreliable because they may be defined or evaluated differently in different methods -- we use the information returned by the packages rather than insert counters into functions). However, what values of these measures should we apply for a failed method?
As a warning -- having made the mistake myself -- scaling must be applied to bounds when calling a bounds-capable method.
optim()
uses control$fnscale
to "scale" the value of the function or gradient computed by
fn
or gr
respectively. In practice, the only use for this scaling is to convert a
maximization to a minimization. Most of the methods applied are function minimization tools,
so that if we want to maximize a function, we minimize its negative. Some methods
actually have
the possibility of maximization, and include a maximize
control. In these cases having both
fnscale
and maximize
could create a conflict. We check for this in optimr()
and try to
ensure both controls are set consistently. There is a further risk of conflict with the use
of numerical approximation of derivatives, as discussed later.
Because different methods use different control parameters, and may even put them into
arguments rather than the control
list, a lot of the code in optimr()
is purely for
translating or transforming the names and values to achieve the desired result. This is
sometimes not possible precisely. A method which uses control$trace = TRUE
(a logical
element) has only "on" or "off" for controlling output. Other methods use an integer
for this trace
object, or call it something else that is an integer, in which case there
are more levels of output possible.
I have found that it is important to remove (i.e., set NULL) controls that are not used for a method. Moreover, since R can leave objects in the workspace, I find it important to set any unused or unwanted control to NULL both before and after calling a method.
Thus, if print.level
is the desired control, and it more or less matches the optimr()
control$trace
, we need to set
print.level <- control$trace control$trace <- NULL
After the method has run, we may need to reset control$trace
.
There are some programming issues in the package with method controls and these are discussed in a separate section.
When the method we wish to call is not written in R, special care is generally needed
to get a reliable and consistent operation. Typically we call an R routine from optimr()
.
Let us call the interface to this "foreign" language solver myop()
.
Then myop()
will set up and call the underlying optimizer and return (supposedly) an answer.
For FORTRAN programs, @nlpor14, Chapter 18, has some suggestions. Particular issues
concern the dot arguments (ellipsis or ... entries to allow exogenous data for the
objective function and gradient), which can raise difficulties. In package optimr
the interface to the lbfgs
shows one approach, which consolidates the arguments for
lbfgs()
into a list, then converts the list to an environment.
As an attempt to avoid executing unnecessary code, for some years I built separate unconstrained and constrained routines with a common umbrella routine. Hence,
Rvmmin()
calls either Rvmminu()
or Rvmminb()
Rcgmin()
calls either Rcgminu()
or Rcgminb()
tn()
or tnbc()
More recently, with improvements in the just-in-time compiler for R, I am
exploring consolidated codes, e.g., nvm()
and ncg()
though these are
far from fully developed. However, they also are designed to be called from
optimr()
and not run independently, whereas the routines mentioned above
originally were part of independent R packages. The newer codes rely on
optimr()
to perform sanity checks on supplied parameters, bounds and
functions. In particular, bmchk()
is called to check parameters and bounds.
It is often convenient to be able to run multiple
optimization methods on the same function and
gradient. To this end, the function opm()
is supplied. The output of this by
default includes
the KKT tests and other information in a data frame, and there are convenience
methods summary()
and coef()
to allow for display or extraction of results.
opm()
is extremely useful for comparing methods easily. I caution that it is
not an efficient
way to run problems, even though it can be extremely helpful in deciding which
method to apply to a class of problems.
An important use of opm()
is to discover cases where methods fail on particular
problems and initial conditions of parameters and settings.
This has proven over time to help discover weaknesses and bugs in codes for
different methods. If you find such cases, and your code and data can be
rendered as an easily executed example, I strongly recommend posting it
to one of the R lists or communicating with the package maintainers. That
really is one of the few ways that our codes come to be improved.
Function polyopt()
is intended to allow for attempts to optimize a function
by running different methods in sequence. The call to polyopt() differs from
that of optimr()
or opm()
in the following respects:
The method
character argument or character vector is replaced by the
methcontrol
array which has a set of triplets consisting of a method name
(character), a function evaluation count and an iteration count.
the control$maxit
and control$maxfeval
are replaced, if present, with
values from the methcontrol
argument list.
The methods in methcontrol
are executed in the sequence in which they appear. Each
method runs until either the specified number of iterations (typically gradient
evaluations) or function evaluations have been completed,
or termination tests cause the method to be exited, after which the best set of
parameters so far is passed to the next method specified. If there is no further
method, polyopt()
exits.
Polyalgorithms may be useful because some methods such as Nelder-Mead are fairly robust and efficient in finding the region in which a minimum exists, but then very slow to obtain an accurate set of parameters. Gradients at points far from a solution may be such that gradient-based methods do poorly when started far away from a solution, but are very efficient when started "nearby". Caution, however, is recommended. Such approaches need to be tested for particular applications.
For problems with multiple minima, or which are otherwise difficult to solve,
it is sometimes helpful to attempt an optimization from several starting points.
multistart()
is a simple wrapper to allow this to be carried out. Instead of the
vector par
for the starting parameters argument, however, we now have a matrix
parmat
, each row of which is a set of starting parameters.
In setting up this functionality, I chose NOT to allow mixing of multiple
starts with a polyalgorithm or multiple methods. For users really wishing to do
this, I believe the available source codes opm.R
, polyopt.R
and
multistart.R
provide a sufficient base that the required tools can be
fashioned fairly easily.
Different methods take different approaches to counting the computational effort of performing optimizations. Sometimes this can make it difficult to compare methods.
Derivative information is used by many optimization methods. In particular, the gradient is the vector of first derivatives of the objective function and the hessian is its second derivative matrix. It is generally non-trivial to write a function for a gradient, and a huge amount of work to write the hessian function.
While there are derivative-free methods, we may also choose to employ numerical
approximations for derivatives. Some of the optimizers called by optimr
automatically
provide a numerical approximation if the gradient function (typically called gr
) is
not provided or set NULL. However, I believe this is open to abuse and also a source
of confusion, since we may not be informed in the results what approximation has been
used. There is also the question of counting function and gradient evaluations
correctly.
Specifying that numerical gradients are to be used, and which method for such approximation, presents a software interfacing issue. Around 2010 I introduced the idea of making the name of a gradient approximation program a character string e.g., "grfwd" and "grcentral" for forward and central derivative approximations. Direct codes for the gradient are included in the call by name NOT in character form.
This idea has been helpful, but it has raised some concerns.
maximize
control;For example, the package numDeriv
has functions for the gradient and
hessian, and offers three methods, namely a Richardson extrapolation (the
default), a complex step method, and a "simple" method. The last is either
a forward of backward approximation controlled by an extra argument side
to the grad()
function. The complex step method, which offers
essentially analytic precision from a very efficient computation, is
unfortunately only applicable if the objective function has specific
properties. That is, according to the documentation:
This method requires that the function be able to handle complex valued arguments and return the appropriate complex valued result, even though the user may only be interested in the real-valued derivatives. It also requires that the complex function be analytic.
The default method of numDeriv
generally requires multiple evaluations of the
objective function to approximate a derivative. The simpler choices, namely,
the forward, backward and central approximations, require respectively 1, 1,
and 2 (extra) function evaluations for each parameter.
To keep the code straightforward, I decided that if an approximate gradient
is to be used by optimr()
(or by extension the multiple method routine
opm()
), then the user should specify the name of the approximation
routine as a character string in quotations marks. The optimx
package
supplies five gradient approximation functions for this purpose, namely
"grfwd" and "grback" for the forward and backward simple approximations,
"grcentral" for the central approximation, "grnd" for the default Richardson
method via numDeriv
, and "grpracma" for function grad
from package
pracma
. It should be fairly straightforward for a user to copy the
structure of any of these routines and build their own gradient approximation,
but I have not tried to document how to do this.
An example using the complex step derivative would also be useful to include in this vignette. I welcome contributions!
Note As at 2018-7-8 Changcheng Li and John Nash had preliminary success
in using the autodiffr
package at
https://github.com/Non-Contradiction/autodiffr/ to generate gradient,
hessian and jacobian functions via the Julia language automatic differentiation
capabilities. There are some issues of slow performance compared to native-R
functions to compute these collections of derivative information, but the
ease of generation of the functions and the fact that they generate
analytic results i.e., as good as R can compute the information,
allows for some improvements in results and also makes feasible the application
of some Newton-like methods. Unfortunately, rapid changed in Julia functions
have made it very difficult to keep this capability operational.
A nasty inconsistency that has caused me quite a lot of difficulty in developing
and maintaining optimx
is that the use of a character string gr
function,
implying a numerical approximation to the gradient, comes into conflict with
either of the controls maximize=TRUE
or fnscale=-1
(or other negative number).
The issue arises because we need to negate the gr
function when it is analytic,
but the numerical approximation works on the already scaled objective function.
Great care is needed to avoid trouble, especially as individual solvers may also
try to apply scalings.
Most optimization routines perform preliminary checks on the inputs. Clearly
any wrapper function such as optimr()
or opm()
also has such checks.
There is naturally duplication. While this may be minor in time and effort
cost, it would be more elegant to be able to use just one set of checks.
The obvious simple gradient approximations "grfwd" and "grback" compare the
function at a step away from a base point given by par
. Looking
at the code, we have
grfwd <- function(par, userfn, fbase=NULL, env=optsp, ...) { # Forward different gradient approximation eps<-env$deps if (is.null(fbase)) fbase <- userfn(par, ...) # ensure we have function value at par df <- rep(NA, length(par)) teps <- eps * (abs(par) + eps) for (i in 1:length(par)) { dx <- par dx[i] <- dx[i] + teps[i] df[i] <- (userfn(dx, ...) - fbase)/teps[i] } df }
In many solvers, we would generally use the current "best" point for the vector
par
as our base for computing the gradient. Thus if n = length(par)
, we
compute fbase
n
times when we already have the value unless the solver
code is clever enough to provide it. Because other approximations, "grcentral"
in particular, do not use fbase
(even though it appears in the function
specification), avoiding recalculation would require
our solvers to be aware of the particular approximation that has been given in
the call. Nevertheless, it is clear there are opportunities for improved
efficiency.
optimr()
functionGiven the optimr
has been set up to
efn
and egr
that are to be minimized.
This means that internal calls to solvers within optimr()
have
maximize
set to FALSE
and fnscale
set to 1
.egr
if gr
is NULL; this may cause some
methods to failparscale
is applied to the objective function parameters and we
minimize scaled parameters and transform the values back to
the original scale when returning an answer on exiting optimr()
.It is open to discussion wheter we should flag (i.e., explicitly report in the output result or elsehwere) that an approximation is being used for the gradient or hessian.
A test script was written to explore this structure and is
part of the package in inst/doc/
testefnegr.R`.
nlm()
nlm() needs an objective function that has gradient and hessian functions
as attributes. As illustrated in inst/doc/examples/woodtest.R
, this
takes a form like
wood.fgh <- function(x){ # For info only fval <- wood.f(x) gval <- wood.g(x) hval <- wood.h(x) attr(fval,"gradient") <- gval attr(fval,"hessian") <- hval fval }
optimr()
in the packageAs mentioned above, this routine allows a vector of methods to be applied to a given function and (optionally) gradient. The pseudo-method "ALL" (upper case) can be given as a single character string (not in a vector) to run all available methods. A more useful pseudo-method is "MOST" which leaves out some less-useful methods. If bounds are given, the list of methods specified is restricted to the methods that can deal with bounds.
This routine is an attempt to consolidate the function, gradient and scale checks.
This routine provides default values for the control
vector that is applicable
to all the methods for a given size of problem. The single argument to this
function is the number of parameters, which is used to compute values for
termination tolerances and limits on function and gradient
evaluations. However, while I believe the values computed are "reasonable"
in general, for specific problems they may be wildly inappropriate.
ctrldefault()
was developed by recording the different controls of many
functions and packages in a spreadsheet to allow for their recording and
comparison. This spreadsheet, optcontrol.xls
also records the returned
answer structure. It is part of the inst/doc/
collection of
information in the package.
This routine (in file ctrldefault.R
) is intended to allow a compact
display of the current default settings used within optimx
.
optextras
packageOptimization methods share a lot of common infrastructure, and much of this was
collected in my retired optextras
package. (Now in the optimx
package.)
The routines used in the current package are as follows.
This routine, which can be called independently for checking the results of other optimization tools, checks the KKT conditions for a given set of parameters that are thought to describe a local optimum.
These have be discussed above under Derivatives.
These functions are provided to allow for detection of user errors in supplied function, gradient or hessian functions. Though we do not yet use hessians in the optimizers called, it is hoped that eventually they can be incorporated.
fnchk()
is mainly a tool to ensure that the supplied function returns a
finite scalar value when evaluated at the supplied parameters.
The other routines use numerical approximations (from numDeriv
) to check
the derivative functions supplied by the user.
This routine is intended to trap errors in setting up bounds and masks for
function minimization problems. In particular, we are looking for situations
where parameters are outside the bounds or where bounds are impossible to
satisfy (e.g., lower > upper). This routine creates an indicator
vector called bdmsk whose values are 1 for free parameters, 0 for masked
(fixed) parameters, -3 for parameters at their lower bound and -1 for those
at their upper bound. (The particular values are related to a coding trick
for BASIC in the early 1980s that was used in @jnmws87.)
Note that this function allows parameter values outside the bounds to be
shifted to the nearest bound (when shift2bound=TRUE
, which is the default
behaviour).
This routine is an attempt to check if the parameters and bounds are roughly
similar in their scale. Unequal scaling can result in poor outcomes when trying
to optimize functions using derivative free methods that try to search the
parameter space. Note that the attempt to include parameter
scaling for all methods is intended to provide a work-around for such bad
scaling. Because nonlinear functions have scaling that changes across the
parameter domain, scalechk()
should
be treated with caution.
One of the unfortunate, and only partially avoidable, inefficiencies in a
wrapper function such as optimr()
is that there will be duplication of
much of the setup and error-avoidance that
a properly constructed optimization program requires. That is, both the wrapper
and the called programs will have code to accomplish similar goals. Some of
these relate to the following.
Besides these sources of inefficiency, there is a potential cost in both
human effort and program execution if we "specialize" variants of a code.
For example, the separate unconstrained and constrained routines,
with the wrapper calling the appropriate version. Rvmmin
has a top-level
routine to decide between Rvmminu
and Rvmminb
, but optimr()
takes over this selection. A similar choice exists within dfoptim
for the Hooke and Jeeves codes. While previously, I would have chosen to
separate the bounded and unconstrained routines, I am now leaning towards
a combined routine for the Hooke and Jeeves. First, I discovered that
the separation seems to have introduced a bug, since the code was structured
to allow a similar organization for both choices, where possibly a different
structure would have been better adapted for efficient R. Note, however,
that I have not performed appropriate timings to support this conjecture.
Second, I managed to implement a bounds constrained HJ code from
a description in one of my own books in less time than it took to try (not
fully successfully) to correct the code from dfoptim
, in part because the
development and stable versions of the latter are quite different, though
both failed the test function bt.f()
example. Ravi Varadhan
has corrected the codes on CRAN.
Note that this is not a criticism of the creators of dfoptim
. I have made
similar choices myself with other packages. It is challenging to balance
clarity, maintainability, efficiency, and common
structure for a suite of related program codes.
We have already noted that it is important to provide control settings for the
different methods.
This can be a challenge, largely because some methods (or at least their
instantiations in R codes or wrappers to other languages) only permit
selected controls. The ctrldefault
function
provides as many of the controls as possible, and provides them via the
control
list. However, many methods put the contols into separate arguments
of their function call. For example,
the control$maxit
iteration limit is iterlim
for the function nlm()
.
A more subtle difficulty is that the multiple-method wrapper opm()
will generally be called with the same iteration and function count
limits across different solvers. However, we may wish to compare the underlying
codes with their own, rather particular, limits. Again nlm()
provides an
example when we call
the Wood 4-parameter test function starting at w0 <- c(-3,-1,-3,-1)
. The
default iterlim
value in nlm()
is 100, but ctrldefault(npar=4)
returns
a value for maxit
of 1000. This
has consequences, as seen in the following example, which also
shows how to call nlm
through optimr()
and opm()
using the internal
default.
require(optimx) npar <- 4 control<-list(maxit=NULL) ctrl <- ctrldefault(npar) ncontrol <- names(control) nctrl <- names(ctrl) for (onename in ncontrol) { if (onename %in% nctrl) { if (! is.null(control[onename]) || ! is.na(control[onename]) ) ctrl[onename]<-control[onename] } } control <- ctrl # note the copy back! control now has a FULL set of values print(control$maxit) control<-list() ctrl <- ctrldefault(npar) ncontrol <- names(control) nctrl <- names(ctrl) for (onename in ncontrol) { if (onename %in% nctrl) { if (! is.null(control[onename]) || ! is.na(control[onename]) ) ctrl[onename]<-control[onename] } } control <- ctrl # note the copy back! control now has a FULL set of values print(control$maxit)
There are examples and tests within the package. These include
At some future opportunity, I hope to be able to document these tests more fully.
For optimr()
, which runs individual solvers, we need to run the following test combinations
for those methods where the tests are appropriate:
For axsearch(), an example of use. Currently a small example in
tests/bdstest.R`.
?? put example in Rd file.
For bmchk()
?? improve example in Rd file.
For bmstep()
an example in the Rd file is needed. This is intended as an internal
function that computes the maximum allowable step along a search direction (or returns
Inf).
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.