Using and extending the optimx package

Abstract

optimx is a wrapper package to allow the multiple methods for function minimization available in R and CRAN packages to be used through a consistent syntax. This vignette is a significant update of an earlier one that describes some of the issues in building and extending the package. However, it is intended mainly for developers, including ourselves, as an aid to and partial record of work done. Thus it is an incomplete and imperfect document not aimed at users.

NOTE: Some material also appears in the "Intro to optimx" vignette

Background

The base R distribution includes three tools that allow for function minimization, possibly with bounds constraints. These are optim(), nlm() and nlminb(). There is also the nonlinear least squares solver nls() that will not be discussed here except when there is overlap with particular matters at hand.

Because the syntax for optim() is relatively convenient and already spans five different multiparameter minimization methods (or solvers). There is also method "Brent" in optim() that is a one-parameter, or one-dimensional, minimizer. Separately, base R has the one-parameter function optimize().

The optim() syntax, from the manual, is

optim(par, fn, gr = NULL, ...,
      method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN",
                 "Brent"),
      lower = -Inf, upper = Inf,
      control = list(), hessian = FALSE)

Here par is the vector of starting parameters for the function fn which has an (optional) gradient function gr. method specifies the solver to be used. It actually must be a single character string. An attempt to call two solvers by, say,

t2 <- optim(par=c(2,2), fn=fr, gr=grr, method=c("BFGS", "Nelder-Mead"))

generates the message

Error in match.arg(method) : 'arg' must be of length 1

The benefits of a unified syntax are seen immediately if we look how nlm() and nlminb() must be called.

nlm(f, p, ..., hessian = FALSE, typsize = rep(1, length(p)),
    fscale = 1, print.level = 0, ndigit = 12, gradtol = 1e-6,
    stepmax = max(1000 * sqrt(sum((p/typsize)^2)), 1000),
    steptol = 1e-6, iterlim = 100, check.analyticals = TRUE)
nlminb(start, objective, gradient = NULL, hessian = NULL, ...,
       scale = 1, control = list(), lower = -Inf, upper = Inf)

The reader can query the help for these functions (?nlm or ?nlminb) to get the meaning of the arguments. Here I will note particularly

When we extend our interest to CRAN packages, these issues multiply.

Moreover, there are different features of the R tools and CRAN packages that are NOT general. It would be helpful to users if they were consistently available across a wider range of solver functions. Some of these are:

The updated package incorporates the earlier optimx() function which inspired the present work. That function uses a slightly different call and returns a slightly different result structure from the more recent opm() function. A number of CRAN and other packages use optimx(); see @jnrv2009user. Because we do not wish to cause breakages of working packages and tools, we have retained the optimx() function in the current version.

WARNING: only updates to correct egregious errors in optimx() are envisaged going forward. Please use optimr() and opm().

It is fully intended that both developers and users may wish to extend the package, especially by adding solvers to the optimr() function. The file ctrldefault.R lists the optimizers currently available via this function in the code optimr.R. Note that there are multiple package lists in the ctrldefault() function for unconstrained, bounded and masked parameters. Users can add lists suitable to their needs. Indeed, this vignette is intended to assist in the the addition of other solvers, as well perhaps as extra functionality or features.

Ideally, such extensions should be possible for any R user reasonably proficient in programming R scripts. Changes would, of course, be made by downloading the source of the package and modifying the R code to make a new, but only locally available, package. The author does, on the other hand, welcome suggestions for inclusion in the distributed package, especially if these have been well-documented and tested. I caution, however, that over 95% of the effort in building and improving optimx has been to try to ensure that errors and conflicts are purged, and that the code is resistant to mistakes in user inputs.

"Optimization"

The general nonliear optimization problem considers the minimization or maximization of an objective function subject to equality and inequality constraints that are both linear and nonlinear. It comprehends the field of mathematical programming, and borders on nonlinear equations.

In R, what is referred to as "optimization", including the original optim() function and its replacement in this package, only allows for the minimization or maximization of nonlinear functions of multiple parameters subject to at most bounds constraints. Many methods offer extra facilities, but apart from masks (fixed parameters) for the hjn, Rcgmin and Rvmmin methods, such features are likely inaccessible via this package without some customization of the code. Even masks are just bounds constraints where the upper and lower bounds coincide.

For our purposes, we wish to find the vector of (real valued) parameters bestpar that minimize an objective function specified by an R function fn(par, ... ) where par is the general vector of parameters, initially provided as a vector that is either the first argument to optimr() or else specified by a par= argument. The dot arguments are additional information needed to compute the function. Function minimization methods may require information on the gradient or Hessian of the function, which we will assume to be furnished, if required, by functions gr(par, ...) and hess(par, ....). Bounds or box constraints, if they are to be imposed, are given in the vectors lower and upper.

As far as I am aware, all the optimizers included in the optimx package are local minimizers, possibly apart from the "SANN" method of optim(). That is, the solvers attempt to find a local minimum of the objective function. Global optimization is a much bigger problem. Even finding a local minimum can often be difficult, and to that end, the optimx package offers the function kktchk() to test the Kuhn-Karush-Tucker conditions. These essentially require that a local minumum has a zero gradient and all nearby points on the function surface have a greater function value. There are many details that are ignored in this very brief explanation.

Developments of the optimx() function

Users will note that there is much overlap between functions optimx() and opm(). The reasons for the restructuring to optimr() and opm() are as follows:

How the optimr() function (generally) works

optimr() is an aggregation of wrappers for a number of individual function minimization ("optimization") tools available for R. The individual wrappers are selected by a sequence of if() statements using the argument method in the call to optimr().

To add a new optimizer, we need in general terms to carry out the following:

An internal structure for bounds and masks

The Rvmmin and Rcgmin packages use the (integer) vector bdmsk to indicate the status of each parameter with respect to bounds and masks.

This particular structure was developed for use with the BASIC programs in @jnmws87, primarily for active set methods, and returned in the bdmsk list item of the answer for Rvmmin() and Rcgmin(). These need to know the direction in which a step can be taken for each parameter. If the gradient component i is positive, then the step needs to be a negative one, which it can only be if there is no lower bound in the way. A free parameter is clearly fine, or one that is on an upper bound. Adding 2 to bdmsk[i] must be positive, but if it is negative, we are at a lower bound and cannot move. Similarly we can use a negative gradient component and step forward from a lower bound. The codes Rvmminb.R and Rcgminb.R provide details, which are fairly straightforward, but tedious.

I believe that the same structure could be ported to other gradient-based codes, for example, snewton or snewtonm.

Bounds

A number of methods support the inclusion of box (or bounds) constraints. This includes the function nmkb() from package dfoptim. Unfortunately, this method uses the transfinite transformation of the objective function to impose the bounds (Chapter 11 of @nlpor14), which causes an error if any of the initial parameters are on one of the bounds.

There are several improvements in the optimr package relating to bounds that would be especially nice to see, but I do not have good ideas yet how to implement them all. Among these unresolved improvements are:

Masks (fixed parameters)

The methods hjn, Rcgmin and Rvmmin (and possibly others, but not obviously accessible yet via this package) also permit fixed (masked) parameters. See @jnmws87. This is useful when we want to establish an objective function where one or more of the parameters is supplied a value in most situations, or for which we want to fix a value while we optimize the other parameters. At another time, we may want to allow such parameters to be part of the optimization process.

In principle, we could fix parameters in methods that allow bounds constraints by simply setting the lower and upper bounds equal for the parameters to be masked. As a computational approach, this is generally a very bad idea, but in the present optimr() this is permitted as a way to signal that a parameter is fixed. Internally a different representation is used.

The method hjn is a Hooke and Jeeves axial search method that allows masks and bounds. It is coded using explicit loops, so will generally be much slower than an implementation (e.g., hjkb from dfoptim or the similar code in package pracma) that try to employ vectorized computations. hjn was included in optimr to provide an example of a direct search method with masks. I do NOT recommend it for general use.

There is a possibility that masks could be implemented globally in optimr.

At the time of writing (2016-7-11) this has yet to be tried. However, we have got a test of the transformation.

## test masked transformation

# set of 10 parameters
par <- 3*(1:10)
cat("par:")
print(par)

cat("Mask 3rd, 5th, 7th\n")
bdmsk<-rep(1,10) # indicator of parameters that are free
bdmsk[3] <- 0
bdmsk[5] <- 0
bdmsk[7] <- 0
cat("bdmsk:")
print(bdmsk)

# want to produce xpar which are the reduced parameters
iactive <- which(bdmsk == 1)
cat("iactive (length=",length(iactive),"):")
print(iactive)

xpar <- par[iactive]
cat("xpar:")
print(xpar)
xpar <- - xpar
print("altered xpar:")
print(xpar)

cat("expand back to newpar\n")
newpar <- par
newpar[iactive] <- xpar
cat("newpar:")
print(newpar)

# Need to combine with scaling to get full setup for optimr()

# Then also think of the transfinite approach for bounds on unconstrained
## Or even for bounds methods but using unconstrained part.

Issues in adding a new method

Adjusting the objective function for different methods

The method nlm() provides a good example of a situation where the default fn() and gr() are inappropriate to the method to be added to optimr(). We need a function that returns not only the function value at the parameters but also the gradient and possibly the hessian. Don't forget the dot arguments which are the exogenous data for the function!

  nlmfn <- function(spar, ...){
     f <- efn(spar, ...)
     g <- egr(spar, ...)
     attr(f,"gradient") <- g
     attr(f,"hessian") <- NULL # ? maybe change later
     f
  }

Note that we have defined nlmfn using the scaled parameters spar and the scaled function efn and gradient egr. That is, we develop the unified objective plus gradient AFTER the parameters are scaled.

In the present optimr(), the definition of nlmfn is put near the top of optimr() and it is always loaded. It is the author's understanding that such functions will always be loaded/interpreted no matter where they are in the code of a function. For ease of finding the code, and as a former Pascal programmer, I have put it near the top, as the structure can be then shared across several similar optimizers. There are other methods that compute the objective function and gradient at the same set of parameters. Though nlm() can make use of Hessian information, we have chosen here to omit the computation of the Hessian.

Parameter scaling

Parameter scaling is a feature of the original optim() but generally not provided in many other optimizers. It has been included (at times with some difficulty) in the optimr() function. The construct is to provide a vector of scaling factors via the control list in the element parscale.

In the tests of the package, and as an example of the use and utility of scaling, we use the Hobbs weed infestation problem (./tests/hobbs15b.R). This is a nonlinear least squares problem to estimate a three-parameter logistic function using data for 12 periods. This problem has a solution near the parameters c(196, 49, 0.3). In the test, we try starting from c(300, 50, 0.3) and from the much less informed c(1,1,1). In both cases, the scaling lets us find the solution more reliably. The timings and number of function and gradient evaluations are, however, not necessarily improved for the methods that "work" (though these measures are all somewhat unreliable because they may be defined or evaluated differently in different methods -- we use the information returned by the packages rather than insert counters into functions). However, what values of these measures should we apply for a failed method?

As a warning -- having made the mistake myself -- scaling must be applied to bounds when calling a bounds-capable method.

Function scaling

optim() uses control$fnscale to "scale" the value of the function or gradient computed by fn or gr respectively. In practice, the only use for this scaling is to convert a maximization to a minimization. Most of the methods applied are function minimization tools, so that if we want to maximize a function, we minimize its negative. Some methods actually have the possibility of maximization, and include a maximize control. In these cases having both fnscale and maximize could create a conflict. We check for this in optimr() and try to ensure both controls are set consistently. There is a further risk of conflict with the use of numerical approximation of derivatives, as discussed later.

Modified, unused or unwanted controls

Because different methods use different control parameters, and may even put them into arguments rather than the control list, a lot of the code in optimr() is purely for translating or transforming the names and values to achieve the desired result. This is sometimes not possible precisely. A method which uses control$trace = TRUE (a logical element) has only "on" or "off" for controlling output. Other methods use an integer for this trace object, or call it something else that is an integer, in which case there are more levels of output possible.

I have found that it is important to remove (i.e., set NULL) controls that are not used for a method. Moreover, since R can leave objects in the workspace, I find it important to set any unused or unwanted control to NULL both before and after calling a method.

Thus, if print.level is the desired control, and it more or less matches the optimr() control$trace, we need to set

   print.level <- control$trace
   control$trace <- NULL

After the method has run, we may need to reset control$trace.

There are some programming issues in the package with method controls and these are discussed in a separate section.

Methods in other computing languages

When the method we wish to call is not written in R, special care is generally needed to get a reliable and consistent operation. Typically we call an R routine from optimr(). Let us call the interface to this "foreign" language solver myop().
Then myop() will set up and call the underlying optimizer and return (supposedly) an answer.

For FORTRAN programs, @nlpor14, Chapter 18, has some suggestions. Particular issues concern the dot arguments (ellipsis or ... entries to allow exogenous data for the objective function and gradient), which can raise difficulties. In package optimr the interface to the lbfgs shows one approach, which consolidates the arguments for lbfgs() into a list, then converts the list to an environment.

Using separate unconstrained and bounds constrained functions

As an attempt to avoid executing unnecessary code, for some years I built separate unconstrained and constrained routines with a common umbrella routine. Hence,

More recently, with improvements in the just-in-time compiler for R, I am exploring consolidated codes, e.g., nvm() and ncg() though these are far from fully developed. However, they also are designed to be called from optimr() and not run independently, whereas the routines mentioned above originally were part of independent R packages. The newer codes rely on optimr() to perform sanity checks on supplied parameters, bounds and functions. In particular, bmchk() is called to check parameters and bounds.

Running multiple methods

It is often convenient to be able to run multiple optimization methods on the same function and gradient. To this end, the function opm() is supplied. The output of this by default includes the KKT tests and other information in a data frame, and there are convenience methods summary() and coef() to allow for display or extraction of results.

opm() is extremely useful for comparing methods easily. I caution that it is not an efficient way to run problems, even though it can be extremely helpful in deciding which method to apply to a class of problems.

An important use of opm() is to discover cases where methods fail on particular problems and initial conditions of parameters and settings. This has proven over time to help discover weaknesses and bugs in codes for different methods. If you find such cases, and your code and data can be rendered as an easily executed example, I strongly recommend posting it to one of the R lists or communicating with the package maintainers. That really is one of the few ways that our codes come to be improved.

Polyalgorithms -- multiple methods in sequence

Function polyopt() is intended to allow for attempts to optimize a function by running different methods in sequence. The call to polyopt() differs from that of optimr() or opm() in the following respects:

The methods in methcontrol are executed in the sequence in which they appear. Each method runs until either the specified number of iterations (typically gradient evaluations) or function evaluations have been completed, or termination tests cause the method to be exited, after which the best set of parameters so far is passed to the next method specified. If there is no further method, polyopt() exits.

Polyalgorithms may be useful because some methods such as Nelder-Mead are fairly robust and efficient in finding the region in which a minimum exists, but then very slow to obtain an accurate set of parameters. Gradients at points far from a solution may be such that gradient-based methods do poorly when started far away from a solution, but are very efficient when started "nearby". Caution, however, is recommended. Such approaches need to be tested for particular applications.

Multiple sets of starting parameters

For problems with multiple minima, or which are otherwise difficult to solve, it is sometimes helpful to attempt an optimization from several starting points. multistart() is a simple wrapper to allow this to be carried out. Instead of the vector par for the starting parameters argument, however, we now have a matrix parmat, each row of which is a set of starting parameters.

In setting up this functionality, I chose NOT to allow mixing of multiple starts with a polyalgorithm or multiple methods. For users really wishing to do this, I believe the available source codes opm.R, polyopt.R and multistart.R provide a sufficient base that the required tools can be fashioned fairly easily.

Counting function, gradient and hessian evaluations

Different methods take different approaches to counting the computational effort of performing optimizations. Sometimes this can make it difficult to compare methods.

Derivatives

Derivative information is used by many optimization methods. In particular, the gradient is the vector of first derivatives of the objective function and the hessian is its second derivative matrix. It is generally non-trivial to write a function for a gradient, and a huge amount of work to write the hessian function.

While there are derivative-free methods, we may also choose to employ numerical approximations for derivatives. Some of the optimizers called by optimr automatically provide a numerical approximation if the gradient function (typically called gr) is not provided or set NULL. However, I believe this is open to abuse and also a source of confusion, since we may not be informed in the results what approximation has been used. There is also the question of counting function and gradient evaluations correctly.

Specifying that numerical gradients are to be used, and which method for such approximation, presents a software interfacing issue. Around 2010 I introduced the idea of making the name of a gradient approximation program a character string e.g., "grfwd" and "grcentral" for forward and central derivative approximations. Direct codes for the gradient are included in the call by name NOT in character form.

This idea has been helpful, but it has raised some concerns.

For example, the package numDeriv has functions for the gradient and hessian, and offers three methods, namely a Richardson extrapolation (the default), a complex step method, and a "simple" method. The last is either a forward of backward approximation controlled by an extra argument side to the grad() function. The complex step method, which offers essentially analytic precision from a very efficient computation, is unfortunately only applicable if the objective function has specific properties. That is, according to the documentation:

 This method requires that the function be able to handle complex valued 
 arguments and return the appropriate complex valued result, even though 
 the user may only be interested in the real-valued derivatives.  It also
 requires that the complex function be analytic.

The default method of numDeriv generally requires multiple evaluations of the objective function to approximate a derivative. The simpler choices, namely, the forward, backward and central approximations, require respectively 1, 1, and 2 (extra) function evaluations for each parameter.

To keep the code straightforward, I decided that if an approximate gradient is to be used by optimr() (or by extension the multiple method routine opm()), then the user should specify the name of the approximation routine as a character string in quotations marks. The optimx package supplies five gradient approximation functions for this purpose, namely "grfwd" and "grback" for the forward and backward simple approximations, "grcentral" for the central approximation, "grnd" for the default Richardson method via numDeriv, and "grpracma" for function grad from package pracma. It should be fairly straightforward for a user to copy the structure of any of these routines and build their own gradient approximation, but I have not tried to document how to do this.

An example using the complex step derivative would also be useful to include in this vignette. I welcome contributions!

Note As at 2018-7-8 Changcheng Li and John Nash had preliminary success in using the autodiffr package at https://github.com/Non-Contradiction/autodiffr/ to generate gradient, hessian and jacobian functions via the Julia language automatic differentiation capabilities. There are some issues of slow performance compared to native-R functions to compute these collections of derivative information, but the ease of generation of the functions and the fact that they generate analytic results i.e., as good as R can compute the information, allows for some improvements in results and also makes feasible the application of some Newton-like methods. Unfortunately, rapid changed in Julia functions have made it very difficult to keep this capability operational.

Conflict between numerical gradients and maximization controls

A nasty inconsistency that has caused me quite a lot of difficulty in developing and maintaining optimx is that the use of a character string gr function, implying a numerical approximation to the gradient, comes into conflict with either of the controls maximize=TRUE or fnscale=-1 (or other negative number).

The issue arises because we need to negate the gr function when it is analytic, but the numerical approximation works on the already scaled objective function. Great care is needed to avoid trouble, especially as individual solvers may also try to apply scalings.

Duplications and other efficiency considerations

Most optimization routines perform preliminary checks on the inputs. Clearly any wrapper function such as optimr() or opm() also has such checks. There is naturally duplication. While this may be minor in time and effort cost, it would be more elegant to be able to use just one set of checks.

The obvious simple gradient approximations "grfwd" and "grback" compare the function at a step away from a base point given by par. Looking at the code, we have

grfwd <- function(par, userfn, fbase=NULL, env=optsp, ...) {
   # Forward different gradient approximation
   eps<-env$deps
   if (is.null(fbase)) fbase <- userfn(par, ...)  
   # ensure we have function value at par
   df <- rep(NA, length(par))
   teps <- eps * (abs(par) + eps)
   for (i in 1:length(par)) {
      dx <- par
      dx[i] <- dx[i] + teps[i]
      df[i] <- (userfn(dx, ...) - fbase)/teps[i]
   }
   df
}

In many solvers, we would generally use the current "best" point for the vector par as our base for computing the gradient. Thus if n = length(par), we compute fbase n times when we already have the value unless the solver code is clever enough to provide it. Because other approximations, "grcentral" in particular, do not use fbase (even though it appears in the function specification), avoiding recalculation would require our solvers to be aware of the particular approximation that has been given in the call. Nevertheless, it is clear there are opportunities for improved efficiency.

Structuring the optimr() function

Given the optimr has been set up to

It is open to discussion wheter we should flag (i.e., explicitly report in the output result or elsehwere) that an approximation is being used for the gradient or hessian.

A test script was written to explore this structure and is part of the package in inst/doc/testefnegr.R`.

Special cases such as nlm()

nlm() needs an objective function that has gradient and hessian functions as attributes. As illustrated in inst/doc/examples/woodtest.R, this takes a form like

wood.fgh <- function(x){ # For info only
   fval <- wood.f(x)
   gval <- wood.g(x)
   hval <- wood.h(x)
   attr(fval,"gradient") <- gval
   attr(fval,"hessian") <- hval
   fval
}

Functions besides optimr() in the package

opm()

As mentioned above, this routine allows a vector of methods to be applied to a given function and (optionally) gradient. The pseudo-method "ALL" (upper case) can be given as a single character string (not in a vector) to run all available methods. A more useful pseudo-method is "MOST" which leaves out some less-useful methods. If bounds are given, the list of methods specified is restricted to the methods that can deal with bounds.

optchk()

This routine is an attempt to consolidate the function, gradient and scale checks.

ctrldefault()

This routine provides default values for the control vector that is applicable to all the methods for a given size of problem. The single argument to this function is the number of parameters, which is used to compute values for termination tolerances and limits on function and gradient evaluations. However, while I believe the values computed are "reasonable" in general, for specific problems they may be wildly inappropriate.

ctrldefault() was developed by recording the different controls of many functions and packages in a spreadsheet to allow for their recording and comparison. This spreadsheet, optcontrol.xls also records the returned answer structure. It is part of the inst/doc/ collection of information in the package.

dispdefault()

This routine (in file ctrldefault.R) is intended to allow a compact display of the current default settings used within optimx.

Functions that were formerly in the optextras package

Optimization methods share a lot of common infrastructure, and much of this was collected in my retired optextras package. (Now in the optimx package.) The routines used in the current package are as follows.

kktchk()

This routine, which can be called independently for checking the results of other optimization tools, checks the KKT conditions for a given set of parameters that are thought to describe a local optimum.

grfwd(), grback(), grcentral(), grnd() and grpracma()

These have be discussed above under Derivatives.

fnchk(), grchk() and hesschk()

These functions are provided to allow for detection of user errors in supplied function, gradient or hessian functions. Though we do not yet use hessians in the optimizers called, it is hoped that eventually they can be incorporated.

fnchk() is mainly a tool to ensure that the supplied function returns a finite scalar value when evaluated at the supplied parameters.

The other routines use numerical approximations (from numDeriv) to check the derivative functions supplied by the user.

bmchk()

This routine is intended to trap errors in setting up bounds and masks for function minimization problems. In particular, we are looking for situations where parameters are outside the bounds or where bounds are impossible to satisfy (e.g., lower > upper). This routine creates an indicator vector called bdmsk whose values are 1 for free parameters, 0 for masked (fixed) parameters, -3 for parameters at their lower bound and -1 for those at their upper bound. (The particular values are related to a coding trick for BASIC in the early 1980s that was used in @jnmws87.) Note that this function allows parameter values outside the bounds to be shifted to the nearest bound (when shift2bound=TRUE, which is the default behaviour).

scalechk()

This routine is an attempt to check if the parameters and bounds are roughly similar in their scale. Unequal scaling can result in poor outcomes when trying to optimize functions using derivative free methods that try to search the parameter space. Note that the attempt to include parameter scaling for all methods is intended to provide a work-around for such bad scaling. Because nonlinear functions have scaling that changes across the parameter domain, scalechk() should be treated with caution.

Program inefficiencies

One of the unfortunate, and only partially avoidable, inefficiencies in a wrapper function such as optimr() is that there will be duplication of much of the setup and error-avoidance that a properly constructed optimization program requires. That is, both the wrapper and the called programs will have code to accomplish similar goals. Some of these relate to the following.

Besides these sources of inefficiency, there is a potential cost in both human effort and program execution if we "specialize" variants of a code. For example, the separate unconstrained and constrained routines, with the wrapper calling the appropriate version. Rvmmin has a top-level routine to decide between Rvmminu and Rvmminb, but optimr() takes over this selection. A similar choice exists within dfoptim for the Hooke and Jeeves codes. While previously, I would have chosen to separate the bounded and unconstrained routines, I am now leaning towards a combined routine for the Hooke and Jeeves. First, I discovered that the separation seems to have introduced a bug, since the code was structured to allow a similar organization for both choices, where possibly a different structure would have been better adapted for efficient R. Note, however, that I have not performed appropriate timings to support this conjecture. Second, I managed to implement a bounds constrained HJ code from a description in one of my own books in less time than it took to try (not fully successfully) to correct the code from dfoptim, in part because the development and stable versions of the latter are quite different, though both failed the test function bt.f() example. Ravi Varadhan has corrected the codes on CRAN.

Note that this is not a criticism of the creators of dfoptim. I have made similar choices myself with other packages. It is challenging to balance clarity, maintainability, efficiency, and common structure for a suite of related program codes.

Providing controls to different algorithms {#ctrl}

We have already noted that it is important to provide control settings for the different methods. This can be a challenge, largely because some methods (or at least their instantiations in R codes or wrappers to other languages) only permit selected controls. The ctrldefault function provides as many of the controls as possible, and provides them via the control list. However, many methods put the contols into separate arguments of their function call. For example, the control$maxit iteration limit is iterlim for the function nlm().

A more subtle difficulty is that the multiple-method wrapper opm() will generally be called with the same iteration and function count limits across different solvers. However, we may wish to compare the underlying codes with their own, rather particular, limits. Again nlm() provides an example when we call the Wood 4-parameter test function starting at w0 <- c(-3,-1,-3,-1). The default iterlim value in nlm() is 100, but ctrldefault(npar=4) returns a value for maxit of 1000. This has consequences, as seen in the following example, which also shows how to call nlm through optimr() and opm() using the internal default.

  require(optimx)
  npar <- 4
  control<-list(maxit=NULL)
  ctrl <- ctrldefault(npar)
  ncontrol <- names(control)
  nctrl <- names(ctrl)
  for (onename in ncontrol) {
     if (onename %in% nctrl) {
       if (! is.null(control[onename]) || ! is.na(control[onename]) )
       ctrl[onename]<-control[onename]
     }
  }
  control <- ctrl # note the copy back! control now has a FULL set of values
  print(control$maxit)
  control<-list()
  ctrl <- ctrldefault(npar)
  ncontrol <- names(control)
  nctrl <- names(ctrl)
  for (onename in ncontrol) {
     if (onename %in% nctrl) {
       if (! is.null(control[onename]) || ! is.na(control[onename]) )
       ctrl[onename]<-control[onename]
     }
  }
  control <- ctrl # note the copy back! control now has a FULL set of values
  print(control$maxit)

Testing the package

There are examples and tests within the package. These include

At some future opportunity, I hope to be able to document these tests more fully.

Needed tests or examples to provide proper coverage

For optimr(), which runs individual solvers, we need to run the following test combinations for those methods where the tests are appropriate:

For axsearch(), an example of use. Currently a small example intests/bdstest.R`. ?? put example in Rd file.

For bmchk() ?? improve example in Rd file.

For bmstep() an example in the Rd file is needed. This is intended as an internal function that computes the maximum allowable step along a search direction (or returns Inf).

References



Try the optimx package in your browser

Any scripts or data that you put into this service are public.

optimx documentation built on Oct. 24, 2023, 5:06 p.m.