This rather long vignette is to explain the development and testing
of **nlsr**, an R package to try to bring
the **R** function nls() up to date and to add capabilities for the
extension of the symbolic and automatic derivative tools in **R**.

A particular goal in **nlsr** is to attempt, wherever possible, to
use analytic or automatic derivatives. The function nls() uses a
rather weak forward derivative approximation. A second objective
is to use a Marquardt stabilization of the Gauss-Newton equations
to avoid the commonly encountered "singular gradient" failure
of nls(). This refers to the loss of rank of the Jacobian at the
parameters for evaluation. The particular stabilization also
incorporates a very simple trick to avoid very small diagonal
elements of the Jacobian inner product, though in the present
implementations, this is accomplished indirectly.
See the section below **Implementation of method**

In preparing the **nlsr** package there is a sub-goal to unify,
or at least render compatible, various
packages in **R** for the estimation or analysis of problems amenable
to nonlinear least squares solution.

A large part of the work for this package -- particularly the parts concerning derivatives and R language structures -- was initially carried out by Duncan Murdoch. Without his input, much of the capability of the package would not exist.

The package and this vignette are a work in progress, and assistance and examples are welcome. Note that there is similar work in the package Deriv (Andrew Clausen and Serguei Sokol (2015) Symbolic Differentiation, version 2.0, https://cran.r-project.org/package=Deriv), and making the current work "play nicely" with that package is desirable.

- how to insert numerical derivatives when Deriv unable to get result
- approximations for jacfn beyond fwd approximation. How to specify??

`nlsr`

extracts and displays the coefficients for a model estimated by nlxb() or nlfb() in the nlsr structured object.

library(nlsr) ydat<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) tdat<-1:12 # try setting t and y here t <- tdat y <- ydat sol1 <- nlxb(y~100*b1/(1+10*b2*exp(-0.1*b3*t)), start=c(b1=2, b2=5, b3=3)) coef(sol1) print(coef(sol1))

Functions **Deriv** and **fnDeriv** are designed as replacements
for the stats package functions **D** and **deriv**
respectively, though the argument lists do not match exactly.
**newDeriv** allows additional analytic derivative expressions to
be added. The following is an expanded and commented version of the
examples from the manual for these functions.

require(nlsr) newDeriv() # a call with no arguments returns a list of available functions # for which derivatives are currently defined newDeriv(sin(x)) # a call with a function that is in the list of available derivatives # returns the derivative expression for that function nlsDeriv(~ sin(x+y), "x") # partial derivative of this function w.r.t. "x" ## CAUTION !! ## newDeriv(joe(x)) # but an undefined function returns NULL newDeriv(joe(x), deriv=log(x^2)) # We can define derivatives, even if joe(x) is meanin nlsDeriv(~ joe(x+z), "x") # Some examples of usage f <- function(x) x^2 newDeriv(f(x), 2*x*D(x)) nlsDeriv(~ f(abs(x)), "x") nlsDeriv(~ x^2, "x") nlsDeriv(~ (abs(x)^2), "x") # derivatives of distribution functions nlsDeriv(~ pnorm(x, sd=2, log = TRUE), "x") # get evaluation code from a formula codeDeriv(~ pnorm(x, sd = sd, log = TRUE), "x") # wrap it in a function call fnDeriv(~ pnorm(x, sd = sd, log = TRUE), "x") f <- fnDeriv(~ pnorm(x, sd = sd, log = TRUE), "x", args = alist(x =, sd = 2)) f f(1) 100*(f(1.01) - f(1)) # Should be close to the gradient # The attached gradient attribute (from f(1.01)) is # meaningless after the subtraction.

These functions create functions to evaluate residuals or sums of squares at particular parameter locations.

weed <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) ii <- 1:12 wdf <- data.frame(weed, ii) wmod <- weed ~ b1/(1+b2*exp(-b3*ii)) start1 <- c(b1=1, b2=1, b3=1) wrj <- model2rjfun(wmod, start1, data=wdf) print(wrj) weedux <- nlxb(wmod, start=c(b1=200, b2=50, b3=0.3)) print(weedux) wss <- model2ssgrfun(wmod, start1, data=wdf) print(wss) # We can get expressions used to calculate these as follows: wexpr.rj <- modelexpr(wrj) print(wexpr.rj) wexpr.ss <- modelexpr(wss) print(wexpr.ss) wrjdoc <- rjfundoc(wrj) print(wrjdoc) # We do not have similar function for ss functions

Given a nonlinear model expressed as an expression of the form of a function
that computes the residuals from the model and a start vector `par`

, tries
to minimize the nonlinear sum of squares of these residuals w.r.t. `par`

.
If `model(par, mydata)`

computes
an a vector of numbers that are presumed to be able to fit data `lhs`

, then
the residual vector is `(model(par,mydata) - lhs)`

,
though traditionally we write the negative of this vector. (Writing it this
way allows the derivatives of the residuals w.r.t. the parameters `par`

to be
the same as those for `model(par,mydata).)`

nlfb` tries to minimize the sum
of squares of the residuals with respect to the parameters.

The method takes a parameter `jacfn`

which returns the Jacobian matrix of
derivatives of the residuals w.r.t. the parameters in an attribute `gradient`

.
If this is NULL, then a numerical approximation to derivatives is used.
(??which -- and can we use the character method to define these?? What
about for nlxb??) Putting the Jacobian in the attribute `gradient`

allows
us to combine the computation of the residual and Jacobian in the same code
block if we wish, since there are generally common computations, though it
does make the setup slightly more complicated. See the example below.

The start vector preferably uses named parameters (especially if there is an underlying formula). The attempted minimization of the sum of squares uses the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method.

shobbs.res <- function(x){ # scaled Hobbs weeds problem -- residual # This variant uses looping if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3") y <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) tt <- 1:12 res <- 100.0*x[1]/(1+x[2]*10.*exp(-0.1*x[3]*tt)) - y } shobbs.jac <- function(x) { # scaled Hobbs weeds problem -- Jacobian jj <- matrix(0.0, 12, 3) tt <- 1:12 yy <- exp(-0.1*x[3]*tt) zz <- 100.0/(1+10.*x[2]*yy) jj[tt,1] <- zz jj[tt,2] <- -0.1*x[1]*zz*zz*yy jj[tt,3] <- 0.01*x[1]*zz*zz*yy*x[2]*tt attr(jj, "gradient") <- jj jj } cat("try nlfb\n") st <- c(b1=1, b2=1, b3=1) low <- -Inf up <- Inf ans1 <- nlfb(st, shobbs.res, shobbs.jac, trace=TRUE) summary(ans1) ## No jacobian function -- use internal approximation ans1n <- nlfb(st, shobbs.res, trace=TRUE, control=list(watch=FALSE)) # NO jacfn summary(ans1n) ## difference coef(ans1)-coef(ans1n)

Given a nonlinear model expressed as an expression of the form
lhs ~ formula_for_rhs
and a start vector where parameters used in the model formula are named,
attempts to find the minimum of the residual sum of squares using the
Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear
sub-problem is solved by a qr method. The process is to develop the
residual and Jacobian functions using `model2rjfun`

, then call `nlfb`

.

# Data for Hobbs problem ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) # for testing tdat <- seq_along(ydat) # for testing start1 <- c(b1=1, b2=1, b3=1) eunsc <- y ~ b1/(1+b2*exp(-b3*tt)) weeddata1 <- data.frame(y=ydat, tt=tdat) anlxb1 <- try(nlxb(eunsc, start=start1, trace=TRUE, data=weeddata1)) summary(anlxb1)

Print summary output (but involving some serious computations!) of an object of class nlsr from \code{nlxb} or \code{nlfb} from package \code{nlsr}.

### From examples above print(weedux) print(ans1)

For a nonlinear model originally expressed as an expression of the form lhs ~ formula_for_rhs assume we have a resfn and jacfn that compute the residuals and the Jacobian at a set of parameters. This routine computes the gradient, that is, t(Jacobian) %*% residuals.

## Use shobbs example RG <- resgr(st, shobbs.res, shobbs.jac) RG SS <- resss(st, shobbs.res) SS

Simplifications to render derivative expresssions more usable.

The related tools are: **newSimplification, sysSimplifications, isFALSE, isZERO, isONE, isMINUSONE, isCALL, findSubexprs, sysDerivs**.

**nlsSimplify** simplifies expressions according to rules specified
by **newSimplification**.

**findSubexprs** finds common subexpressions in an expression vector
so that duplicate computation can be avoided.

## nlsSimplify nlsSimplify(quote(a + 0)) nlsSimplify(quote(exp(1)), verbose = TRUE) nlsSimplify(quote(sqrt(a + b))) # standard rule ## sysSimplifications # creates a new environment whose parent is emptyenv() Why?? str(sysSimplifications) myrules <- new.env(parent = sysSimplifications) ## newSimplification newSimplification(sqrt(a), TRUE, a^0.5, simpEnv = myrules) nlsSimplify(quote(sqrt(a + b)), simpEnv = myrules) ## isFALSE print(isFALSE(1==2)) print(isFALSE(2==2)) ## isZERO print(isZERO(0)) x <- -0 print(isZERO(x)) x <- 0 print(isZERO(x)) print(isZERO(~(-1))) print(isZERO("-1")) print(isZERO(expression(-1))) ## isONE print(isONE(1)) x <- 1 print(isONE(x)) print(isONE(~(1))) print(isONE("1")) print(isONE(expression(1))) ## isMINUSONE print(isMINUSONE(-1)) x <- -1 print(isMINUSONE(x)) print(isMINUSONE(~(-1))) print(isMINUSONE("-1")) print(isMINUSONE(expression(-1))) ## isCALL ?? don't have good understanding of this x <- -1 print(isCALL(x,"isMINUSONE")) print(isCALL(x, quote(isMINUSONE))) ## findSubexprs findSubexprs(expression(x^2, x-y, y^2-x^2)) ## sysDerivs # creates a new environment whose parent is emptyenv() Why?? str(sysDerivs)

Provide a summary output (but involving some serious computations!) of an object of
class nlsr from nlxb or nlfb from package nlsr. Examples have been given above under
**nlxb** and **nlfb**.

Given a nonlinear model expressed as an expression of the form

lhs ~ formula_for_rhs

and a start vector where parameters used in the model formula are named, attempts to find the minimum of the residual sum of squares using the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method.

# Data for Hobbs problem ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) # for testing tdat <- seq_along(ydat) # for testing start1 <- c(b1=1, b2=1, b3=1) eunsc <- y ~ b1/(1+b2*exp(-b3*tt)) weeddata1 <- data.frame(y=ydat, tt=tdat) ## The following attempt with nls() fails! anls1 <- try(nls(eunsc, start=start1, trace=TRUE, data=weeddata1)) ## But we succeed by calling nlxb first. anlxb1 <- try(nlxb(eunsc, start=start1, trace=TRUE, data=weeddata1)) st2 <- coef(anlxb1) anls2 <- try(nls(eunsc, start=st2, trace=TRUE, data=weeddata1)) ## Or we can simply call wrapnlsr anls2a <- try(wrapnlsr(eunsc, start=start1, trace=TRUE, data=weeddata1))

Weighted regression alters the sum of squares of the parameters `prm`

from

sum_{i} (residual_i(prm)^2)

to

sum_{i} (wts_i * residual_i(prm)^2)

where `wts`

is the vector of weights. This is equivalent to multiplying each residual or
row of the Jacobian by the square root of the respective weight.

While `nls()`

has provision for (fixed) weights, the example given in the documentation
of `nls()`

for weighted regression actually uses a functional form. Moreover, the
weights are the square roots of the **predicted** or model values, so are not fixed.
Here we replace these weights with the fixed values of the quantity we are trying to
predict, namely the `rate`

variable in a kinetic model. In our example below we first
use the documentation example with predicted values; note that the name of the result
has been changed from that in the `nls()`

documentation. This result gives different estimates of the parameters from the fixed weights used afterwards.

Note that neither `nlsr::nlxb`

nor `nlsr::nlfb`

can use the weighted.MM function
directly. This is one of the more awkward aspects of nonlinear least squares and
nonlinear optimization.

Treated <- Puromycin[Puromycin$state == "treated", ] weighted.MM <- function(resp, conc, Vm, K) { ## Purpose: exactly as white book p. 451 -- RHS for nls() ## Weighted version of Michaelis-Menten model ## ---------------------------------------------------------- ## Arguments: 'y', 'x' and the two parameters (see book) ## ---------------------------------------------------------- ## Author: Martin Maechler, Date: 23 Mar 2001 pred <- (Vm * conc)/(K + conc) (resp - pred) / sqrt(pred) } ## Here is the estimation using predicted value weights Pur.wtnlspred <- nls( ~ weighted.MM(rate, conc, Vm, K), data = Treated, start = list(Vm = 200, K = 0.1)) summary(Pur.wtnlspred) ## nlxb cannot use this form Pur.wtnlxbpred <- try(nlxb( ~ weighted.MM(rate, conc, Vm, K), data = Treated, start = list(Vm = 200, K = 0.1))) ## and the structure is wrong for nlfb. See wres.MM below. Pur.wtnlfbpred <- try(nlfb(resfn=weighted.MM(rate, conc, Vm, K), data = Treated, start = list(Vm = 200, K = 0.1))) ## Now use fixed weights and the weights argument in nls() Pur.wnls <- nls( rate ~ (Vm * conc)/(K + conc), data = Treated, start = list(Vm = 200, K = 0.1), weights=1/rate) summary(Pur.wnls) ## nlsr::nlxb should give essentially the same result library(nlsr) ## Note that we put in the dataframe name to explicitly specify weights Pur.wnlxb <- nlxb( rate ~ (Vm * conc)/(K + conc), data = Treated, start = list(Vm = 200, K = 0.1), weights=1/Treated$rate) summary(Pur.wnlxb) ## check difference coef(Pur.wnls) - coef(Pur.wnlxb) ## Residuals Pur.wnls -- These are weighted ## resid(Pur.wnlxb) # ?? does not work -- why? print(res(Pur.wnlxb)) ## Those from nls() are NOT weighted resid(Pur.wnls) ## Try using a function, that is nlsr::nlfb stw <- c(Vm = 200, K = 0.1) wres.MM <- function(prm, rate, conc) { Vm <- prm[1] K <- prm[2] pred <- (Vm * conc)/(K + conc) (rate - pred) / sqrt(rate) # NOTE: NOT pred here } # test function first to see it works print(wres.MM(stw, rate=Treated$rate, conc=Treated$conc)) Pur.wnlfb <- nlfb(start=stw, resfn=wres.MM, rate = Treated$rate, conc=Treated$conc, trace=FALSE) # Note: weights are inside function already here summary(Pur.wnlfb) print(Pur.wnlfb) ## check estimates with nls() result coef(Pur.wnls) - coef(Pur.wnlfb) ## and with nlxb result coef(Pur.wnlxb) - coef(Pur.wnlfb) wres0.MM <- function(prm, rate, conc) { Vm <- prm[1] K <- prm[2] pred <- (Vm * conc)/(K + conc) (rate - pred) # NOTE: NO weights } Pur.wnlfb0 <- nlfb(start=stw, resfn=wres0.MM, rate = Treated$rate, conc=Treated$conc, weights=1/Treated$rate, trace=FALSE) # Note: explicit weights summary(Pur.wnlfb0) print(Pur.wnlfb0) ## check estimates with nls() result coef(Pur.wnls) - coef(Pur.wnlfb0) ## and with nlxb result coef(Pur.wnlxb) - coef(Pur.wnlfb0) wss.MM <- function(prm, rate, conc) { ss <- as.numeric(crossprod(wres.MM(prm, rate, conc))) } library(optimr) osol <- optimr(stw, fn=wss.MM, gr="grnd", method="Nelder-Mead", control=list(trace=0), rate=Treated$rate, conc=Treated$conc) print(osol) ## difference from nlxb osol$par - coef(Pur.wnlxb)

Multi-line functions present a challenge. This is in part because
the chain rule may have to be applied backwards (last line first), but
also because there may be structures that are not always differentiable,
such as *if* statements.

Note that the functional approach to nonlinear least squares -- accessible via
function `nlfb`

-- does let us prepare residuals and Jacobians of complexity
limited only by the capabilities of **R**.

This is the natural extension of Multi-line expressions. The authors would welcome collaboration with someone who has expertise in this area.

We would like to be able to find the Jacobian or gradient of functions
that have as their parameters a vector, e.g., *prm*. At time of writing
(January 2015) we cannot specify such a vector within **nlsr**.
?? is this true?

The following (rather long) script shows things that work or fail. In particular, it shows that the calls needed to get derivatives need to be set up precisely. This is not unique to R -- the same sort of attention to (syntax specific) detail is present in other systems like Macsyma/Maxima.

First we set up the Hobbs weed problem. Initially it seemed a good idea to put the data
WITHIN the residual function so that it would not have to be passed to the function.
In retrospect, this is a bad idea because y and t are needed. To make the use of the
data explicit, it is saved in the variables `tdat`

for time and `ydat`

for the weed
data.

# tryhobbsderiv.R ydat<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) tdat<-1:12 # try setting t and y here t <- tdat y <- ydat # now define a function hobbs.res<-function(x, t, y){ # Hobbs weeds problem -- residual # This variant uses looping # if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3") # if(abs(12*x[3])>50) { # res<-rep(Inf,12) # } else { res<-x[1]/(1+x[2]*exp(-x[3]*t)) - y # } } # test it start1 <- c(200, 50, .3) ## residuals at start1 r1 <- hobbs.res(start1, t=tdat, y=ydat) print(r1)

Now we try some of the derivative capabilities of `nlsr`

.

## NOTE: some functions may be seemingly correct for R, but we do not ## get the result desired, despite no obvious error. Always test. require(nlsr) # Try directly to differentiate the residual vector. r1 is numeric, so this should # return a vector of zeros in a mathematical sense. In fact it gives an error, # since R does not want to differentiate a numeric vector. Jr1a <- fnDeriv(r1, "x") # Set up a function containing expression with subscripted parameters x[] hobbs1 <- function(x, t, y){ res<-x[1]/(1+x[2]*exp(-x[3]*t)) - y } # test the residuals print(hobbs1(start1, t=tdat, y=ydat)) # Alternatively, let us set up t and y y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) t<-1:12 # try different calls. Note that we need to include t and y data somehow print(hobbs1(start1)) print(hobbs1(start1, t, y)) print(hobbs1(start1, tdat, ydat)) # We remove t and y, to ensure we don't get results from their presence rm(t) rm(y) # Set up a function containing an expression with named (with numbers) parameters # Note that we need to link these to the values in x hobbs1m <- function(x, t, y){ x001 <- x[1] x002 <- x[2] x003 <- x[3] res<-x001/(1+x002*exp(-x003*t)) - y } print(hobbs1m(x=start1, t=tdat, y=ydat)) # Function with explicit use of the expression() hobbs1me <- function(x, t, y){ x001 <- x[1] x002 <- x[2] x003 <- x[3] expression(x001/(1+x002*exp(-x003*t)) - y) } print(hobbs1me(start1, t=tdat, y=ydat)) # note failure (because the expression is not evaluated?) # # Now try to take derivatives Jr11m <- fnDeriv(hobbs1m, c("x001", "x002", "x003")) # fails because expression is INSIDE a function (i.e., closure) # # try directly differentiating the expression Jr11ex <-fnDeriv(expression(x001/(1+x002*exp(-x003*t)) - y) , c("x001", "x002", "x003")) # this seems to "work". Let us display the result Jr11ex # Set the values of the parameters by name x001 <- start1[1] x002 <- start1[2] x003 <- start1[3] # and try to evaluate print(eval(Jr11ex)) # we need t and y, so set them t<-tdat y<-ydat print(eval(Jr11ex)) # But there is still a problem. WHY??? # # Let us try it piece by piece (column by column) resx <- expression(x001/(1+x002*exp(-x003*t)) - y) res1 <- Deriv(resx, "x001", do_substitute=FALSE) res1 col1 <- eval(res1) res2 <- Deriv(resx, "x002", do_substitute=FALSE) res2 col2 <- eval(res2) res3 <- Deriv(resx, "x003", do_substitute=FALSE) res3 col3 <- eval(res3) hobJac <- cbind(col1, col2, col3) print(hobJac)

Clearly there is still some work to do to make the mechanism for getting derivatives more easily, and with less chance of error. Work still to do???.

Jacobians, the matrices of partial derivatives of residuals with respect
to the parameters, have a vector input (the parameters). `nlsr`

attempts
to generate the code for this computation. **This is the first key improvement
of nlsr over nls().** It will likely be a continuing effort to enlarge the
set of functions and expressions that can be handled and to deal with them
more efficiently. Also it would be nice to automate the generation of numerical
approximations for those components of the Jacobian for which analytic derivatives
are not available.

Note that the Jacobian inner product (J^T J) differs from the Hessian of the sum of squares by a matrix whose elements that are an inner product of the residuals with their second derivatives with respect to the parameters. This is a summation of m matrices each involving a residual times its (n by n) partial derivatives with respect to the parameters. Generally we treat this matrix as "ignorable" on the basis that the residuals should be "small", but clearly this is not always the case.

Nonlinear least squares methods are mostly founded on some or other variant of the Gauss-Newton algorithm. The function we wish to minimize is the sum of squares of the (nonlinear) residuals r(x) where there are m observations (elements of r) and n parameters x. Hence the function is

f(x) = sum(r(k)^2)

Newton's method starts with an original set of parameters x[0]. At a given iteraion, which could be the first, we want to solve

x[k+1] = x[k] - H^(-1) g

where H is the Hessian and g is the gradient at x[k]. We can rewrite this as a solution, at each iteration, of

H delta = -g

with

x[k+1] = x[k] + delta

For the particular sum of squares, the gradient is

g(x) = 2 * r(k)

and

H(x) = 2 ( J' J + sum(r * Z))

where J is the Jacobian (first derivatives of r w.r.t. x) and Z is the tensor of second derivatives of r w.r.t. x). Note that J' is the transpose of J.

The primary simplification of the Gauss-Newton method is to assume that the second term above is negligible. As there is a common factor of 2 on each side of the Newton iteration after the simplification of the Hessian, the Gauss-Newton iteration equation is

J' J delta = - J' r

This iteration frequently fails. The approximation of the Hessian by the Jacobian inner-product is one reason, but there is also the possibility that the sum of squares function is not "quadratic" enough that the unit step reduces it. @Hartley61 introduced a line search along delta, while @Marq63 suggested replacing J' J with (J' J + lambda * D) where D is a diagonal matrix intended to partially approximate the omitted portion of the Hessian.

Marquardt suggested D = I (a unit matrix) or D = (diagonal part of J' J). The former approach, when lambda is large enough that the iteration is essentially

delta = - g / lambda

we get a version of the steepest descents algorithm. Using the diagonal of J' J, we have a scaled version of this (see https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm)

@jn77ima found that on low precision machines, it was common for diagonal elements of J' J to underflow. A very small modification to solve

(J' J + lambda * (D + phi * I)) * delta = - g

where phi is a small number (phi = 1 seems to work quite well) ?? check??.

In jncnm79, the iteration equation was solved as stated. However, this involves forming the sum of squares and cross products of J, a process that loses some numerical precision. A better way to solve the linear equations is to apply the QR decomposition to the matrix J itself. However, we still need to incorporate the lambda * I or lambda * D adjustments. This is done by adding rows to J that are the square roots of the "pieces". We add 1 row for each diagonal element of I and each diagonal element of D.

In each iteration, we reduce the lambda parameter before solution. If the
resulting sum of squares is not reduced, lambda is increased, otherwise we
move to the next iteration. Various authors (including the present one)
have suggested different strategies for this. My current opinion is that
a "quick" increase, say a factor of 10, and a "slow" decrease, say a factor
of 0.2, work quite well. However, it is important to check that lambda has
not got too small or underflowed before applying the increase factor. On
the other hand, it is useful to be able to set lambda = 0 in the code so
that a pure Gauss-Newton method can be evaluated with the program(s). The
current code `nlfb()`

uses the line

`if (lamda<1000*.Machine$double.eps) lamda<-1000*.Machine$double.eps`

to ensure we get an increase. To force a Gauss-Newton algorithm, the
controls `laminc`

and `lamdec`

are set to 0.

**The Levenberg-Marquardt adjustment to the Gauss-Newton approach is the
second major improvement of nlsr (and also its predecessor nlmrt and
the package minpack-lm) over nls().**

The success of many optimization codes often depends on knowing when no more progress can be made. For the present codes, one simple procedure increases the damping parameter lamda whenever the sum of squares cannot be decreased, with termination when the parameters are not altered by the addition of delta. While this "works", it does incur unnecessary computational effort.

A better approach is that of @BatesWatts81. Their relative offset criterion considers the potential progress that could be made in reducing the current residuals to the initial sum of squares. This is a sensible and very effective strategy for most situations, but is dangerous where the problem has very small or zero residuals, as in the case of an interpolation problem or nonlinear equations.

To illustrate this for zero residual problems, let us set up a simple test.

## test small resid case with roffset tt <- 1:25 ymod <- 10 * exp(-0.01*tt) + 5 n <- length(tt) evec0 <- rep(0, n) evec1 <- 1e-4*runif(n, -.5, .5) evec2 <- 1e-1*runif(n, -.5, .5) y0 <- ymod + evec0 y1 <- ymod + evec1 y2 <- ymod + evec2 mydata <- data.frame(tt, y0, y1, y2) st <- c(aa=1, bb=1, cc=1)

We provide three sizes of residuals here, but will only illustrate the consequences of
zero residuals (the problem with y0). Let us run the nonlinear least squares problem
to get the interpolating function, first with `nls()`

, then with `nlxb()`

. In the
second case we have turned off the trace, as the approach "works" because we have
taken care in the code to provide for problems with small residuals.

nlsfit0 <- try(nls(y0 ~ aa * exp(-bb*tt) + cc, start=st, data=mydata, trace=TRUE)) nlsfit0 library(nlsr) nlsrfit0 <- try(nlxb(y0 ~ aa * exp(-bb*tt) + cc, start=st, data=mydata, trace=FALSE)) nlsrfit0

We can approximate what `nls()`

is doing by running a simple Gauss-Newton method one
step at a time. It is (or was at the time of writing) easier to do this in functional
form with explicit derivatives. Note that we test these functions!

trf <- function(par, data) { tt <- data[,"tt"] res <- par["aa"]*exp(-par["bb"]*tt) + par["cc"] - y0 } print(trf(st, data=mydata)) trj <- function(par, data) { tt <- data[,"tt"] m <- dim(data)[1] JJ <- matrix(NA, nrow=m, ncol=3) JJ[,1] <- exp(-par["bb"]*tt) JJ[,2] <- - tt * par["aa"] * exp(-par["bb"]*tt) JJ[,3] <- 1 JJ } Ja <- trj(st, data=mydata) print(Ja) library(numDeriv) Jn <- jacobian(trf, st, data=mydata) print(Jn) print(max(abs(Jn-Ja))) ssf <- function(par, data){ rr <- trf(par, data) ss <- crossprod(rr) } print(ssf(st, data=mydata)) library(numDeriv) print(jacobian(trf, st, data=mydata))

The traditional way to implement a Gauss Newton method was to form the sum of squares and cross-products matrix at in traditional linear regression, using the Jacobian as the data matrix. The right hand side uses the inner product of the Jacobian with the negative residuals. This approach increases the condition number of the problem, and a more direct QR decomposition of the Jacobian is now the preferred approach. However, let us try both to ensure we are getting similar answers. First the QR approach.

gnjn <- function(start, resfn, jacfn = NULL, trace = FALSE, data=NULL, control=list(), ...){ # simplified Gauss Newton offset = 1e6 # for no change in parms stepred <- 0.5 # start with this as per nls() par <- start cat("starting parameters:") print(par) res <- resfn(par, data, ...) ssbest <- as.numeric(crossprod(res)) cat("initial ss=",ssbest,"\n") par0 <- par kres <- 1 kjac <- 0 keepon <- TRUE while (keepon) { cat("kjac=",kjac," kres=",kres," SSbest now ", ssbest,"\n") print(par) JJ <- jacfn(par, data, ...) kjac <- kjac + 1 QJ <- qr(JJ) delta <- qr.coef(QJ, -res) ss <- ssbest + offset*offset # force evaluation step <- 1.0 if (as.numeric(max(par0+delta)+offset) != as.numeric(max(par0+offset)) ) { while (ss > ssbest) { par <- par0+delta * step res <- resfn(par, data, ...) ss <- as.numeric(crossprod(res)) kres <- kres + 1 ## cat("step =", step," ss=",ss,"\n") ## tmp <- readline("continue") if (ss > ssbest) { step <- step * stepred } else { par0 <- par ssbest <- ss } } # end inner loop if (kjac >= 4) { keepon = FALSE cat("artificial stop at kjac=4 -- we only want to check output") } } else { keepon <- FALSE # done } } # end main iteration } # seems to need this } # end gnjne fitgnjn0 <- gnjn(st, trf, trj, data=mydata) ## Another way #- set lamda = 0 in nlxb, fix laminc, lamdec library(nlsr) nlx00 <- try(nlxb(y0 ~ aa * exp(-bb*tt) + cc, start=st, data=mydata, trace=TRUE, control=list(lamda=0, laminc=0, lamdec=0, watch=TRUE))) nlx00

The first iteration more or less matches the `nls()`

result. The problem is quite
ill-conditioned, and `nls()`

is using a numerical approximation to the Jacobian,
so deviations of this magnitude from the iterations are not unexpected.

Let us test with a traditional Gauss-Newton.

gnjn2 <- function(start, resfn, jacfn = NULL, trace = FALSE, data=NULL, control=list(), ...){ # simplified Gauss Newton offset = 1e6 # for no change in parms stepred <- 0.5 # start with this as per nls() par <- start cat("starting parameters:") print(par) res <- resfn(par, data, ...) ssbest <- as.numeric(crossprod(res)) cat("initial ss=",ssbest,"\n") kres <- 1 kjac <- 0 par0 <- par keepon <- TRUE while (keepon) { cat("kres=",kres," kjac=",kjac," SSbest now ", ssbest,"\n") print(par) JJ <- jacfn(par, data, ...) kjac <- kjac + 1 JTJ <- crossprod (JJ) JTr <- crossprod (JJ, res) delta <- - as.vector(solve(JTJ, JTr)) ss <- ssbest + offset*offset # force evaluation step <- 1.0 if (as.numeric(max(par0+delta)+offset) != as.numeric(max(par0+offset)) ) { while (ss > ssbest) { par <- par0+delta * step res <- resfn(par, data, ...) ss <- as.numeric(crossprod(res)) kres <- kres + 1 ## cat("step =", step," ss=",ss," best is",ssbest,"\n") ## tmp <- readline("continue") if (ss > ssbest) { step <- step * stepred } else { par0 <- par ssbest <- ss } } # end inner loop if (kjac >= 4) { keepon = FALSE cat("artificial stop at kjac=4 -- we only want to check output") } } else { keepon <- FALSE # done } } # end main iteration } # seems to need this } # end gnjn2 fitgnjn20 <- gnjn2(st, trf, trj, data=mydata)

Solution of sets of nonlinear equations is generally NOT a problem that is commonly required for statisticians or data analysts. My experience is that the occasions where it does arise are when workers try to solve the first order conditions for optimality of a function, rather than try to optimize the function. If this function is a sum of squares, then we have a nonlinear least squares problem, and generally such problems are best approached my methods of the type discussed in this article.

Conversely, since our problem is, using the notation above, equivalent to

r(x) = 0

the solution of a nonlinear least squares problem for which the final sum of squares is zero provides a solution to the nonlinear equations. In my experience this is a valid approach to the nonlinear equations problem, especially if there is concern that a solution may not exist. Note that there are methods for nonlinear equations, some of which (??ref) are available in R packages.

In `nlsr`

, we provide for modelling via R functions that generate
the residuals and Jacobian, and we call the nonlinear least squares
minimizer through `nlfb`

. Alternatively, we can do the modelling
via the `nlxb`

function that
takes a model expression as an argument and converts it to the
functional form. The actual least squares minimization is then
carried out by a common routine (`nlfb`

). Generally `nlxb`

is a
lot less programming work, but it is also more limited, as we can
only handle single line expressions, and some expressions may not
be differentiable within the package, even if there are mathematically
feasible ways to compute them analytically.

Almost all statistical functions have exogenous data that is needed to compute residuals or likelihoods and is not dependent on the model parameters. (This section starts from Notes140806.)

model2rjfun does NOT have ... args.

Should it have? i.e., a problem where we are fitting a set of time series, 1 for each plant/animal, with some sort of start parameter for each that is NOT estimated (e.g., pH of soil, some index of health).

Difficulty in such a problem is that the residuals are then a matrix, and the nlfb rather than nlxb is a better approach. However, fitting 1 series would still need this data, and example nlsrtryextraparms.txt shows that the extra parm (ms in this case) needs to be in the user's globalenv.

rm(list=ls()) require(nlsr) # want to have data AND extra parameters (NOT to be estimated) traceval <- TRUE # traceval set TRUE to debug or give full history # Data for Hobbs problem ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) # for testing tdat <- seq_along(ydat) # for testing # A simple starting vector -- must have named parameters for nlxb, nls, wrapnlsr. start1 <- c(b1=1, b2=1, b3=1) startf1 <- c(b1=1, b2=1, b3=.1) eunsc <- y ~ b1/(1+b2*exp(-b3*tt)) cat("LOCAL DATA IN DATA FRAMES\n") weeddata1 <- data.frame(y=ydat, tt=tdat) cat("weeddata contains the original data\n") ms <- 2 # define the external parameter here cat("wdata scales y by ms =",ms,"\n") wdata <- data.frame(y=ydat/ms, tt=tdat) wdata cat("estimate the UNSCALED model with SCALED data\n") anlxbs <- try(nlxb(eunsc, start=start1, trace=traceval, data=wdata)) print(anlxbs) escal <- y ~ ms*b1/(1+b2*exp(-b3*tt)) cat("estimate the SCALED model with scaling provided in the call (ms=0.5)\n") anlxbh <- try(nlxb(escal, start=start1, trace=traceval, data=weeddata1, ms=0.5)) print(anlxbh) cat("\n scaling is now using the globally defined value of ms=",ms,"\n") anlxb1a <- try(nlxb(escal, start=start1, trace=traceval, data=weeddata1)) print(anlxb1a) ms <- 1 cat("\n scaling is now using the globally re-defined value of ms=",ms,"\n") anlxb1b <- try(nlxb(escal, start=start1, trace=traceval, data=weeddata1)) print(anlxb1b)

We use the Hobbs Weeds problem (Nash, 1979 and Nash, 2014). Note that nls() fails from start1.

require(nlsr) traceval <- FALSE # Data for Hobbs problem ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) # for testing tdat <- seq_along(ydat) # for testing # A simple starting vector -- must have named parameters for nlxb, nls, wrapnlsr. start1 <- c(b1=1, b2=1, b3=1) eunsc <- y ~ b1/(1+b2*exp(-b3*tt)) cat("LOCAL DATA IN DATA FRAMES\n") weeddata1 <- data.frame(y=ydat, tt=tdat) weeddata2 <- data.frame(y=1.5*ydat, tt=tdat) anlxb1 <- try(nlxb(eunsc, start=start1, trace=TRUE, data=weeddata1, control=list(watch=FALSE))) print(anlxb1) anlsb1 <- try(nls(eunsc, start=start1, trace=TRUE, data=weeddata1)) print(anlsb1)

A different start causes nlxb to return a large sum of squares. Note that nls() again fails.

startf1 <- c(b1=1, b2=1, b3=.1) anlsf1 <- try(nls(eunsc, start=startf1, trace=TRUE, data=weeddata1)) print(anlsf1) library(nlsr) anlxf1 <- try(nlxb(eunsc, start=startf1, trace=TRUE, data=weeddata1, control=list(watch=FALSE))) print(anlxf1) # anlxb2 <- try(nlxb(eunsc, start=start1, trace=FALSE, data=weeddata2)) # print(anlxb2)

We can discover quickly the difficulty here by computing the Jacobian at this "solution" and checking its singular values.

cf1 <- coef(anlxf1) print(cf1) jf1 <- anlxf1$jacobian svals <- svd(jf1)$d print(svals)

Here we see that the Jacobian is only rank 1, even though there are 3 coefficients. It is therefore not surprising that our nonlinear least squares program has concluded we are unable to make further progress.

We can run the same example as above using **R** functions rather than
expressions, but now we need to have a gradient function as well as one to
compute residuals. **nlsr** has tools to create these functions from expressions,
as we shall see here. First we again set up the data and load the package.

require(nlsr) traceval <- FALSE # Data for Hobbs problem ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) # for testing tdat <- seq_along(ydat) # for testing # A simple starting vector -- must have named parameters for nlxb, nls, wrapnlsr. start1 <- c(b1=1, b2=1, b3=1) eunsc <- y ~ b1/(1+b2*exp(-b3*tt)) cat("LOCAL DATA IN DATA FRAMES\n") weeddata1 <- data.frame(y=ydat, tt=tdat)

weedrj <- model2rjfun(modelformula=eunsc, pvec=start1, data=weeddata1) weedrj rjfundoc(weedrj) # Note how useful this is to report status

y <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) tt <- seq_along(y) # for testing mydata <- data.frame(y = y, tt = tt) f <- y ~ 100*b1/(1 + 10*b2 * exp(-0.1 * b3 * tt)) p <- c(b1 = 1, b2 = 1, b3 = 1) rjfn <- model2rjfun(f, p, data = mydata) rjfn(p) myfn <- function(p, resfn=rjfn){ val <- resss(p, resfn) } p <- c(b1 = 2, b2=2, b3=1) a1 <- optim(p, myfn, control=list(trace=0)) a1

We can also embed the function directly.

a2 <- optim(p, function(p,resfn=rjfn){resss(p,resfn)}, control=list(trace=0)) a2

?? from a different vignette, need to integrate

One of the common needs for optimization computations is the availability of accurate gradients of the objective functions. While differentiation is relatively straightforward, it is tedious and error-prone.

At 2015-1-24, I have not determined how to automate the use the output
of the derivatives generated by **nlsr** to create a working gradient
function. However, the following write-up shows how such a function can
be generated in a semi-automatic way.

We use an example that appeared on the R-help mailing list on Jan 14, 2015. Responses by Ravi Varadhan and others, along with some modification I made, gave the following negative log likelihood function to be minimized.

require(nlsr) y=c(5,11,21,31,46,75,98,122,145,165,195,224,245,293,321,330,350,420) # data set Nweibull2 <- function(x,prm){ la <- prm[1] al <- prm[2] be<- prm[3] val2 <- la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ) val2 } LL2J <- function(par,y) { R = Nweibull2(y,par) -sum(log(R)) }

We want the gradient of LL3J() with respect to *par*, and first compute the
derivatives of Nweibull2() with respect to the paramters *prm*

We start with the central expression in Nweibull2() and compute its partial derivatives. The expression is:

la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) )

# Put in the main expression for the Nweibull pdf. ## we generate the three gradient components g1n <- nlsDeriv(~ la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ), "la") g1n g2n <- nlsDeriv(~ la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ), "al") g2n g3n <- nlsDeriv(~ la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ), "be") g3n

By copying and pasting the output above into a function structure, we get Nwei2g() below.

Nwei2g <- function(x, prm){ la <- prm[1] al <- prm[2] be<- prm[3] g1v <- la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) * (al * (1 - exp((x/al)^be)))) + be * (x/al)^(be - 1) * exp((x/al)^be + la * al * (1 - exp((x/al)^be))) g2v <- la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) * (be * (x/al)^(be - 1) * -(x/al^2) + (la * al * -(exp((x/al)^be) * (be * (x/al)^(be - 1) * -(x/al^2))) + la * (1 - exp((x/al)^be))))) + la * be * ((be - 1) * (x/al)^(be - 1 - 1) * -(x/al^2)) * exp((x/al)^be + la * al * (1 - exp((x/al)^be))) g3v <- la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) * ((x/al)^be * log(x/al) + la * al * -(exp((x/al)^be) * ((x/al)^be * log(x/al))))) + (la * be * ((x/al)^(be - 1) * log(x/al)) + la * (x/al)^(be - 1)) * exp((x/al)^be + la * al * (1 - exp((x/al)^be))) gg <- matrix(data=c(g1v, g2v, g3v), ncol=3) }

We can check this gradient functionusing the grad() function from package
**numDeriv**.

start1 <- c(lambda=.01,alpha=340,beta=.8) start2 <- c(lambda=.01,alpha=340,beta=.7) require(numDeriv) ganwei <- Nwei2g(y, prm=start1) require(numDeriv) Nw <- function(x, y) { Nweibull2(y, x) } # to allow grad() to work gnnwei <- matrix(NA, nrow=length(y), ncol=3) for (i in 1:length(y)){ gnrow <- grad(Nw, x=start1, y=y[i]) gnnwei[i,] <- gnrow } gnnwei ganwei cat("max(abs(gnnwei - ganwei))= ", max(abs(gnnwei - ganwei)),"\n")

Now we can build the gradient of the objective function. This requires
an application of the chain rule to the summation of logs of the
elements of the quantity *R*. Since the derivative of log(R) w.r.t. R is
simply 1/R, this is relatively simple. However, I have not found how
to automate this.

## and now we can build the gradient of LL2J LL2Jg <- function(prm, y) { R = Nweibull2(y,prm) gNN <- Nwei2g(y, prm) # print(str(gNN) gL <- - as.vector( t(1/R) %*% gNN) } # test gaLL2J <- LL2Jg(start1, y) gaLL2J gnLL2J <- grad(LL2J, start1, y=y) gnLL2J cat("max(abs(gaLL2J-gnLL2J))= ", max(abs(gaLL2J-gnLL2J)), "\n" )

These examples show dotargs do NOT work for any of nlsr, nls, or minpack.lm. Use of a dataframe or local (calling) environment objects does work in all.

library(knitr) # read chunk (does not run code) # read_chunk('/home/john/rsvnall/optimizer/pkg/nlsr/inst/dev-files/nlsdata.R') read_chunk(system.file('dev-files/nlsdata.R', package = 'nlsr')) # We use this approach to avoid having differences between file here and in inst/extdata/

## @knitr nlsdata.R # try different ways of supplying data to R nls stuff ydata <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) ttdata <- seq_along(ydata) # for testing mydata <- data.frame(y = ydata, tt = ttdata) hobsc <- y ~ 100*b1/(1 + 10*b2 * exp(-0.1 * b3 * tt)) ste <- c(b1 = 2, b2 = 5, b3 = 3) # let's try finding the variables findmainenv <- function(formula, prm) { vn <- all.vars(formula) pnames <- names(prm) ppos <- match(pnames, vn) datvar <- vn[-ppos] cat("Data variables:") print(datvar) cat("Are the variables present in the current working environment?\n") for (i in seq_along(datvar)){ cat(datvar[[i]]," : present=",exists(datvar[[i]]),"\n") } } findmainenv(hobsc, ste) y <- ydata tt <- ttdata findmainenv(hobsc, ste) rm(y) rm(tt) # =============================== # let's try finding the variables in dotargs finddotargs <- function(formula, prm, ...) { dots <- list(...) cat("dots:") print(dots) cat("names in dots:") dtn <- names(dots) print(dtn) vn <- all.vars(formula) pnames <- names(prm) ppos <- match(pnames, vn) datvar <- vn[-ppos] cat("Data variables:") print(datvar) cat("Are the variables present in the dot args?\n") for (i in seq_along(datvar)){ dname <- datvar[[i]] cat(dname," : present=",(dname %in% dtn),"\n") } } finddotargs(hobsc, ste, y=ydata, tt=ttdata) # =============================== y <- ydata tt <- ttdata tryq <- try(nlsquiet <- nls(formula=hobsc, start=ste)) if (class(tryq) != "try-error") {print(nlsquiet)} else {cat("try-error\n")} #- OK rm(y) rm(tt) tdots<-try(nlsdots <- nls(formula=hobsc, start=ste, y=ydata, tt=ttdata)) if ( class(tdots) != "try-error") {print(nlsdots)} else {cat("try-error\n")} #- Fails tframe <- try(nlsframe <- nls(formula=hobsc, start=ste, data=mydata)) if (class(tframe) != "try-error") {print(nlsframe)} else {cat("try-error\n")} #- OK library(nlsr) y <- ydata tt <- ttdata tquiet <- try(nlsrquiet <- nlxb(formula=hobsc, start=ste)) if ( class(tquiet) != "try-error") {print(nlsrquiet)} #- OK rm(y) rm(tt) test <- try(nlsrdots <- nlxb(formula=hobsc, start=ste, y=ydata, tt=ttdata)) if (class(test) != "try-error") { print(nlsrdots) } else {cat("Try error\n") } #- Note -- does NOT work -- do we need to specify the present env. in nlfb for y, tt?? test2 <- try(nlsframe <- nls(formula=hobsc, start=ste, data=mydata)) if (class(test) != "try-error") {print(nlsframe) } else {cat("Try error\n") } #- OK library(minpack.lm) y <- ydata tt <- ttdata nlsLMquiet <- nlsLM(formula=hobsc, start=ste) print(nlsLMquiet) #- OK rm(y) rm(tt) ## Dotargs tdots <- try(nlsLMdots <- nlsLM(formula=hobsc, start=ste, y=ydata, tt=ttdata)) if (class(tdots) != "try-error") { print(nlsLMdots) } else {cat("try-error\n") } #- Note -- does NOT work ## dataframe tframe <- try(nlsLMframe <- nlsLM(formula=hobsc, start=ste, data=mydata) ) if (class(tdots) != "try-error") {print(nlsLMframe)} else {cat("try-error\n") } #- does not work ## detach("package:nlsr", unload=TRUE) ## Uses nlmrt here for comparison ## library(nlmrt) ## txq <- try( nlxbquiet <- nlxb(formula=hobsc, start=ste)) ## if (class(txq) != "try-error") {print(nlxbquiet)} else { cat("try-error\n")} #- Note -- does NOT work ## txdots <- try( nlxbdots <- nlxb(formula=hobsc, start=ste, y=y, tt=tt) ) ## if (class(txdots) != "try-error") {print(nlxbdots)} else {cat("try-error\n")} #- Note -- does NOT work ## dataframe ## nlxbframe <- nlxb(formula=hobsc, start=ste, data=mydata) ## print(nlxbframe) #- OK

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.