propagate: Propagation of Uncertainty

Description Usage Arguments Details Value Author(s) References Examples

A function for calculating confidence/prediction intervals of (weighted) nonlinear models for the supplied or new predictor values, by using first-/second-order Taylor expansion and Monte Carlo simulation. This approach can be used to construct more realistic error estimates and confidence/prediction intervals for nonlinear models than what is possible with only a simple linearization (first-order Taylor expansion) approach. Another application is when there is an "error in x" setup with uncertainties in the predictor variable (See 'Examples'). This function will also work in the presence of multiple predictors with/without errors.

1 2	predictNLS(model, newdata, newerror, interval = c("confidence", "prediction", "none"), alpha = 0.05, ...)

`model`	a model obtained from `nls` or `nlsLM` (package 'minpack.lm').
`newdata`	a data frame with new predictor values, having the same column names as in `model`. See `predict.nls` and 'Examples'. If omitted, the model's predictor values are employed.
`newerror`	a data frame with optional error values, having the same column names as in `model` and in the same order as in `newdata`. See 'Examples'.
`interval`	A character string indicating if confidence/prediction intervals are to be calculated or not.
`alpha`	the α level.
`...`	other parameters to be supplied to `propagate`.

Calculation of the propagated uncertainty σ_y using \nabla Σ \nabla^T is called the "Delta Method" and is widely applied in NLS fitting. However, this method is based on first-order Taylor expansion and thus assummes linearity around \hat{y} = f(x, \hat{β_i}). The second-order approach as implemented in the propagate function can partially correct for this restriction by using a second-order polynomial around \hat{y}.
Confidence and prediction intervals are calculated in a usual way using t(1 - \frac{α}{2}, ν) \cdot σ_y (1) or t(1 - \frac{α}{2}, ν) \cdot √{σ_y^2 + \textcolor{red}{σ_r^2}} (2), respectively, where the residual variance \textcolor{red}{σ_r^2} = \frac{∑_{i=1}^n (y_i - \hat{y}_i)^2}{ν} (3). The inclusion of \textcolor{red}{σ_r^2} in the prediction interval is implemented as an extended gradient and "augmented" covariance matrix. So instead of using \hat{y} = f(x, \hat{β}) (4), we take \hat{y} = f(x, \hat{β}) + \textcolor{red}{σ_r^2} (5) as the expression and augment the i \times i covariance matrix Σ to an (i+1) \times (i+1) covariance matrix, where Σ_{i+1, i+1} = \textcolor{red}{σ_r^2}. Partial differentiation and matrix multiplication will then yield, for example with two coefficients β_1 and β_2 and their corresponding covariance matrix Σ:

≤ft[\frac{\partial f}{\partial β_1}\; \frac{\partial f}{\partial β_2}\; \textcolor{red}{1}\right] ≤ft[ \begin{array}{ccc} σ_1^2 & σ_1σ_2 & 0 \\ σ_2σ_1 & σ_2^2 & 0 \\ 0 & 0 & \textcolor{red}{σ_r^2} \end{array} \right] ≤ft[ \begin{array}{c} \frac{\partial f}{\partial β_1} \\ \frac{\partial f}{\partial β_2} \\ \textcolor{red}{1} \end{array} \right] \;\; (6)

= ≤ft(\frac{\partial f}{\partial β_1}\right)^2σ_1^2 + 2 \frac{\partial f}{\partial β_1} \frac{\partial f}{\partial β_2} σ_1 σ_2 + ≤ft(\frac{\partial f}{\partial β_2}\right)^2σ_2^2 + \textcolor{red}{σ_r^2} \;\; (7)

\equiv σ_y^2 + \textcolor{red}{σ_r^2}, which then goes into (2).
The advantage of the augmented covariance matrix is that it can be exploited for creating Monte Carlo simulation-based prediction intervals. This is new from version 1.0-6 and is based on the paradigm that we simply add another dimension by employing the augmented covariance matrix of (6) in the multivariate t-distribution random number generator (in our case rtmvt), with μ = 0. All n simulations are then evaluated with (5) and the usual [1 - \frac{α}{2}, \frac{α}{2}] quantiles calculated.

If errors are supplied to the predictor values in newerror, they need to have the same column names and order than the new predictor values.

A list with the following items:
summary: The mean/error estimates obtained from first-/second-order Taylor expansion and Monte Carlo simulation, together with calculated confidence/prediction intervals based on asymptotic normality.
prop: the complete output from propagate for each value in newdata.

Andrej-Nikolai Spiess

Nonlinear Regression.
Seber GAF & Wild CJ.
John Wiley & Sons; 1ed, 2003.

Nonlinear Regression Analysis and its Applications.
Bates DM & Watts DG.
Wiley-Interscience; 1ed, 2007.

Statistical Error Propagation.
Tellinghuisen J.
J. Phys. Chem. A (2001), 47: 3917-3921.

Least-squares analysis of data with uncertainty in x and y: A Monte Carlo methods comparison.
Tellinghuisen J.
Chemometr Intell Lab (2010), 47: 160-169.

From the author's blog:
http://rmazing.wordpress.com/2013/08/14/predictnls-part-1-monte-carlo-simulation-confidence-intervals-for-nls-models/
http://rmazing.wordpress.com/2013/08/26/predictnls-part-2-taylor-approximation-confidence-intervals-for-nls-models/
https://rmazing.wordpress.com/2018/05/11/monte-carlo-based-prediction-intervals-for-nonlinear-regression/

## In these examples, 'nsim = 100000' to save
## Rcmd check time (CRAN). It is advocated
## to use at least 'nsim = 1000000' though...

## Example from ?nls.
DNase1 <- subset(DNase, Run == 1)
fm3DNase1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
                 data = DNase1, start = list(Asym = 3, xmid = 0, scal = 1))

## Using a single predictor value without error.
PROP1 <- predictNLS(fm3DNase1, newdata = data.frame(conc = 2), nsim = 100000)
PRED1 <- predict(fm3DNase1, newdata = data.frame(conc = 2))
PROP1$summary
PRED1
## => Prop.Mean.1 equal to PRED1

## Using a single predictor value with error.
PROP2 <- predictNLS(fm3DNase1, newdata = data.frame(conc = 2), 
                    newerror = data.frame(conc = 0.5), nsim = 100000)
PROP2$summary

## Not run: 
## Using a sequence of predictor values without error.
CONC <- seq(1, 12, by = 1)
PROP3 <- predictNLS(fm3DNase1, newdata = data.frame(conc = CONC))
PRED3 <- predict(fm3DNase1, newdata = data.frame(conc = CONC))
PROP3$summary
PRED3
## => Prop.Mean.1 equal to PRED3

## Plot mean and confidence values from first-/second-order 
## Taylor expansion and Monte Carlo simulation.
plot(DNase1$conc, DNase1$density)
lines(DNase1$conc, fitted(fm3DNase1), lwd = 2, col = 1)
points(CONC, PROP3$summary[, 1], col = 2, pch = 16)
lines(CONC, PROP3$summary[, 5], col = 2)
lines(CONC, PROP3$summary[, 6], col = 2)
lines(CONC, PROP3$summary[, 11], col = 4)
lines(CONC, PROP3$summary[, 12], col = 4)

## Using a sequence of predictor values with error.
PROP4 <- predictNLS(fm3DNase1, newdata = data.frame(conc = 1:5), 
                    newerror = data.frame(conc = (1:5)/10))
PROP4$summary

## Using multiple predictor values.
## 1: Setup of response values with gaussian error of 10%.
x <- seq(1, 10, by = 0.01)
y <- seq(10, 1, by = -0.01)
a <- 2
b <- 5
c <- 10
z <- a * exp(b * x)^sin(y/c)
z <- z + sapply(z, function(x) rnorm(1, x, 0.10 * x))

## 2: Fit 'nls' model.
MOD <- nls(z ~ a * exp(b * x)^sin(y/c), 
           start = list(a = 2, b = 5, c = 10))

## 3: Single newdata without errors.
DAT1 <- data.frame(x = 4, y = 3)
PROP5 <- predictNLS(MOD, newdata = DAT1)
PROP5$summary

## 4: Single newdata with errors.
DAT2 <- data.frame(x = 4, y = 3)
ERR2 <- data.frame(x = 0.2, y = 0.1)
PROP6 <- predictNLS(MOD, newdata = DAT2, newerror = ERR2)
PROP6$summary

## 5: Multiple newdata with errors.
DAT3 <- data.frame(x = 1:4, y = 3)
ERR3 <- data.frame(x = rep(0.2, 4), y = seq(1:4)/10)
PROP7 <- predictNLS(MOD, newdata = DAT3, newerror = ERR3)
PROP7$summary

## 6: Linear model to compare conf/pred intervals.
set.seed(123)
X <- 1:20
Y <- 3 + 2 * X + rnorm(20, 0, 2)
plot(X, Y)
LM <- lm(Y ~ X)
NLS <- nlsLM(Y ~ a + b * X, start = list(a = 3, b = 2)) 
predict(LM, newdata = data.frame(X = 14.5), interval = "conf") 
predictNLS(NLS, newdata = data.frame(X = 14.5), interval = "conf")$summary
predict(LM, newdata = data.frame(X = 14.5), interval = "pred")
predictNLS(NLS, newdata = data.frame(X = 14.5), interval = "pred")$summary

## 7: compare to 'predFit' function of 'investr' package.
## Same results when using only first-order Taylor expansion.
require(investr)
data(Puromycin, package = "datasets")
Puromycin2 <- Puromycin[Puromycin$state == "treated", ][, 1:2]
Puro.nls <- nls(rate ~ Vm * conc/(K + conc), data = Puromycin2,
                start = c(Vm = 200, K = 0.05))
PRED1 <- predFit(Puro.nls, interval = "prediction")
PRED2 <- predictNLS(Puro.nls, interval = "prediction", second.order = FALSE, do.sim = FALSE)
all.equal(PRED1[, "lwr"], PRED2$summary[, 5]) # => TRUE

## End(Not run)