propagate: Propagation of uncertainty using higher-order Taylor...
In anspiess/propagate: Propagation of Uncertainty

Description Usage Arguments Details Value Author(s) References Examples

A general function for the calculation of uncertainty propagation by first-/second-order Taylor expansion and Monte Carlo simulation including covariances. Input data can be any symbolic/numeric differentiable expression and data based on summaries (mean & s.d.) or sampled from distributions. Uncertainty propagation is based completely on matrix calculus accounting for full covariance structure. Monte Carlo simulation is conducted using a multivariate t-distribution with covariance structure. Propagation confidence intervals are calculated from the expanded uncertainties by means of the degrees of freedom obtained from WelchSatter, or from the [\frac{α}{2}, 1-\frac{α}{2}] quantiles of the MC evaluations.

1 2	propagate(expr, data, second.order = TRUE, do.sim = TRUE, cov = TRUE, df = NULL, nsim = 1000000, alpha = 0.05, ...)

`expr`	an expression, such as `expression(x/y)`.
`data`	a dataframe or matrix containing either a) the means μ_i, standard deviations σ_i and degrees of freedom ν_i (optionally) in the first, second and third (optionally) row, or b) sampled data generated from any of R's `distributions` or those implemented in this package (`rDistr`). If `nrow(data)` > 3, sampled data is assumed. The column names must match the variable names.
`second.order`	logical. If `TRUE`, error propagation will be calculated with first- and second-order Taylor expansion. See 'Details'.
`do.sim`	logical. Should Monte Carlo simulation be applied?
`cov`	logical or variance-covariance matrix with the same column names as `data`. See 'Details'.
`df`	an optional scalar with the total degrees of freedom ν_{\mathrm{tot}} of the system.
`nsim`	the number of Monte Carlo simulations to be performed, minimum is 10000.
`alpha`	the 1 - confidence level.
`...`	other parameters to be supplied to future methods.

The implemented methods are:

1) Monte Carlo simulation:
For each variable m in data, simulated data X = [x_1, x_2, …, x_n] with n = nsim samples is generated from a multivariate t-distribution X_{m, n} \sim t(μ, Σ, ν) using means μ_i and covariance matrix \boldsymbol{Σ} constructed from the standard deviations σ_i of each variable. All data is coerced into a new dataframe that has the same covariance structure as the initial data: \boldsymbol{Σ}(\mathtt{data}) = \boldsymbol{Σ}(X_{m, n}). Each row i = 1, …, n of the simulated dataset X_{m, n} is evaluated with expr, y_i = f(x_{m, i}), and summary statistics (mean, sd, median, mad, quantile-based confidence interval based on [\frac{α}{2}, 1-\frac{α}{2}]) are calculated on y.

2) Error propagation:
The propagated error is calculated by first-/second-order Taylor expansion accounting for full covariance structure using matrix algebra.
The following transformations based on two variables x_1, x_2 illustrate the equivalence of the matrix-based approach with well-known classical notations:
First-order mean: \rm{E[y]} = f(\bar{x}_i)
First-order variance: σ_y^2 = {\color{red} \nabla \mathbf{Σ} \nabla^T}:

{ \color{red}[\rm{j_1}\; \rm{j_2}] ≤ft[ \begin{array}{cc} σ_1^2 & σ_1σ_2 \\ σ_2σ_1 & σ_2^2 \end{array} \right] ≤ft[ \begin{array}{c} \rm{j_1} \\ \rm{j_2} \end{array} \right]} = \rm{j_1}^2 σ_1^2 + \rm{2 j_1 j_2} σ_1 σ_2 + \rm{j_2}^2 σ_2^2

= \underbrace{∑_{i=1}^2 \rm{j_i}^2 σ_i^2 + 2∑_{i=1\atop i \neq k}^2∑_{k=1\atop k \neq i}^2 \rm{j_i j_k} σ_{ik}}_{\rm{classical\;notation}} = \frac{1}{1!} ≤ft(∑_{i=1}^2 \frac{\partial f}{\partial x_i} σ_i \right)^2

Second-order mean: \rm{E}[y] = f(\bar{x}_i) + {\color{blue} \frac{1}{2}\rm{tr}(\mathbf{HΣ)}}:

{ \color{blue} \frac{1}{2} \rm{tr} ≤ft[ \begin{array}{cc} \rm{h_1} & \rm{h_2} \\ \rm{h_3} & \rm{h_4} \end{array} \right] ≤ft[ \begin{array}{cc} σ_1^2 & σ_1σ_2 \\ σ_2σ_1 & σ_2^2 \end{array} \right]} = \frac{1}{2} \rm{tr} ≤ft[ \begin{array}{cc} \rm{h_1} σ_1^2 + \rm{h_2}σ_1σ_2 & \rm{h_1}σ_1σ_2 + \rm{h_2}σ_2^2 \\ \rm{h_3} σ_1^2 + \rm{h_4} σ_1σ_2 & \rm{h_3} σ_1σ_2 + \rm{h_4} σ_2^2 \end{array} \right]

= \frac{1}{2}(\rm{h_1}σ_1^2 + \rm{h_2}σ_1σ_2 + \rm{h_3}σ_1σ_2 + \rm{h_4}σ_2^2) = \frac{1}{2!} ≤ft(∑_{i=1}^2 \frac{\partial}{\partial x_i} σ_i \right)^2 \it f

Second-order variance: σ_y^2 = {\color{red} \nabla\mathbf{Σ}\nabla^T} + {\color{blue} \frac{1}{2}\rm{tr}(\mathbf{HΣ HΣ)}}:

{\color{blue}\frac{1}{2} \rm{tr} ≤ft[ \begin{array}{cc} \rm{h_1} & \rm{h_2} \\ \rm{h_3} & \rm{h_4} \end{array} \right] ≤ft[ \begin{array}{cc} \rm{σ_1^2} & \rm{σ_1σ_2} \\ \rm{σ_2σ_1} & \rm{σ_2^2} \end{array} \right] ≤ft[ \begin{array}{cc} \rm{h_1} & \rm{h_2} \\ \rm{h_3} & \rm{h_4} \end{array} \right] ≤ft[ \begin{array}{cc} \rm{σ_1^2} & \rm{σ_1σ_2} \\ \rm{σ_2σ_1} & \rm{σ_2^2} \end{array} \right]} = …

= \frac{1}{2} (\rm{h_1}^2σ_1^4 + \rm{2h_1h_2}σ_1^3σ_2 + \rm{2h_1h_3}σ_1^3σ_2 + \rm{h_2}^2σ_1^2σ_2^2 + \rm{2h_2h_3}σ_1^2σ_2^2 + \rm{h_3}^2σ_1^2σ_2^2 + \rm{2h_1h_4}σ_1^2σ_2^2

+ \rm{2h_2h_4}σ_1σ_2^3 + \rm{2h_3h_4}σ_1σ_2^3 + \rm{h_4}^2σ_2^4 = \frac{1}{2} (\rm{h_1}σ_1^2 + \rm{h_2}σ_1σ_2 + \rm{h_3}σ_1σ_2 + \rm{h_4}σ_2^2)^2

= \frac{1}{2!} ≤ft( ≤ft(∑_{i=1}^2 \frac{\partial}{\partial x_i} σ_i \right)^2 \it f \right)^2

with \mathrm{E}(y) = expectation of y, \mathbf{σ_y^2} = variance of y, {\color{red} \nabla} = the p x n gradient matrix with all partial first derivatives {\color{red} \rm{j_i}}, \mathbf{Σ} = the p x p covariance matrix, {\color{blue}\mathbf{H}} the Hessian matrix with all partial second derivatives {\color{blue} \rm{h_i}}, σ_i = the uncertainties and \rm{tr}(\cdot) = the trace (sum of diagonal) of a matrix. Note that because the Hessian matrices are symmetric, {\color{blue} \rm{h_2}} = {\color{blue} \rm{h_3}}. For a detailed derivation, see 'References'.
The second-order Taylor expansion corrects for bias in nonlinear expressions as the first-order Taylor expansion assumes linearity around \bar{x}_i. There is also a Python library available for second-order error propagation ('soerp', https://pypi.python.org/pypi/soerp). The 'propagate' package gives exactly the same results, see last example under "Examples".
Depending on the input expression, the uncertainty propagation may result in an error that is not normally distributed. The Monte Carlo simulation, starting with a symmetric t-distributions of the variables, can clarify this. For instance, a high tendency from deviation of normality is encountered in formulas in which the error of the denominator is relatively large or in exponential models with a large error in the exponent.

For setups in which there is no symbolic derivation possible (i.e. e <- expression(abs(x)) => "Function 'abs' is not in the derivatives table") the function automatically switches from symbolic (using makeGrad or makeHess) to numeric (numGrad or numHess) differentiation.

The function will try to evaluate the expression in an environment using eval which results in a significant speed enhancement (~ 10-fold). If that fails, evaluation is done over the rows of the simulated data using apply.

cov is used in the following ways:
1) If μ_i, σ_i are supplied, a covariance matrix is built with diagonals σ_i^2, independent of cov = TRUE, FALSE.
2) When simulated data is supplied, a covariance matrix is constructed that either has (cov = TRUE) or has not (cov = FALSE) off-diagonal covariances.
3) The user can supply an own covariance matrix Σ, with the same column/row names as in data.

The expanded uncertainty used for constructing the confidence interval is calculated from the Welch-Satterthwaite degrees of freedom ν_{\mathrm{WS}} of the WelchSatter function.

A list with the following components:

`gradient`	the symbolic gradient vector \nabla of partial first-order derivatives.
`evalGrad`	the evaluated gradient vector \nabla of partial first-order derivatives, also known as the "sensitivity". See `summary.propagate`.
`hessian`	the symbolic Hessian matrix \mathbf{H} of partial second-order derivatives.
`evalHess`	the evaluated Hessian matrix \mathbf{H} of partial second-order derivatives.
`rel.contr`	the relative contribution matrix, see `summary.propagate`.
`covMat`	the covariance matrix \mathbf{Σ} used for Monte Carlo simulation and uncertainty propagation.
`ws.df`	the Welch-Satterthwaite degrees of freedom ν_{\mathrm{ws}}, as obtained from `WelchSatter`.
`k`	the coverage factor k, as calculated by t(1-(α/2), ν_{\mathrm{ws}}).
`u.exp`	the expanded uncertainty, kσ(y), where σ(y) is derived either from the second-order uncertainty, if successfully calculated, or first-order otherwise.
`resSIM`	a vector containing the `nsim` values obtained from the row-wise expression evaluations f(x_{m, i}) of the simulated data in `datSIM`.
`datSIM`	a vector containing the `nsim` simulated multivariate values for each variable in column format.
`prop`	a summary vector containing first-/second-order expectations and uncertainties as well as the confidence interval based on `alpha`.
`sim`	a summary vector containing the mean, standard deviation, median, MAD as well as the confidence interval based on `alpha`.
`expr`	the original expression `expr`.
`data`	the original data `data`.
`alpha`	the otiginal `alpha`.

Andrej-Nikolai Spiess

Error propagation (in general):
An Introduction to error analysis.
Taylor JR.
University Science Books (1996), New York.

Evaluation of measurement data - Guide to the expression of uncertainty in measurement.
JCGM 100:2008 (GUM 1995 with minor corrections).
http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf.

Evaluation of measurement data - Supplement 1 to the Guide to the expression of uncertainty in measurement - Propagation of distributions using a Monte Carlo Method.
JCGM 101:2008.
http://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf.

Higher-order Taylor expansion:
On higher-order corrections for propagating uncertainties.
Wang CM & Iyer HK.
Metrologia (2005), 42: 406-410.

Propagation of uncertainty: Expressions of second and third order uncertainty with third and fourth moments.
Mekid S & Vaja D.
Measurement (2008), 41: 600-609.

Matrix algebra for error propagation:
An Introduction to Error Propagation: Derivation, Meaning and Examples of Equation Cy = FxCxFx^t.
www.nada.kth.se/~kai-a/papers/arrasTR-9801-R3.pdf.

Second order nonlinear uncertainty modeling in strapdown integration using MEMS IMUs.
Zhang M, Hol JD, Slot L, Luinge H.
2011 Proceedings of the 14th International Conference on Information Fusion (FUSION) (2011).

Uncertainty propagation in non-linear measurement equations.
Mana G & Pennecchi F.
Metrologia (2007), 44: 246-251.

A compact tensor algebra expression of the law of propagation of uncertainty.
Bouchot C, Quilantan JLC, Ochoa JCS.
Metrologia (2011), 48: L22-L28.

Nonlinear error propagation law.
Kubacek L.
Appl Math (1996), 41: 329-345.

Monte Carlo simulation (normal- and t-distribution):
MUSE: computational aspects of a GUM supplement 1 implementation.
Mueller M, Wolf M, Roesslein M.
Metrologia (2008), 45: 586-594.

Copulas for uncertainty analysis.
Possolo A.
Metrologia (2010), 47: 262-271.

Multivariate normal distribution:
Stochastic Simulation.
Ripley BD.
Stochastic Simulation (1987). Wiley. Page 98.

Testing for normal distribution:
Testing for Normality.
Thode Jr. HC.
Marcel Dekker (2002), New York.

Approximating the Shapiro-Wilk W-test for non-normality.
Royston P.
Stat Comp (1992), 2: 117-119.

## In these examples, 'nsim = 100000' to save
## Rcmd check time (CRAN). It is advocated
## to use at least 'nsim = 1000000' though...

## Example without given degrees-of-freedom.
EXPR1 <- expression(x/y)
x <- c(5, 0.01)
y <- c(1, 0.01)
DF1 <- cbind(x, y)
RES1 <- propagate(expr = EXPR1, data = DF1, type = "stat", 
                  do.sim = TRUE, verbose = TRUE, 
                  nsim = 100000)
RES1

## Same example with given degrees-of-freedom
## => third row in input 'data'.
EXPR2 <- expression(x/y)
x <- c(5, 0.01, 12)
y <- c(1, 0.01, 5)
DF2 <- cbind(x, y)
RES2 <- propagate(expr = EXPR2, data = DF2, type = "stat", 
                  do.sim = TRUE, verbose = TRUE,
                  nsim = 100000)
RES2

## With the 'summary' function, we can get the
## Welch-Satterthwaite DF's, coverage, expanded uncertainty,
## Gradient and Hessian matrix etc.
summary(RES2)

## Example using a recursive function:
## no Taylor expansion possible, only Monte-Carlo.
a <- c(5, 0.1)
b <- c(100, 2)
DAT <- cbind(a, b)

f <- function(a, b) {
  N <- 0
  for (i in 1:100) {
    N <- N + i * log(a) + b^(1/i)
  }
  return(N)
}

propagate(f, DAT, nsim = 100000)

## Not run: 
################# GUM 2008 (1) ########################
## Example in Annex H.1 from the GUM 2008 manual
## (see 'References'), an end gauge calibration
## study. We use only first-order error propagation,
## with total df = 16 and alpha = 0.01, 
## as detailed in GUM H.1.6.
EXPR3 <- expression(ls + d - ls * (da * the + as * dt))
ls <- c(50000623, 25)
d <- c(215, 9.7)
da <- c(0, 0.58E-6)
the <- c(-0.1, 0.41)
as <- c(11.5E-6, 1.2E-6)
dt <- c(0, 0.029)
DF3 <- cbind(ls, d, da, the, as, dt)
RES3 <- propagate(expr = EXPR3, data = DF3, second.order = FALSE,
                  df = 16, alpha = 0.01)
RES3
## propagate: sd.1 = 31.71 
## GUM H.1.4/H.6c: u = 32  

## Expanded uncertainty, from summary function.
summary(RES3)
## propagate: 92.62
## GUM H.1.6: 93

## Proof that covariance of Monte-Carlo
## simulated dataset is "fairly"" the same 
## as from initial data.
RES3$covMat
cov(RES3$datSIM)
all.equal(RES3$covMat, cov(RES3$datSIM))

## Now using second-order Taylor expansion.
RES4 <- propagate(expr = EXPR3, data = DF3)
RES4
## propagate: sd.2 = 33.91115
## GUM H.1.7: u = 34.
## Also similar to the non-matrix-based approach
## in Wang et al. (2005, page 408): u1 = 33.91115.
## NOTE: After second-order correction ("sd.2"), 
## uncertainty is more similar to the uncertainty
## obtained from Monte Carlo simulation!

#################### GUM 2008 (2) #################
## Example in Annex H.2 from the GUM 2008 manual
## (see 'References'), simultaneous resistance
## and reactance measurement.
data(H.2)

## This gives exactly the means, uncertainties and
## correlations as given in Table H.2:
colMeans(H.2)
sqrt(colVarsC(H.2))/sqrt(5)
cor(H.2)

## H.2.3 Approach 1 using mean values and
## standard uncertainties:
EXPR6a <- expression((V/I) *  cos(phi)) ## R
EXPR6b <- expression((V/I) *  sin(phi)) ## X
EXPR6c <- expression(V/I) ## Z
MEAN6 <- colMeans(H.2)
SD6 <- sqrt(colVarsC(H.2))
DF6 <- rbind(MEAN6, SD6)
COV6ab <- cov(H.2) ## covariance matrix of V, I, phi
COV6c <- cov(H.2[, 1:2])  ## covariance matrix of V, I

RES6a <- propagate(expr = EXPR6a, data = DF6, cov = COV6ab)
RES6b <- propagate(expr = EXPR6b, data = DF6, cov = COV6ab)
RES6c <- propagate(expr = EXPR6c, data = DF6[, 1:2], 
                   cov = COV6c)

## This gives exactly the same values of mean and sd/sqrt(5)
## as given in Table H.4.
RES6a$prop # 0.15892/sqrt(5) = 0.071
RES6b$prop # 0.66094/sqrt(5) = 0.296
RES6c$prop # 0.52846/sqrt(5) = 0.236

######### GUM 2008 Supplement 1 (1) #######################
## Example from 9.2.2 of the GUM 2008 Supplement 1
## (see 'References'), normally distributed input
## quantities. Assign values as in 9.2.2.1.
EXPR7 <- expression(X1 + X2 + X3 + X4)
X1 <- c(0, 1)
X2 <- c(0, 1)
X3 <- c(0, 1)
X4 <- c(0, 1)
DF7 <- cbind(X1, X2, X3, X4)
RES7 <- propagate(expr = EXPR7, data = DF7, nsim = 1E5)
## This will give exactly the same results as in 
## 9.2.2.6, Table 2.
RES7

######### GUM 2008 Supplement 1 (2) #######################
## Example from 9.3 of the GUM 2008 Supplement 1
## (see 'References'), mass calibration.
## Formula 24 in 9.3.1.3 and values as in 9.3.1.4, Table 5.
EXPR8 <- expression((Mrc + dMrc) * (1 + (Pa - Pa0) * ((1/Pw) - (1/Pr))) - Mnom)
Mrc <- rnorm(1E5, 100000, 0.050)
dMrc <- rnorm(1E5, 1.234, 0.020)
Pa <- runif(1E5, 1.10, 1.30)  ## E(Pa) = 1.2, (b-a)/2 = 0.1 
Pw <- runif(1E5, 7000, 9000)  ## E(Pw) = 8000, (b-a)/2 = 1000
Pr <- runif(1E5, 7950, 8050) ## E(Pr) = 8000, (b-a)/2 = 50
Pa0 <- 1.2 
Mnom <- 100000
DF8 <- cbind(Mrc, dMrc, Pa, Pw, Pr, Pa0, Mnom)
RES8 <- propagate(expr = EXPR8, data = DF8, nsim = 1E5)
## This will give exactly the same results as in 
## 9.3.2.3, Table 6
RES8
RES8
 
######### GUM 2008 Supplement 1 (3) #######################
## Example from 9.4 of the GUM 2008 Supplement 1
## (see 'References'), comparioson loss in microwave
## power meter calibration, zero covariance.
## Formula 28 in 9.4.1.5 and values as in 9.4.1.7.
EXPR9 <- expression(X1^2 - X2^2)
X1 <- c(0.050, 0.005)
X2 <- c(0, 0.005)
DF9 <- cbind(X1, X2)
RES9a <- propagate(expr = EXPR9, data = DF9, nsim = 1E5)
## This will give exactly the same results as in 
## 9.4.2.2.7, Table 8, x1 = 0.050.
RES9a

## Using covariance matrix with r(x1, x2) = 0.9
## We convert to covariances using cor2cov.
COR9 <- matrix(c(1, 0.9, 0.9, 1), nrow = 2)
COV9 <- cor2cov(COR9, c(0.005^2, 0.005^2))
colnames(COV9) <- c("X1", "X2")
rownames(COV9) <- c("X1", "X2")
RES9b <- propagate(expr = EXPR9, data = DF9, cov = COV9)
## This will give exactly the same results as in 
## 9.4.3.2.1, Table 9, x1 = 0.050.
RES9b

######### GUM 2008 Supplement 1 (4) #######################
## Example from 9.5 of the GUM 2008 Supplement 1
## (see 'References'), gauge block calibration.
## Assignment of PDF's as in Table 10 of 9.5.2.1.
EXPR10 <- expression(Ls + D + d1 + d2 - Ls *(da *(t0 + Delta) + as * dt) - Lnom)
Lnom <- 50000000
Ls <- propagate:::rst(1000000, mean = 50000623, sd  = 25, df = 18)
D <- propagate:::rst(1000000, mean = 215, sd = 6, df = 25)
d1 <- propagate:::rst(1000000, mean = 0, sd = 4, df = 5)
d2 <- propagate:::rst(1000000, mean = 0, sd = 7, df = 8)
as <- runif(1000000, 9.5E-6, 13.5E-6)
t0 <- rnorm(1000000, -0.1, 0.2)
Delta <- propagate:::rarcsin(1000000, -0.5, 0.5)
da <- propagate:::rctrap(1000000, -1E-6, 1E-6, 0.1E-6)
dt <- propagate:::rctrap(1000000, -0.050, 0.050, 0.025)
DF10 <- cbind(Ls, D, d1, d2, as, t0, Delta, da, dt, Lnom)
RES10 <- propagate(expr = EXPR10, data = DF10, cov = FALSE, alpha = 0.01)
RES10
## This gives the same results as in 9.5.4.2, Table 11.
## However: results are exacter than in the GUM 2008
## manual, especially when comparing sd(Monte Carlo) with sd.2!
## GUM 2008 gives 32 and 36, respectively.
RES10

########## Comparison to Pythons 'soerp' ###################
## Exactly the same results as under 
## https://pypi.python.org/pypi/soerp ! 
EXPR11 <- expression(C * sqrt((520 * H * P)/(M *(t + 460))))
H <- c(64, 0.5)
M <- c(16, 0.1)
P <- c(361, 2)
t <- c(165, 0.5)
C <- c(38.4, 0) 
DAT11 <- makeDat(EXPR11)
RES11 <- propagate(expr = EXPR11, data = DAT11) 
RES11

## End(Not run)