propagate: Error propagation using different methods

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

A general function for the calculation of error propagation by Monte Carlo simulation, permutation and first/second-order Taylor expansion including covariances. Can be used for qPCR data, but any data that should be subjected to error propagation analysis will do. The different methods can be used for any expression based on either replicate or summary data (mean & standard deviation).

Usage

1
2
3
4
5
propagate(expr, data, type = c("raw", "stat"), second.order = TRUE,
          do.sim = FALSE, dist.sim = c("norm", "tnorm"), use.cov = FALSE, 
          nsim = 10000, do.perm = FALSE, perm.crit = NULL, ties = NULL, 
          nperm = 2000, alpha = 0.05, plot = TRUE, logx = FALSE, 
          verbose = FALSE, ...)  

Arguments

expr

an expression, such as expression(x/y).

data

a dataframe or matrix containing either a) the replicates in columns or b) the means in the first row and the standard deviations in the second row. The variable names must be defined in the column headers.

type

either raw if replicates are given, or stat if means and standard deviations are supplied.

second.order

logical. If TRUE, error propagation will be calculated with first AND second-order Taylor expansion. See 'Details'.

do.sim

logical. Should Monte Carlo simulation be applied?

dist.sim

"norm" will use a multivariate normal distribution, "tnorm" a multivariate truncated normal distribution. See 'Details'.

use.cov

logical or variance-covariance matrix with the same column descriptions and column order as data. If TRUE together with replicates, the covariances are calculated from these and used within Monte Carlo simulation and error propagation. If type = "stat", a square variance-covariance matrix can be supplied in the right dimensions (n x n, n = number of variables). If FALSE, Monte Carlo simulation and error propagation use only the diagonal (variances).

nsim

the number of simulations to be performed, minimum is 5000.

do.perm

logical. Should permutation error analysis be applied?

perm.crit

a character string of one or more criteria defining the null hypothesis for the permutation p-value. See 'Details'.

ties

a vector defining the columns that should be tied together for the permutations. See 'Details'.

nperm

the number of permutations to be performed.

alpha

the confidence level.

plot

logical. Should histograms with confidence intervals (in blue) be plotted for all methods?

logx

logical. Should the x-axis of the graphs have logarithmic scale?

verbose

logical. If TRUE, a longer output is given including the simulated data, derivatives, covariance matrix, etc.

...

other parameters to be supplied to hist, boxplot or abline.

Details

The implemented methods are:

1) Monte Carlo simulation:
For each variable in data, simulated data with nsim samples is generated from a multivariate (truncated) normal distribution using mean μ and standard deviation σ of each variable. All data is coerced into a new dataset that has the same covariance structure as the initial data. Each row of the simulated dataset is evaluated and summary statistics are calculated. In scenarios that are nonlinear in nature the distribution of the result values can be skewed, mainly due to the simulated values at the extreme end of the normal distribution. Setting dist.sim = "tnorm" will fit a multivariate normal distribution, calculate the lower/upper 2.5% quantile on each side for each input variable and use these as bounds for simulating from a multivariate truncated normal distribution. This will (in part) remove some of the skewness in the result distribution.

2) Permutation approach:
The original data is permutated nperm times by binding observations together according to ties. The ties bind observations that can be independent measurements from the same sample. In qPCR terms, this would be a real-time PCR for two different genes on the same sample. If ties are omitted, the observations are shuffled independently. In detail, two datasets are created for each permutation: Dataset1 samples the rows (replicates) of the data according to ties. Dataset2 is obtained by sampling the columns (samples), also binding columns as defined in ties. For both datasets, the permutations are evaluated and statistics are collected. A confidence interval is calculated from all evaluations of Dataset1. A p-value is calculated from all permutations that follow perm.crit, whereby init reflects the permutations of the initial data and perm the permutations of the randomly reallocated samples. Thus, the p-value gives a measure against the null hypothesis that the result in the initial group is just by chance. See also 'Examples'.
The criterion for the permutation p-value (perm.crit) has to be defined by the user. For example, let's say we calculate some value 0.2 which is a ratio between two groups. We would hypothesize that by randomly reallocating the values between the groups, the mean values are not equal or smaller than in the initial data. We would thus define perm.crit as "perm < init" meaning that we want to test if the mean of the initial data (init) is frequently smaller than by the randomly allocated data (perm). The default (NULL) is to test all three variants "perm > init", "perm == init" and "perm < init".

3) Error propagation:
The propagated error is calculated by first and second-order Taylor expansion using matrix algebra. Often omitted, but important in models where the variables are correlated, is the second covariance term:

σ_y^2 = \underbrace{∑_{i=1}^N≤ft(\frac{\partial f}{\partial x_i} \right)^2 σ_i^2}_{\rm{variance}} + \underbrace{2∑_{i=1\atop i \neq j}^N∑_{j=1\atop j \neq i}^N≤ft(\frac{\partial f}{\partial x_i} \right)≤ft(\frac{\partial f}{\partial x_j} \right) σ_{ij}}_{\rm{covariance}}

propagate calculates the propagated error either with or without covariances using matrix algebra for first- and second-order (since version 1.3-8) Taylor expansion.
First-order:
\mathrm{E}(y) = f(\bar{x}_i)
\mathbf{C}_y = \nabla_x\mathbf{C}_x\nabla_x^T
Second-order:
\mathrm{E}(y) = f(\bar{x}_i) + \frac{1}{2}[tr(\mathbf{H}_{xx}\mathbf{C}_x)]
\mathbf{C}_y = \nabla_x\mathbf{C}_x\nabla_x^T + \frac{1}{2}[tr(\mathbf{H}_{xx}\mathbf{C}_x\mathbf{H}_{xx}\mathbf{C}_x)]

with \mathrm{E}(y) = expectation of y, \mathbf{C}_y = variance of y, \nabla_x = the p x n gradient matrix with all partial first derivatives, \mathbf{C}_x = the p x p covariance matrix, \mathbf{H}_{xx} the Hessian matrix with all partial second derivatives and tr(\cdot) = the trace (sum of diagonal) of the matrix. For a detailed derivation, see 'References'.
The second-order Taylor expansion (second.order = TRUE) corrects for bias in nonlinear expressions as the first-order Taylor expansion assumes linearity around \bar{x}_i.
Depending on the input formula, the error propagation may result in an error that is not normally distributed. The Monte Carlo simulation, starting with normal distributions of the variables, can clarify this. A high tendency from deviation of normality is encountered in formulas in which the error of the denominator is relatively high or in exponential models where the exponent has a high error. This is one of the problems that is inherent in real-time PCR analysis, as the classical ratio calculation with efficiencies (i.e. by the delta-ct method) is of an exponential type.

Value

A plot containing histograms of the Monte Carlo simulation, the permutation values and error propagation. Additionally inserted are a boxplot, median values in red and confidence intervals as blue borders.

A list with the following components (if verbose = TRUE):

data.Sim

the Monte Carlo simulated data with evaluations in the last column.

data.Perm

the data of the permutated observations and samples with corresponding evaluations and the decision according to perm.crit.

data.Prop

nsim values generated from a normal distribution with mean and s.d. as calculated from the propagated error.

gradient

the evaluated gradient vector \nabla_x of partial first derivatives.

hessian

the evaluated hessian matrix \mathbf{H}_{xx} of partial second derivatives.

covMat

the covariance matrix \mathbf{C}_x used for Monte Carlo simulation and error propagation.

summary

a summary of the collected statistics, given as a dataframe. These are: mean, s.d. median, mad, lower/upper confidence interval and permutation p-values.

Author(s)

Andrej-Nikolai Spiess

References

Error propagation (in general):
An Introduction to error analysis.
Taylor JR.
University Science Books (1996), New York.

Evaluation of measurement data - Guide to the expression of uncertainty in measurement.
JCGM 100:2008 (GUM 1995 with minor corrections).
https://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf.

Higher-order Taylor expansion:
On higher-order corrections for propagating uncertainties.
Wang CM & Iyer HK.
Metrologia (2005), 42: 406-410.

Accuracy of error propagation exemplified with ratios of random variables.
Winzer PJ.
Rev Sci Instrum (2000), 72: 1447-1454.

Matrix algebra for error propagation:
An Introduction to Error Propagation: Derivation, Meaning and Examples of Equation Cy = FxCxFx^t.
www.nada.kth.se/~kai-a/papers/arrasTR-9801-R3.pdf.

Second order nonlinear uncertainty modeling in strapdown integration using MEMS IMUs.
Zhang M, Hol JD, Slot L, Luinge H.
2011 Proceedings of the 14th International Conference on Information Fusion (FUSION) (2011).

Error propagation (in qPCR):
Error propagation in relative real-time reverse transcription polymerase chain reaction quantification models: The balance between accuracy and precision.
Nordgard O, Kvaloy JT, Farmen RK, Heikkila R.
Anal Biochem (2006), 356: 182-193.

qBase relative quantification framework and software for management and analysis of real-time quantitative PCR data.
Hellemans J, Mortier G, De Paepe A, Speleman F, Vandesompele J.
Genome Biol (2007), 8: R19.

Multivariate normal distribution:
Stochastic Simulation.
Ripley BD.
Stochastic Simulation (1987). Wiley. Page 98.

Testing for normal distribution:
Testing for Normality.
Thode Jr. HC.
Marcel Dekker (2002), New York.

Approximating the Shapiro-Wilk W-test for non-normality.
Royston P.
Stat Comp (1992), 2: 117-119.

See Also

Function ratiocalc for error analysis within qPCR ratio calculation.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
## From summary data just calculate 
## Monte-Carlo and propagated error.
EXPR <- expression(x/y)
x <- c(5, 0.01)
y <- c(1, 0.01)
DF <- cbind(x, y)
RES1 <- propagate(expr = EXPR, data = DF, type = "stat", 
                 do.sim = TRUE, verbose = TRUE)

## Do Shapiro-Wilks test on Monte Carlo evaluations 
## !maximum 5000 datapoints can be used!
## => p.value on border to non-normality
shapiro.test(RES1$data.Sim[1:5000, 3])
## How about a graphical analysis:
qqnorm(RES1$data.Sim[, 3])

## Using raw data
## If data is of unequal length,
## use qpcR:::cbind.na to avoid replication!
## Do permutations (swap x and y values)
## and simulations.
EXPR <- expression(x*y)
x <- c(2, 2.1, 2.2, 2, 2.3, 2.1)
y <- c(4, 4, 3.8, 4.1, 3.1)
DF <- qpcR:::cbind.na(x, y)  
RES2 <- propagate(EXPR, DF, type = "raw", do.perm = TRUE, 
                 do.sim = TRUE, verbose = TRUE)
RES2$summary

## For replicate data, using relative 
## quantification ratio from qPCR.
## How good is the estimation of the propagated error?
## Done without using covariance in the 
## calculation and simulation.
## cp's and efficiencies are tied together
## because they are two observations on the
## same sample!
## As we are using an exponential type function,
## better to logarithmize the x-axis.
EXPR <- expression((E1^cp1)/(E2^cp2))
E1 <- c(1.73, 1.75, 1.77)
cp1 <- c(25.77,26.14,26.33)
E2 <-  c(1.72,1.68,1.65)
cp2 <- c(33.84,34.04,33.33)
DF <- cbind(E1, cp1, E2, cp2)
RES3 <- propagate(EXPR, DF, type = "raw", do.sim = TRUE,
                 do.perm = TRUE, verbose = TRUE, logx = TRUE)                 
## STRONG deviation from normality!
shapiro.test(RES3$data.Sim[1:5000, 5])
qqnorm(RES3$data.Sim[, 5])

## Same setup as above but also
## using a permutation approach
## for resampling the confidence interval.
## Cp's and efficiencies are tied together
## because they are two observations on the
## same sample! 
## Similar to what REST2008 software does.
RES4 <- propagate(EXPR, DF, type = "raw", do.sim = TRUE,
                 perm.crit = NULL, do.perm = TRUE, 
                 ties = c(1, 1, 2, 2), logx = TRUE, verbose = TRUE)
RES4$summary
## p-value of 0 in perm < init indicates that not a single 
## exchange of group memberships resulted in a smaller ratio!
              
## Proof that covariance of Monte-Carlo
## simulated dataset is the same as from 
## initial data.
RES4$covMat
cov(RES4$data.Sim[, 1:4])
all.equal(RES4$covMat, cov(RES4$data.Sim[, 1:4]))

Example output

Loading required package: MASS
Loading required package: minpack.lm
Loading required package: rgl
Loading required package: robustbase
Loading required package: Matrix
Warning messages:
1: In rgl.init(initValue, onlyNULL) : RGL: unable to open X11 display
2: 'rgl_init' failed, running with rgl.useNULL = TRUE 
3: .onUnload failed in unloadNamespace() for 'rgl', details:
  call: fun(...)
  error: object 'rgl_quit' not found 

	Shapiro-Wilk normality test

data:  RES1$data.Sim[1:5000, 3]
W = 0.99957, p-value = 0.3455

                    Sim      Perm      Prop
Mean          8.0433333 8.0546140 8.0433333
Std.dev.      0.9686642 0.8518364 0.9689414
Median        8.0326711 8.2000000        NA
MAD           0.9615432 0.6078660        NA
Conf.lower    6.2038758 6.3162500 6.1442430
Conf.upper   10.0032198 9.3437500 9.9424236
perm > init          NA 0.0000000        NA
perm == init         NA 1.0000000        NA
perm < init          NA 0.0000000        NA

	Shapiro-Wilk normality test

data:  RES3$data.Sim[1:5000, 5]
W = 0.71814, p-value < 2.2e-16

                    Sim       Perm        Prop
Mean         0.07121970 0.06662312  0.06785923
Std.dev.     0.06873898 0.05285656  0.04736556
Median       0.05088336 0.04824881          NA
MAD          0.03762061 0.03577432          NA
Conf.lower   0.01077819 0.01650290 -0.02497556
Conf.upper   0.26038386 0.17800356  0.16069402
perm > init          NA 0.49300000          NA
perm == init         NA 0.50700000          NA
perm < init          NA 0.00000000          NA
       E1    cp1          E2       cp2
E1  4e-04 0.0000 0.000000000 0.0000000
cp1 0e+00 0.0811 0.000000000 0.0000000
E2  0e+00 0.0000 0.001233333 0.0000000
cp2 0e+00 0.0000 0.000000000 0.1340333
              E1           cp1            E2           cp2
E1  4.000000e-04  4.344267e-20  2.425599e-19  1.933041e-18
cp1 4.344267e-20  8.110000e-02 -7.984198e-20 -1.410280e-18
E2  2.425599e-19 -7.984198e-20  1.233333e-03 -3.590153e-18
cp2 1.933041e-18 -1.410280e-18 -3.590153e-18  1.340333e-01
[1] TRUE

qpcR documentation built on May 2, 2019, 5:17 a.m.

Related to propagate in qpcR...