Description Usage Arguments Details Value Author(s) References See Also Examples
A general function for the calculation of error propagation by Monte Carlo simulation, permutation and first/second-order Taylor expansion including covariances. Can be used for qPCR data, but any data that should be subjected to error propagation analysis will do. The different methods can be used for any expression based on either replicate or summary data (mean & standard deviation).
1 2 3 4 5 |
expr |
an expression, such as |
data |
a dataframe or matrix containing either a) the replicates in columns or b) the means in the first row and the standard deviations in the second row. The variable names must be defined in the column headers. |
type |
either |
second.order |
logical. If |
do.sim |
logical. Should Monte Carlo simulation be applied? |
dist.sim |
|
use.cov |
logical or variance-covariance matrix with the same column descriptions and column order as |
nsim |
the number of simulations to be performed, minimum is 5000. |
do.perm |
logical. Should permutation error analysis be applied? |
perm.crit |
a character string of one or more criteria defining the null hypothesis for the permutation p-value. See 'Details'. |
ties |
a vector defining the columns that should be tied together for the permutations. See 'Details'. |
nperm |
the number of permutations to be performed. |
alpha |
the confidence level. |
plot |
logical. Should histograms with confidence intervals (in blue) be plotted for all methods? |
logx |
logical. Should the x-axis of the graphs have logarithmic scale? |
verbose |
logical. If |
... |
other parameters to be supplied to |
The implemented methods are:
1) Monte Carlo simulation:
For each variable in data
, simulated data with nsim
samples is generated from a multivariate (truncated) normal distribution using mean μ and standard deviation σ of each variable. All data is coerced into a new dataset that has the same covariance structure as the initial data
. Each row of the simulated dataset is evaluated and summary statistics are calculated. In scenarios that are nonlinear in nature the distribution of the result values can be skewed, mainly due to the simulated values at the extreme end of the normal distribution. Setting dist.sim = "tnorm"
will fit a multivariate normal distribution, calculate the lower/upper 2.5% quantile on each side for each input variable and use these as bounds for simulating from a multivariate truncated normal distribution. This will (in part) remove some of the skewness in the result distribution.
2) Permutation approach:
The original data
is permutated nperm
times by binding observations together according to ties
. The ties
bind observations that can be independent measurements from the same sample. In qPCR terms, this would be a real-time PCR for two different genes on the same sample. If ties
are omitted, the observations are shuffled independently. In detail, two datasets are created for each permutation: Dataset1 samples the rows (replicates) of the data according to ties
. Dataset2 is obtained by sampling the columns (samples), also binding columns as defined in ties
. For both datasets, the permutations are evaluated and statistics are collected. A confidence interval is calculated from all evaluations of Dataset1. A p-value is calculated from all permutations that follow perm.crit
, whereby init
reflects the permutations of the initial data
and perm
the permutations of the randomly reallocated samples. Thus, the p-value gives a measure against the null hypothesis that the result in the initial group is just by chance. See also 'Examples'.
The criterion for the permutation p-value (perm.crit
) has to be defined by the user. For example, let's say we calculate some value 0.2 which is a ratio between two groups. We would hypothesize that by randomly reallocating the values between the groups, the mean values are not equal or smaller than in the initial data. We would thus define perm.crit
as "perm < init" meaning that we want to test if the mean of the initial data (init
) is frequently smaller than by the randomly allocated data (perm
). The default (NULL
) is to test all three variants "perm > init", "perm == init" and "perm < init".
3) Error propagation:
The propagated error is calculated by first and second-order Taylor expansion using matrix algebra. Often omitted, but important in models where the variables are correlated, is the second covariance term:
σ_y^2 = \underbrace{∑_{i=1}^N≤ft(\frac{\partial f}{\partial x_i} \right)^2 σ_i^2}_{\rm{variance}} + \underbrace{2∑_{i=1\atop i \neq j}^N∑_{j=1\atop j \neq i}^N≤ft(\frac{\partial f}{\partial x_i} \right)≤ft(\frac{\partial f}{\partial x_j} \right) σ_{ij}}_{\rm{covariance}}
propagate
calculates the propagated error either with or without covariances using matrix algebra for first- and second-order (since version 1.3-8) Taylor expansion.
First-order:
\mathrm{E}(y) = f(\bar{x}_i)
\mathbf{C}_y = \nabla_x\mathbf{C}_x\nabla_x^T
Second-order:
\mathrm{E}(y) = f(\bar{x}_i) + \frac{1}{2}[tr(\mathbf{H}_{xx}\mathbf{C}_x)]
\mathbf{C}_y = \nabla_x\mathbf{C}_x\nabla_x^T + \frac{1}{2}[tr(\mathbf{H}_{xx}\mathbf{C}_x\mathbf{H}_{xx}\mathbf{C}_x)]
with \mathrm{E}(y) = expectation of y, \mathbf{C}_y = variance of y, \nabla_x = the p x n gradient matrix with all partial first derivatives, \mathbf{C}_x = the p x p covariance matrix, \mathbf{H}_{xx} the Hessian matrix with all partial second derivatives and tr(\cdot) = the trace (sum of diagonal) of the matrix. For a detailed derivation, see 'References'.
The second-order Taylor expansion (second.order = TRUE
) corrects for bias in nonlinear expressions as the first-order Taylor expansion assumes linearity around \bar{x}_i.
Depending on the input formula, the error propagation may result in an error that is not normally distributed. The Monte Carlo simulation, starting with normal distributions of the variables, can clarify this. A high tendency from deviation of normality is encountered in formulas in which the error of the denominator is relatively high or in exponential models where the exponent has a high error. This is one of the problems that is inherent in real-time PCR analysis, as the classical ratio calculation with efficiencies (i.e. by the delta-ct method) is of an exponential type.
A plot containing histograms of the Monte Carlo simulation, the permutation values and error propagation. Additionally inserted are a boxplot, median values in red and confidence intervals as blue borders.
A list with the following components (if verbose = TRUE
):
data.Sim |
the Monte Carlo simulated data with evaluations in the last column. |
data.Perm |
the data of the permutated observations and samples with corresponding evaluations and the decision according to |
data.Prop |
|
gradient |
the evaluated gradient vector \nabla_x of partial first derivatives. |
hessian |
the evaluated hessian matrix \mathbf{H}_{xx} of partial second derivatives. |
covMat |
the covariance matrix \mathbf{C}_x used for Monte Carlo simulation and error propagation. |
summary |
a summary of the collected statistics, given as a dataframe. These are: mean, s.d. median, mad, lower/upper confidence interval and permutation p-values. |
Andrej-Nikolai Spiess
Error propagation (in general):
An Introduction to error analysis.
Taylor JR.
University Science Books (1996), New York.
Evaluation of measurement data - Guide to the expression of uncertainty in measurement.
JCGM 100:2008 (GUM 1995 with minor corrections).
https://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf.
Higher-order Taylor expansion:
On higher-order corrections for propagating uncertainties.
Wang CM & Iyer HK.
Metrologia (2005), 42: 406-410.
Accuracy of error propagation exemplified with ratios of random variables.
Winzer PJ.
Rev Sci Instrum (2000), 72: 1447-1454.
Matrix algebra for error propagation:
An Introduction to Error Propagation: Derivation, Meaning and Examples of Equation Cy = FxCxFx^t.
www.nada.kth.se/~kai-a/papers/arrasTR-9801-R3.pdf.
Second order nonlinear uncertainty modeling in strapdown integration using MEMS IMUs.
Zhang M, Hol JD, Slot L, Luinge H.
2011 Proceedings of the 14th International Conference on Information Fusion (FUSION) (2011).
Error propagation (in qPCR):
Error propagation in relative real-time reverse transcription polymerase chain reaction quantification models: The balance between accuracy and precision.
Nordgard O, Kvaloy JT, Farmen RK, Heikkila R.
Anal Biochem (2006), 356: 182-193.
qBase relative quantification framework and software for management and analysis of real-time quantitative PCR data.
Hellemans J, Mortier G, De Paepe A, Speleman F, Vandesompele J.
Genome Biol (2007), 8: R19.
Multivariate normal distribution:
Stochastic Simulation.
Ripley BD.
Stochastic Simulation (1987). Wiley. Page 98.
Testing for normal distribution:
Testing for Normality.
Thode Jr. HC.
Marcel Dekker (2002), New York.
Approximating the Shapiro-Wilk W-test for non-normality.
Royston P.
Stat Comp (1992), 2: 117-119.
Function ratiocalc
for error analysis within qPCR ratio calculation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 | ## From summary data just calculate
## Monte-Carlo and propagated error.
EXPR <- expression(x/y)
x <- c(5, 0.01)
y <- c(1, 0.01)
DF <- cbind(x, y)
RES1 <- propagate(expr = EXPR, data = DF, type = "stat",
do.sim = TRUE, verbose = TRUE)
## Do Shapiro-Wilks test on Monte Carlo evaluations
## !maximum 5000 datapoints can be used!
## => p.value on border to non-normality
shapiro.test(RES1$data.Sim[1:5000, 3])
## How about a graphical analysis:
qqnorm(RES1$data.Sim[, 3])
## Using raw data
## If data is of unequal length,
## use qpcR:::cbind.na to avoid replication!
## Do permutations (swap x and y values)
## and simulations.
EXPR <- expression(x*y)
x <- c(2, 2.1, 2.2, 2, 2.3, 2.1)
y <- c(4, 4, 3.8, 4.1, 3.1)
DF <- qpcR:::cbind.na(x, y)
RES2 <- propagate(EXPR, DF, type = "raw", do.perm = TRUE,
do.sim = TRUE, verbose = TRUE)
RES2$summary
## For replicate data, using relative
## quantification ratio from qPCR.
## How good is the estimation of the propagated error?
## Done without using covariance in the
## calculation and simulation.
## cp's and efficiencies are tied together
## because they are two observations on the
## same sample!
## As we are using an exponential type function,
## better to logarithmize the x-axis.
EXPR <- expression((E1^cp1)/(E2^cp2))
E1 <- c(1.73, 1.75, 1.77)
cp1 <- c(25.77,26.14,26.33)
E2 <- c(1.72,1.68,1.65)
cp2 <- c(33.84,34.04,33.33)
DF <- cbind(E1, cp1, E2, cp2)
RES3 <- propagate(EXPR, DF, type = "raw", do.sim = TRUE,
do.perm = TRUE, verbose = TRUE, logx = TRUE)
## STRONG deviation from normality!
shapiro.test(RES3$data.Sim[1:5000, 5])
qqnorm(RES3$data.Sim[, 5])
## Same setup as above but also
## using a permutation approach
## for resampling the confidence interval.
## Cp's and efficiencies are tied together
## because they are two observations on the
## same sample!
## Similar to what REST2008 software does.
RES4 <- propagate(EXPR, DF, type = "raw", do.sim = TRUE,
perm.crit = NULL, do.perm = TRUE,
ties = c(1, 1, 2, 2), logx = TRUE, verbose = TRUE)
RES4$summary
## p-value of 0 in perm < init indicates that not a single
## exchange of group memberships resulted in a smaller ratio!
## Proof that covariance of Monte-Carlo
## simulated dataset is the same as from
## initial data.
RES4$covMat
cov(RES4$data.Sim[, 1:4])
all.equal(RES4$covMat, cov(RES4$data.Sim[, 1:4]))
|
Loading required package: MASS
Loading required package: minpack.lm
Loading required package: rgl
Loading required package: robustbase
Loading required package: Matrix
Warning messages:
1: In rgl.init(initValue, onlyNULL) : RGL: unable to open X11 display
2: 'rgl_init' failed, running with rgl.useNULL = TRUE
3: .onUnload failed in unloadNamespace() for 'rgl', details:
call: fun(...)
error: object 'rgl_quit' not found
Shapiro-Wilk normality test
data: RES1$data.Sim[1:5000, 3]
W = 0.99957, p-value = 0.3455
Sim Perm Prop
Mean 8.0433333 8.0546140 8.0433333
Std.dev. 0.9686642 0.8518364 0.9689414
Median 8.0326711 8.2000000 NA
MAD 0.9615432 0.6078660 NA
Conf.lower 6.2038758 6.3162500 6.1442430
Conf.upper 10.0032198 9.3437500 9.9424236
perm > init NA 0.0000000 NA
perm == init NA 1.0000000 NA
perm < init NA 0.0000000 NA
Shapiro-Wilk normality test
data: RES3$data.Sim[1:5000, 5]
W = 0.71814, p-value < 2.2e-16
Sim Perm Prop
Mean 0.07121970 0.06662312 0.06785923
Std.dev. 0.06873898 0.05285656 0.04736556
Median 0.05088336 0.04824881 NA
MAD 0.03762061 0.03577432 NA
Conf.lower 0.01077819 0.01650290 -0.02497556
Conf.upper 0.26038386 0.17800356 0.16069402
perm > init NA 0.49300000 NA
perm == init NA 0.50700000 NA
perm < init NA 0.00000000 NA
E1 cp1 E2 cp2
E1 4e-04 0.0000 0.000000000 0.0000000
cp1 0e+00 0.0811 0.000000000 0.0000000
E2 0e+00 0.0000 0.001233333 0.0000000
cp2 0e+00 0.0000 0.000000000 0.1340333
E1 cp1 E2 cp2
E1 4.000000e-04 4.344267e-20 2.425599e-19 1.933041e-18
cp1 4.344267e-20 8.110000e-02 -7.984198e-20 -1.410280e-18
E2 2.425599e-19 -7.984198e-20 1.233333e-03 -3.590153e-18
cp2 1.933041e-18 -1.410280e-18 -3.590153e-18 1.340333e-01
[1] TRUE
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.