Compute the Statistical Significance of Each Replicate Combination
Description
If a PhyloExpressionSet or DivergenceExpressionSet stores replicates for each developmental stage or experiment, this function computes the p-values quantifying the statistical significance of the underlying pattern for all combinations of replicates.
Usage
CombinatorialSignificance(ExpressionSet, replicates,
  TestStatistic = "FlatLineTest", permutations = 1000, parallel = FALSE)

Arguments
ExpressionSet 
a standard PhyloExpressionSet or DivergenceExpressionSet object. 
replicates 
a numeric vector storing the number of replicates within each developmental stage or experiment. If replicates stores only one value, the function assumes that each developmental stage or experiment stores the same number of replicates.
TestStatistic 
a string defining the type of test statistic to be used to quantify the statistical significance of the present phylotranscriptomics pattern. Default is TestStatistic = "FlatLineTest".
permutations 
a numeric value specifying the number of permutations to be performed for each permutation test. Default is permutations = 1000.
parallel 
a boolean value specifying whether parallel processing (multicore processing) shall be performed. 
Details
The intention of this analysis is to validate that no combination of replicates (out of all possible replicate combinations) results in a non-significant pattern when the initial pattern with combined replicates was shown to be significant.
A small Example:
Assume a PhyloExpressionSet stores 3 developmental stages with 3 replicates measured for each stage. The 9 replicates in total are denoted as: 1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 3.1, 3.2, 3.3. The function then computes the statistical significance of each pattern derived from the corresponding combination of replicates, e.g.
1.1, 2.1, 3.1 -> p-value for combination 1
1.1, 2.2, 3.1 -> p-value for combination 2
1.1, 2.3, 3.1 -> p-value for combination 3
1.2, 2.1, 3.1 -> p-value for combination 4
1.2, 2.2, 3.1 -> p-value for combination 5
1.2, 2.3, 3.1 -> p-value for combination 6
1.3, 2.1, 3.1 -> p-value for combination 7
1.3, 2.2, 3.1 -> p-value for combination 8
1.3, 2.3, 3.1 -> p-value for combination 9

...
This procedure yields 27 p-values for the 3^3 (n_replicates^n_stages) replicate combinations.
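The enumeration in this small example can be sketched in base R with expand.grid. This is only an illustration of how such combinations can be generated, not the package's internal code; the names replicates, combos and labels are chosen here for the sketch. Note that expand.grid varies the first stage fastest, so the row order differs from the listing above.

```r
# Number of replicates per stage (3 stages with 3 replicates each, as in the example)
replicates <- c(3, 3, 3)

# One index vector 1..n per stage, then every choice of one replicate per stage
combos <- expand.grid(lapply(replicates, seq_len))
names(combos) <- paste0("stage", seq_along(replicates))

# Label each combination in the "stage.replicate" notation, e.g. "1.1, 2.1, 3.1"
labels <- apply(combos, 1, function(r) {
  paste(seq_along(r), r, sep = ".", collapse = ", ")
})

nrow(combos)     # number of replicate combinations
head(labels, 3)  # first three combinations in expand.grid order
```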
Note that with a large number of stages/experiments and a large number of replicates, the number of p-values to compute grows as n_replicates^n_stages. For 11 stages and 4 replicates, 4^11 = 4194304 p-values have to be computed. Each p-value computation is itself based on a permutation test running with 1000 or more permutations. Be aware that this might take some time.
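Before starting a long run, it can help to count the replicate combinations up front. The helper below is a hypothetical convenience function written for this sketch (it is not part of the package); it takes the product of the per-stage replicate counts, which also covers stages with unequal numbers of replicates.

```r
# Number of replicate combinations = product of the per-stage replicate counts
n_combinations <- function(replicates) prod(replicates)

n_combinations(rep(4, 11))   # 11 stages with 4 replicates each: 4^11
n_combinations(c(2, 3, 2))   # unequal replicate counts per stage

# Total number of permuted profiles over all tests; a rough proxy for the
# workload, not an estimate of wall-clock time
permutations <- 1000
total_permutations <- n_combinations(rep(4, 11)) * permutations
```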
The p-value vector returned by this function can then be plotted to see whether a critical value α (e.g. α = 0.05) is exceeded.
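A minimal sketch of such a check is shown below. The values in p.vector are made up purely for illustration; in practice the vector would be the return value of CombinatorialSignificance. The plot is only drawn in an interactive session.

```r
# Hypothetical p-values for illustration only; in a real analysis this is the
# vector returned by CombinatorialSignificance()
p.vector <- c(0.001, 0.004, 0.020, 0.048, 0.003, 0.011)
alpha <- 0.05

# Replicate combinations whose p-value exceeds the critical value
not_significant <- which(p.vector > alpha)
all_significant <- length(not_significant) == 0

# Visual check: one point per replicate combination, alpha as a dashed line
if (interactive()) {
  plot(p.vector, ylim = c(0, max(alpha, p.vector)),
       xlab = "Replicate combination", ylab = "p-value")
  abline(h = alpha, lty = 2)
}
```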
The function receives a standard PhyloExpressionSet or DivergenceExpressionSet object and a vector storing the number of replicates present in each stage or experiment. Based on these arguments the function computes all possible replicate combinations using the expand.grid function and performs a permutation test (the one selected by TestStatistic, e.g. a FlatLineTest) for each replicate combination. The permutations argument of this function specifies the number of permutations that shall be performed for each permutation test. When all p-values are computed, a numeric vector storing the corresponding p-values for each replicate combination is returned.
In other words, for each replicate combination present in the PhyloExpressionSet or DivergenceExpressionSet object, the TAI or TDI pattern of the corresponding replicate combination is tested for its statistical significance based on the underlying test statistic.
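To make this concrete, the sketch below computes the TAI profile of a single replicate combination in base R. It rests on assumptions that are not stated in this page: the expression set is taken to hold the phylostratum in column 1, the gene id in column 2, and the replicate columns grouped by stage from column 3 onward. The toy values and the helper names combination_columns and tai_profile are hypothetical, and the permutation test itself is not reproduced here.

```r
# Toy data in the assumed layout (values are made up for illustration)
toy <- data.frame(
  Phylostratum = c(1, 1, 2, 3),
  GeneID       = c("g1", "g2", "g3", "g4"),
  S1.1 = c(10, 5, 2, 1), S1.2 = c(12, 4, 3, 1),
  S2.1 = c(8, 6, 4, 2),  S2.2 = c(9, 6, 4, 3), S2.3 = c(7, 5, 5, 2),
  S3.1 = c(11, 5, 2, 1), S3.2 = c(10, 6, 2, 2)
)
replicates <- c(2, 3, 2)

# Map a combination (one replicate index per stage) to expression column indices
combination_columns <- function(combination, replicates, first_expr_col = 3) {
  stage_offset <- cumsum(c(0, head(replicates, -1)))
  first_expr_col - 1 + stage_offset + combination
}

# TAI per stage: expression-weighted mean phylostratum of the selected columns
tai_profile <- function(expression_set, cols) {
  ps   <- expression_set[[1]]
  expr <- as.matrix(expression_set[, cols, drop = FALSE])
  colSums(ps * expr) / colSums(expr)
}

cols <- combination_columns(c(1, 2, 1), replicates)  # replicates 1.1, 2.2, 3.1
selected <- names(toy)[cols]
tai <- tai_profile(toy, cols)
```

Each such profile would then be passed to the chosen test statistic to obtain one p-value per replicate combination.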
This function is also able to perform all computations in parallel using multicore processing. The underlying statistical tests are written in C++ and optimized for fast computations.
Value
a numeric vector storing the p-values returned by the underlying test statistic for all possible replicate combinations.
Author(s)
Hajk-Georg Drost
References
Drost HG et al. (2015). Evidence for Active Maintenance of Phylotranscriptomic Hourglass Patterns in Animal and Plant Embryogenesis. Mol Biol Evol. 32 (5): 1221-1231. doi:10.1093/molbev/msv012.
See Also
expand.grid, FlatLineTest
Examples
# load a standard PhyloExpressionSet
data(PhyloExpressionSetExample)
# we assume that the PhyloExpressionSetExample
# consists of 3 developmental stages
# and 2 replicates for stage 1, 3 replicates for stage 2,
# and 2 replicates for stage 3
# FOR REAL ANALYSES PLEASE USE: permutations = 1000 or 10000
# BUT NOTE THAT THIS TAKES MUCH MORE COMPUTATION TIME
p.vector <- CombinatorialSignificance(ExpressionSet = PhyloExpressionSetExample,
replicates = c(2,3,2),
TestStatistic = "FlatLineTest",
permutations = 10,
parallel = FALSE)
