ccrepe calculates compositionality-corrected p-values and q-values for compositional data.
It first generates a null distribution of the distance metric by permutation and renormalization of the data,
and then generates an alternative distribution of the distance metric by bootstrap resampling of the data.
The two distributions are compared using a pooled-variance Z-test to give a compositionality-corrected p-value.
For greater detail, see References.
The p-values can be calculated for all appropriate pairwise comparisons (those passing certain quality-control measures),
or for a user-specified subset.
Q-values are additionally calculated using the Benjamini-Hochberg-Yekutieli procedure (see References).
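The description above can be illustrated for a single pair of features in base R. This is a minimal sketch, not the package's code: the column-wise permutation scheme, the pooled variance taken as the average of the two variances, and the two-sided p-value are all assumptions made for illustration, and the helper `renorm` and all variable names are hypothetical.

```r
set.seed(1)

# Toy compositional data: 30 samples x 4 features, each row sums to 1
raw <- matrix(rlnorm(120), nrow = 30)
renorm <- function(m) m / rowSums(m)   # hypothetical helper
x <- renorm(raw)

n.iter <- 200
i <- 1
j <- 2

# Null distribution (assumed scheme): permute every column independently,
# renormalize the rows, then score the pair of interest
null.scores <- replicate(n.iter, {
  perm <- renorm(apply(x, 2, sample))
  cor(perm[, i], perm[, j], method = "spearman")
})

# Alternative distribution: resample samples (rows) with replacement
boot.scores <- replicate(n.iter, {
  idx <- sample(nrow(x), replace = TRUE)
  cor(x[idx, i], x[idx, j], method = "spearman")
})

# Pooled-variance z-statistic (assumed form) and two-sided p-value
z <- (mean(boot.scores) - mean(null.scores)) /
  sqrt(0.5 * (var(boot.scores) + var(null.scores)))
p <- 2 * pnorm(-abs(z))
```

Renormalizing after permutation is what keeps the null data compositional, so any correlation induced purely by the constant row sum appears in both distributions and cancels in the comparison.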
x |
First data frame or matrix containing the relative abundances in cavity1: columns are bugs, rows are samples.
(Rows should therefore sum to a constant.) |
y |
Second data frame or matrix (optional) containing the relative abundances in cavity2: columns are bugs, rows are samples. |
sim.score |
A function defining a similarity measure, such as cor or nc.score. This similarity measure can be a pre-defined R function or user-defined. If the latter,
certain properties should be satisfied as detailed below (also see examples). The default similarity measure is Spearman correlation. |
sim.score.args |
A list of arguments for the similarity measure function.
For example, in the case of cor, the following would be acceptable:
sim.score.args = list(method='spearman', use='complete.obs'). |
min.subj |
Minimum number of samples that must be non-missing in a bug/feature/column in order to apply the similarity measure to that bug/feature/column. This is to ensure that there are sufficient subjects to perform a bootstrap (default: 20). |
iterations |
The number of iterations of bootstrap and permutation (default: 1000). |
subset.cols.x |
A vector of column indices from x to indicate which features to compare |
subset.cols.y |
A vector of column indices from y to indicate which features to compare |
errthresh |
If a feature has a number of zeros greater than errthresh^(1/n), that feature is excluded. |
verbose |
Logical: indicates whether verbose output is requested. Verbose output prints periodic progress of the algorithm through the dataset(s) and includes more detailed output (default: FALSE). |
iterations.gap |
If output is verbose, the number of iterations after which a status message is issued (default: 100; displayed only if verbose=TRUE). |
distributions |
Output Distribution file (default:NA). |
compare.within.x |
A boolean value indicating whether to do comparisons given by taking all subsets of size 2 from subset.cols.x or to do comparisons given by taking all possible combinations of subset.cols.x or subset.cols.y. If TRUE but subset.cols.y=NA, returns all comparisons involving any features in subset.cols.x. This argument is only used when y=NA. |
concurrent.output |
Optional output file to which each comparison will be written as it is calculated. |
make.output.table |
A boolean value indicating whether to include table-formatted output. |
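The sim.score argument accepts a user-defined function. The sketch below assumes the function should follow the same calling convention as cor: given two vectors it returns a single number, and given one matrix it returns a symmetric matrix of column-by-column scores. That convention is an assumption here, not something this page spells out, and `my.sim` is a hypothetical name.

```r
# Hypothetical user-defined similarity measure with a cor-like interface.
# Kendall's tau is used only as an example of a swapped-in measure.
my.sim <- function(x, y = NULL, ...) {
  if (is.null(y)) {
    # One input: all pairwise column scores as a symmetric matrix
    cor(as.matrix(x), method = "kendall", ...)
  } else {
    # Two inputs: a single score for the pair of vectors
    cor(x, y, method = "kendall", ...)
  }
}

set.seed(2)
m <- matrix(rlnorm(60), nrow = 15)
m <- m / rowSums(m)

pair.score <- my.sim(m[, 1], m[, 2])   # single number
all.scores <- my.sim(m)                # 4 x 4 symmetric matrix

# Assumed usage with the package (not run here):
# ccrepe(x = m, sim.score = my.sim, sim.score.args = list(use = "complete.obs"))
```

Entries of sim.score.args would reach the function through `...`, so they should not repeat an argument the function already fixes (here, `method`).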
Returns a list containing the calculation results and the parameters used.
Default parameters shown:
min.subj |
Description above |
errThresh |
Description same as errthresh above |
sim.score |
A matrix of the similarity scores for all the requested comparisons. With one dataset, the (i,j)th element of sim.score corresponds to the similarity score of column i (or the ith column of subset.cols.x) and column j (or the jth column of subset.cols.x). With two datasets, it corresponds to the similarity score of column i (or the ith column of subset.cols.x) in dataset x and column j (or the jth column of subset.cols.y) in dataset y. |
p.values |
A matrix of the p-values for all the requested comparisons. The (i,j)th element of p.values corresponds to the p-value of the (i,j)th element of sim.score. |
q.values |
A matrix of the Benjamini-Hochberg-Yekutieli FDR-corrected p-values. The (i,j)th element of q.values corresponds to the q-value of the (i,j)th element of sim.score. |
z.stat |
A matrix of the z-statistics for all the requested comparisons. The (i,j)th element corresponds to the z-statistic which gave rise to the (i,j)th p-value. |
output.table |
(Only if make.output.table=TRUE) A table where each row is one comparison. Each row contains the features being compared with their similarity scores, z-statistics, p-values and q-values. |
Additional parameters if verbose=TRUE:
iterations |
Description Above |
subset.cols.x |
Description Above |
subset.cols.y |
Description Above |
iterations.gap |
Description Above |
sim.score.parameters |
Description Above |
compare.within.x |
Description Above |
make.output.table |
Description Above |
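For readers who want to see what a Benjamini-Hochberg-Yekutieli adjustment does to a set of p-values, base R offers `p.adjust(method = "BY")`. The sketch below applies it to the six distinct p-values from the first sample output on this page. It is an illustration of the procedure only and is not claimed to reproduce the q.values matrix: `p.adjust` caps adjusted values at 1, whereas the sample output on this page lists q-values above 1.

```r
# The six distinct pairwise p-values from the first $p.values matrix on this page
pv <- c(0.1662201, 0.6517241, 0.9447830,
        0.1464977, 0.1456690, 0.4326449)

# Benjamini-Yekutieli adjustment as implemented in base R (stats package)
q.by <- p.adjust(pv, method = "BY")

# Adjusted values are never smaller than the raw p-values
data.frame(p.value = pv, q.value = q.by)
```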
Emma Schwager <emma.schwager@gmail.com>
Emma Schwager and Colleagues. Detecting statistically significant associations between sparse and high dimensional compositional data. In Progress.
Benjamini and Yekutieli (2001). "The control of the false discovery rate in multiple testing under dependency." The Annals of Statistics. Vol. 29, No. 4. pp. 1165-1188.
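library(ccrepe)  # provides ccrepe() and nc.score()
# Simulate 10 samples x 4 features of log-normal abundances
data <- matrix(rlnorm(40,meanlog=0,sdlog=1),nrow=10)
# Normalize each row to sum to 1 (relative abundances)
data.rowsum <- apply(data,1,sum)
data.norm <- data/data.rowsum
testdata <- data.norm
dimnames(testdata) <- list(paste("Sample",seq(1,10)),paste("Feature",seq(1,4)))
# Default similarity measure (Spearman correlation)
ccrepe.results <- ccrepe(x=testdata, iterations=20, min.subj=10)
# Same data, scored with nc.score instead
ccrepe.results.nc.score <- ccrepe(x=testdata,iterations=20,min.subj=10,sim.score=nc.score)
ccrepe.results
ccrepe.results.nc.score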
$p.values
Feature 1 Feature 2 Feature 3 Feature 4
Feature 1 NA 0.1662201 0.6517241 0.9447830
Feature 2 0.1662201 NA 0.1464977 0.1456690
Feature 3 0.6517241 0.1464977 NA 0.4326449
Feature 4 0.9447830 0.1456690 0.4326449 NA
$z.stat
Feature 1 Feature 2 Feature 3 Feature 4
Feature 1 NA -1.384452 -0.4513683 -0.06925957
Feature 2 -1.38445190 NA 1.4520142 1.45500079
Feature 3 -0.45136829 1.452014 NA -0.78467363
Feature 4 -0.06925957 1.455001 -0.7846736 NA
$sim.score
Feature 1 Feature 2 Feature 3 Feature 4
Feature 1 NA -0.8322778 -0.4219571 -0.4471813
Feature 2 -0.8322778 NA 0.1152451 0.1943005
Feature 3 -0.4219571 0.1152451 NA -0.3430894
Feature 4 -0.4471813 0.1943005 -0.3430894 NA
$q.values
Feature 1 Feature 2 Feature 3 Feature 4
Feature 1 NA 0.7875427 1.852702 2.238167
Feature 2 0.7875427 NA 1.041148 2.070518
Feature 3 1.8527019 1.0411479 NA 1.537388
Feature 4 2.2381674 2.0705180 1.537388 NA
$p.values
Feature 1 Feature 2 Feature 3 Feature 4
Feature 1 NA 0.2233601 0.2883500 0.9137071
Feature 2 0.2233601 NA 0.1872502 0.5029694
Feature 3 0.2883500 0.1872502 NA 0.9103785
Feature 4 0.9137071 0.5029694 0.9103785 NA
$z.stat
Feature 1 Feature 2 Feature 3 Feature 4
Feature 1 NA -1.217642 -1.0617482 0.1083638
Feature 2 -1.2176419 NA 1.3187574 0.6698250
Feature 3 -1.0617482 1.318757 NA 0.1125611
Feature 4 0.1083638 0.669825 0.1125611 NA
$sim.score
Feature 1 Feature 2 Feature 3 Feature 4
Feature 1 NA -0.81250 -0.65625 -0.34375
Feature 2 -0.81250 NA 0.43750 0.21875
Feature 3 -0.65625 0.43750 NA 0.18750
Feature 4 -0.34375 0.21875 0.18750 NA
$q.values
Feature 1 Feature 2 Feature 3 Feature 4
Feature 1 NA 1.587403 1.366188 2.164549
Feature 2 1.587403 NA 2.661547 1.787283
Feature 3 1.366188 2.661547 NA 2.587997
Feature 4 2.164549 1.787283 2.587997 NA
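Because the result matrices are symmetric with NA on the diagonal, it can be convenient to flatten one into a table with a single row per feature pair. The following is a minimal base-R sketch of that reshaping, using the first $p.values matrix printed above as input; the variable names are hypothetical, and this is not the package's own output.table format.

```r
feat <- paste("Feature", 1:4)

# First $p.values matrix from the sample output above
p <- matrix(c(NA,        0.1662201, 0.6517241, 0.9447830,
              0.1662201, NA,        0.1464977, 0.1456690,
              0.6517241, 0.1464977, NA,        0.4326449,
              0.9447830, 0.1456690, 0.4326449, NA),
            nrow = 4, byrow = TRUE, dimnames = list(feat, feat))

# Keep each unordered pair once by taking the upper triangle
idx <- which(upper.tri(p), arr.ind = TRUE)
pairs <- data.frame(feature1 = rownames(p)[idx[, "row"]],
                    feature2 = colnames(p)[idx[, "col"]],
                    p.value  = p[idx],
                    stringsAsFactors = FALSE)

# Sort so the strongest evidence comes first
pairs <- pairs[order(pairs$p.value), ]
pairs
```

The same indexing works for the sim.score, z.stat and q.values matrices, since they share row and column names.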