combineTests | R Documentation |
Combines p-values across clustered tests using Simes' method to control the cluster-level FDR.
combineTests(
ids,
tab,
weights = NULL,
pval.col = NULL,
fc.col = NULL,
fc.threshold = 0.05
)
ids |
An integer vector or factor containing the cluster ID for each test. |
tab |
A data.frame of results with one row per test, including a column of p-values and one or more columns of log-fold changes. |
weights |
A numeric vector of weights for each test. Defaults to 1 for all tests. |
pval.col |
An integer scalar or string specifying the column of tab containing the p-values. |
fc.col |
An integer or character vector specifying the columns of tab containing the log-fold changes. |
fc.threshold |
A numeric scalar specifying the FDR threshold to use within each cluster for counting tests changing in each direction, see ?"cluster-direction". |
All tests with the same value of ids are used to define a single cluster.
This function applies Simes' procedure to the per-test p-values to compute the combined p-value for each cluster,
which represents evidence against the global null hypothesis, i.e., all individual nulls are true in each cluster.
The BH method is then applied to control the FDR across all clusters.
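The two steps described above can be illustrated with a minimal base R sketch. The helper `simes` and the toy vectors `ids` and `pvals` are hypothetical names used only for illustration; this is the textbook unweighted Simes formula, not necessarily how the function is implemented internally.

```r
# Unweighted Simes combined p-value for a single cluster:
# the minimum over i of n * p_(i) / i, where p_(1) <= ... <= p_(n).
simes <- function(p) {
    p <- sort(p)
    n <- length(p)
    min(n * p / seq_len(n))
}

# Toy data (assumed for illustration): six tests in three clusters.
ids <- c(1, 1, 1, 2, 2, 3)
pvals <- c(0.01, 0.04, 0.5, 0.2, 0.3, 0.03)

# One combined p-value per cluster, named by cluster ID.
cluster.p <- vapply(split(pvals, ids), simes, numeric(1))

# BH adjustment across clusters rather than across individual tests.
cluster.fdr <- p.adjust(cluster.p, method="BH")
```

Note that the BH correction is applied to three cluster-level p-values here, not to the six per-test p-values.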
Rejection of the global null is more relevant than the significance of each individual test when multiple tests in a cluster represent parts of the same underlying event, e.g., differentially bound genomic regions consisting of clusters of windows. Control of the FDR across tests may not equate to control of the FDR across clusters; we ensure the latter by explicitly computing cluster-level p-values for use in the BH method.
We use Simes' method as it is relatively relaxed and rejects the global null upon observing any change in the cluster.
More stringent methods are available in functions like minimalTests and getBestTest.
The importance of each test within a cluster can be adjusted by supplying different relative weights values.
This may be useful for downweighting low-confidence tests, e.g., those in repeat regions.
In Simes' procedure, weights are interpreted as relative frequencies of the tests in each cluster.
Note that these weights have no effect between clusters.
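As a rough sketch of how weights can enter Simes' procedure, the helper below replaces the rank of each ordered p-value with the cumulative weight up to that test, and the number of tests with the total weight. The function name `weighted.simes` is hypothetical, and the formula is an assumption based on the "relative frequencies" interpretation above rather than a copy of the internal code.

```r
# Weighted Simes combined p-value for one cluster (illustrative sketch).
# With all weights equal, this reduces to the usual n * p_(i) / i.
weighted.simes <- function(p, w) {
    o <- order(p)
    p <- p[o]
    w <- w[o]
    min(sum(w) * p / cumsum(w))
}

p <- c(0.01, 0.04, 0.5)

# Equal weights: same result as the unweighted procedure.
equal <- weighted.simes(p, c(1, 1, 1))

# Downweighting the test with the smallest p-value relative to the others
# gives a larger (less significant) combined p-value.
down <- weighted.simes(p, c(1, 4, 4))

# Only relative weights matter: rescaling all weights changes nothing.
rescaled <- weighted.simes(p, 10 * c(1, 4, 4))
```

Because the weights are normalized by their sum within the cluster, multiplying every weight in a cluster by a constant has no effect, which is consistent with weights having no effect between clusters.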
To obtain ids, a simple clustering approach for genomic windows is implemented in mergeWindows.
However, anything can be used so long as it is independent of the p-values and does not compromise type I error control, e.g., promoters, gene bodies, independently called peaks.
Any tests with NA values for ids will be ignored.
A DataFrame with one row per cluster and various fields:
An integer field num.tests, specifying the total number of tests in each cluster.
Two integer fields num.up.* and num.down.* for each log-FC column in tab, containing the number of tests with log-FCs significantly greater or less than 0, respectively. See ?"cluster-direction" for more details.
A numeric field containing the cluster-level p-value. If pval.col=NULL, this column is named PValue; otherwise it takes the name of the column of tab specified by pval.col.
A numeric field FDR, containing the BH-adjusted cluster-level p-value.
A character field direction (if fc.col is of length 1), specifying the dominant direction of change for tests in each cluster. See ?"cluster-direction" for more details.
One integer field rep.test containing the row index (for tab) of a representative test for each cluster. See ?"cluster-direction" for more details.
One numeric field rep.* for each log-FC column in tab, containing a representative log-fold change for the differential tests in the cluster. See ?"cluster-direction" for more details.
Each row is named according to the ID of the corresponding cluster.
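To give a feel for the num.tests, num.up.* and num.down.* fields, here is a simplified base R sketch. It assumes that the per-test p-values are BH-adjusted within each cluster and that a test is counted when its adjusted p-value is no greater than fc.threshold. The function `count.directions` is hypothetical, and the exact procedure documented in ?"cluster-direction" may differ (for example, in how weights are handled).

```r
# Count significant tests in each direction per cluster (illustrative only).
count.directions <- function(ids, pvals, logfc, fc.threshold=0.05) {
    # Tests with NA cluster IDs are ignored.
    keep <- !is.na(ids)
    ids <- ids[keep]
    pvals <- pvals[keep]
    logfc <- logfc[keep]

    # BH adjustment applied separately within each cluster.
    adj <- ave(pvals, ids, FUN=function(p) p.adjust(p, method="BH"))
    sig <- adj <= fc.threshold

    f <- factor(ids)
    data.frame(
        num.tests=as.integer(table(f)),
        num.up=as.integer(tapply(sig & logfc > 0, f, sum)),
        num.down=as.integer(tapply(sig & logfc < 0, f, sum)),
        row.names=levels(f)
    )
}

# Toy data (assumed): five tests in two clusters, plus one test with no cluster.
ids <- c(1, 1, 1, 2, 2, NA)
pvals <- c(0.01, 0.02, 0.8, 0.03, 0.6, 0.001)
logfc <- c(1.5, -2, 0.1, -1, 0.3, 4)
res <- count.directions(ids, pvals, logfc)
```

In this toy input, cluster 1 has one significant test in each direction, while neither test in cluster 2 survives the within-cluster adjustment at a threshold of 0.05.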
Aaron Lun
Simes RJ (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika 73, 751-754.
Benjamini Y and Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B 57, 289-300.
Benjamini Y and Hochberg Y (1997). Multiple hypotheses testing with weights. Scand. J. Stat. 24, 407-418.
Lun ATL and Smyth GK (2014). De novo detection of differentially bound regions for ChIP-seq data using peaks and windows: controlling error rates correctly. Nucleic Acids Res. 42, e95.
groupedSimes, which does the heavy lifting.
minimalTests and getBestTest, for other methods of combining p-values for each cluster.
mergeWindows, for one method of generating ids.
glmQLFTest, for one method of generating tab.
ids <- round(runif(100, 1, 10))
tab <- data.frame(logFC=rnorm(100), logCPM=rnorm(100), PValue=rbeta(100, 1, 2))
combined <- combineTests(ids, tab)
head(combined)
# With window weighting.
w <- round(runif(100, 1, 5))
combined <- combineTests(ids, tab, weights=w)
head(combined)
# With multiple log-FCs.
tab$logFC.whee <- rnorm(100, 5)
combined <- combineTests(ids, tab)
head(combined)
# Manual specification of column IDs.
combined <- combineTests(ids, tab, fc.col=c(1,4), pval.col=3)
head(combined)
combined <- combineTests(ids, tab, fc.col="logFC.whee", pval.col="PValue")
head(combined)