minimalTests: R Documentation
Compute a p-value for each cluster based on the rejection of a minimal number or proportion of the tests in that cluster.
minimalTests(
ids,
tab,
min.sig.n = 3,
min.sig.prop = 0.4,
weights = NULL,
pval.col = NULL,
fc.col = NULL,
fc.threshold = 0.05
)
ids: An integer vector or factor containing the cluster ID for each test.
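tab: A data.frame of results with one row per test, including a column of p-values (see pval.col) and any log-fold change columns (see fc.col).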
min.sig.n: Integer scalar containing the minimum number of significant tests required in each cluster; see Details.
min.sig.prop: Numeric scalar containing the minimum proportion of significant tests required in each cluster; see Details.
weights: A numeric vector of weights for each test. Defaults to 1 for all tests.
pval.col: An integer scalar or string specifying the column of tab that contains the p-values.
fc.col: An integer or character vector specifying the columns of tab that contain the log-fold changes.
fc.threshold: A numeric scalar specifying the FDR threshold to use within each cluster for counting tests changing in each direction; see ?"cluster-direction".
All tests with the same value of ids are used to define a single cluster.
For each cluster, this function applies the Holm-Bonferroni correction to the p-values from all of its tests. It then chooses the x-th smallest adjusted p-value as the cluster-level p-value, where x is defined from the larger of min.sig.n and the product of min.sig.prop with the number of tests in the cluster. (If x is larger than the total number of tests, the largest per-test p-value is used instead.)
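The per-cluster computation can be sketched in base R as follows. This is a minimal illustration, not the csaw or groupedHolmMin code: the helper name holm_min_cluster is hypothetical, weights are ignored, and rounding min.sig.prop * n up with ceiling() is an assumption.

```r
# Hypothetical helper illustrating the minimal-tests p-value for ONE cluster.
# Assumption: x = max(min.sig.n, ceiling(min.sig.prop * n)), capped at n.
holm_min_cluster <- function(p, min.sig.n = 3, min.sig.prop = 0.4) {
  n <- length(p)
  # Holm-adjusted p-values, sorted from smallest to largest.
  adj <- sort(p.adjust(p, method = "holm"))
  x <- max(min.sig.n, ceiling(min.sig.prop * n))
  # If x exceeds the number of tests, fall back to the largest adjusted value.
  x <- min(x, n)
  adj[x]
}

# Three strong tests out of five: the third-smallest adjusted value is small.
holm_min_cluster(c(0.001, 0.002, 0.003, 0.5, 0.9))
# A single strong test is not enough to give a low cluster-level p-value.
holm_min_cluster(c(1e-6, 0.5, 0.6, 0.7, 0.8))
```

With the default settings, the first call returns 0.009 (the third Holm-adjusted value, 3 × 0.003) and the second returns 1.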
Consequently, a cluster can only achieve a low p-value if at least x of its tests also have low p-values. This favors clusters that exhibit consistent changes across all tests, which is useful for detecting, e.g., systematic increases in binding across a broad genomic region spanning many windows. By comparison, combineTests will detect a strong change in a small subinterval of a large region, which may not be of interest in some circumstances.
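To make the contrast concrete, the sketch below compares the minimal-tests value with Simes' combined p-value (computed as the smallest BH-adjusted p-value) on two toy clusters of five tests. It assumes, rather than confirms from this page, that combineTests is Simes-based; the p-values are made up, and the hypothetical helper min3 hard-codes x = 3, which is what the default settings give for a five-test cluster.

```r
# Simes' combined p-value equals the minimum of the BH-adjusted p-values.
simes <- function(p) min(p.adjust(p, method = "BH"))
# Minimal-tests value for a 5-test cluster with default settings (x = 3).
min3 <- function(p) sort(p.adjust(p, method = "holm"))[3]

# Made-up clusters: a broad, moderate change and a single strong window.
broad  <- c(0.01, 0.012, 0.015, 0.02, 0.03)
narrow <- c(1e-8, 0.6, 0.7, 0.8, 0.9)

rbind(simes = c(broad = simes(broad), narrow = simes(narrow)),
      min3  = c(broad = min3(broad),  narrow = min3(narrow)))
```

Simes' method gives the narrow cluster a tiny p-value (5e-8) while the minimal-tests value is 1; the broad cluster scores 0.025 and 0.05, respectively.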
The importance of each test within a cluster can be adjusted by supplying different relative values in weights. This may be useful for downweighting low-confidence tests, e.g., those in repeat regions. In the weighted Holm procedure, the weights are used to downscale the per-test p-values, effectively adjusting the distribution of per-test errors that contribute to family-wise errors. Note that these weights have no effect between clusters.
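The weighted Holm procedure (Holm, 1979) can be sketched as follows. This is the textbook formulation, in which each p-value is divided by its weight, and not necessarily the exact code in groupedHolmMin; the function name weighted_holm is hypothetical and the weights are assumed to be positive. With equal weights it reduces to p.adjust(p, method = "holm"), and the cluster-level value would then be the x-th smallest of its output.

```r
# Hypothetical weighted Holm adjustment for the tests in ONE cluster.
weighted_holm <- function(p, w = rep(1, length(p))) {
  stopifnot(length(p) == length(w), all(w > 0))
  # Order tests by their weight-scaled p-values, p / w.
  o <- order(p / w)
  ps <- p[o]
  ws <- w[o]
  # Total weight of the tests at or after each step (not yet rejected).
  remaining <- rev(cumsum(rev(ws)))
  # Step-down adjustment with a running maximum to keep monotonicity.
  adj <- pmin(1, cummax(ps / ws * remaining))
  adj[order(o)]  # return in the original order of the tests
}

p <- c(0.01, 0.02, 0.03)
weighted_holm(p)                  # same as p.adjust(p, method = "holm")
weighted_holm(p, w = c(1, 1, 2))  # third test up-weighted relative to the others
```

Only the relative weights matter here: multiplying all weights by the same constant leaves the adjusted values unchanged.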
To obtain ids, a simple clustering approach for genomic windows is implemented in mergeWindows. However, anything can be used so long as it is independent of the p-values and does not compromise type I error control, e.g., promoters, gene bodies, or independently called peaks.
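As an illustration of the kind of clustering that can be used, the hypothetical helper below groups windows on a single chromosome whenever the gap between neighboring windows is no more than tol base pairs. It is not mergeWindows, whose options are not described here; it only shows how an ids vector that depends on window coordinates, and not on p-values, might be built.

```r
# Hypothetical helper (NOT csaw::mergeWindows): cluster windows on ONE
# chromosome, starting a new cluster when the gap to the previous windows
# exceeds `tol` bp. Coordinates are assumed to be 1-based and inclusive.
cluster_windows <- function(start, end, tol = 100L) {
  stopifnot(length(start) == length(end))
  o <- order(start)
  s <- start[o]
  e <- end[o]
  # Furthest end coordinate reached so far, so nested windows stay together.
  reach <- cummax(e)
  # Gap (in bp) between each window and everything before it.
  new_cluster <- c(TRUE, s[-1] - reach[-length(reach)] - 1 > tol)
  ids <- integer(length(s))
  ids[o] <- cumsum(new_cluster)  # map cluster IDs back to the input order
  ids
}

# Three adjacent windows and one distant window (input deliberately unsorted).
cluster_windows(start = c(1, 51, 1001, 101), end = c(50, 100, 1050, 150))
```

Here the first, second and fourth windows share cluster 1, and the distant window forms cluster 2.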
Any tests with NA values for ids will be ignored.
A DataFrame with one row per cluster and various fields:
- An integer field num.tests, specifying the total number of tests in each cluster.
- Two integer fields num.up.* and num.down.* for each log-FC column in tab, containing the number of tests with log-FCs significantly greater or less than 0, respectively. See ?"cluster-direction" for more details.
- A numeric field containing the cluster-level p-value. If pval.col=NULL, this column is named PValue; otherwise, it takes the name of the column of tab specified by pval.col.
- A numeric field FDR, containing the BH-adjusted cluster-level p-value.
- A character field direction (if fc.col is of length 1), specifying the dominant direction of change for tests in each cluster. See ?"cluster-direction" for more details.
- One integer field rep.test containing the row index (for tab) of a representative test for each cluster. See ?"cluster-direction" for more details.
- One numeric field rep.* for each log-FC column in tab, containing a representative log-fold change for the differential tests in the cluster. See ?"cluster-direction" for more details.

Each row is named according to the ID of the corresponding cluster.
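Several of these fields can be mimicked in base R for a single log-FC column. The sketch below is hypothetical (summarize_clusters is not a csaw function): it ignores weights, assumes x = max(min.sig.n, ceiling(min.sig.prop * n)) capped at n, and assumes that a test counts toward num.up or num.down when its within-cluster BH-adjusted p-value is at most fc.threshold, which may differ from the rules in ?"cluster-direction". The direction, rep.test and rep.* fields are omitted.

```r
# Hypothetical sketch of the per-cluster summary table (not csaw's code).
summarize_clusters <- function(ids, pvals, logfc, min.sig.n = 3,
                               min.sig.prop = 0.4, fc.threshold = 0.05) {
  keep <- !is.na(ids)  # tests with NA ids are ignored
  by_cluster <- split(seq_along(ids)[keep], ids[keep])
  rows <- lapply(by_cluster, function(i) {
    p <- pvals[i]
    n <- length(p)
    x <- min(n, max(min.sig.n, ceiling(min.sig.prop * n)))
    # Assumed rule: within-cluster BH-adjusted p-value at or below fc.threshold.
    sig <- p.adjust(p, method = "BH") <= fc.threshold
    c(num.tests = n,
      num.up = sum(sig & logfc[i] > 0),
      num.down = sum(sig & logfc[i] < 0),
      PValue = sort(p.adjust(p, method = "holm"))[x])
  })
  # Row names of the result are the cluster IDs.
  out <- as.data.frame(do.call(rbind, rows))
  count_cols <- c("num.tests", "num.up", "num.down")
  out[count_cols] <- lapply(out[count_cols], as.integer)
  # BH correction across clusters gives the FDR field.
  out$FDR <- p.adjust(out$PValue, method = "BH")
  out
}

ids   <- c(1, 1, 1, 2, 2, 2, NA)
pvals <- c(0.001, 0.002, 0.003, 0.5, 0.6, 0.7, 0.0001)
logfc <- c(1, 2, -1, 0.5, -0.5, 1, 3)
summarize_clusters(ids, pvals, logfc)
```

The last test has an NA ID and does not contribute to either cluster.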
Aaron Lun
Holm S (1979). A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65-70.
groupedHolmMin, which does the heavy lifting.
combineTests and getBestTest, for other methods of combining p-values for each cluster.
mergeWindows, for one method of generating ids.
glmQLFTest, for one method of generating tab.
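library(csaw)  # minimalTests is provided by the csaw package
set.seed(0)
# Randomly assign each of 100 tests to one of clusters 1 to 10.
ids <- round(runif(100, 1, 10))
# Simulated per-test results, including the default PValue column.
tab <- data.frame(logFC=rnorm(100), logCPM=rnorm(100), PValue=rbeta(100, 1, 2))
minimal <- minimalTests(ids, tab)
head(minimal)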