Find the test with the greatest significance or the highest abundance in each cluster.
ids | An integer vector or factor containing the cluster ID for each test.
tab | A data.frame of results, with one row per test.
by.pval | Logical scalar indicating whether the best test should be selected on the basis of the smallest p-value. If FALSE, the best test is instead the one with the highest abundance.
weights | A numeric vector of weights for each test. Defaults to 1 for all tests.
pval.col | An integer scalar or string specifying the column of tab containing the p-values.
fc.col | An integer or character vector specifying the columns of tab containing the log-fold changes.
fc.threshold | A numeric scalar specifying the FDR threshold to use within each cluster for counting tests changing in each direction, see ?"cluster-direction" for more details.
cpm.col | An integer scalar or string specifying the column of tab containing the log-CPM values.
Each cluster is defined as a set of tests with the same value of ids (any NA values are ignored).
If by.pval=TRUE, this function identifies the test with the lowest p-value as that with the strongest evidence against the null in each cluster.
The p-value of the chosen test is adjusted using the (Holm-)Bonferroni correction, based on the total number of tests in the parent cluster.
This is necessary to obtain strong control of the family-wise error rate such that the best test can be taken from each cluster for further consideration.
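The adjustment described above can be illustrated with a minimal base R sketch. The p-values are made-up numbers for one hypothetical cluster, and the code uses only p.adjust from the stats package; it is not a copy of the function's internals.

```r
# Toy p-values for a single cluster of five tests (made-up numbers).
p <- c(0.001, 0.02, 0.03, 0.2, 0.5)
n <- length(p)

# Bonferroni-adjusted p-value of the best (smallest) test, capped at 1.
best.bonf <- min(1, n * min(p))

# Holm's adjustment gives the same value for the smallest p-value,
# because the first step of Holm's procedure uses the full multiplier n.
best.holm <- min(p.adjust(p, method="holm"))

best.bonf
best.holm
```

Both quantities agree for the best test, which is why the correction is described as (Holm-)Bonferroni.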
The importance of each window in each cluster can be adjusted by supplying different relative weights values.
Each weight is interpreted as a different threshold for each test in the cluster using the weighted Holm procedure.
Larger weights correspond to lower thresholds, i.e., less evidence is needed to reject the null for tests deemed to be more important.
This may be useful for upweighting particular tests such as those for windows containing a motif for the TF of interest.
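As a rough sketch of how weights change the outcome, the first step of a weighted Holm procedure can be written in base R. The p-values and weights are hypothetical, and the formula below is the textbook first-step adjustment (reject test i at level alpha if p[i]/w[i] <= alpha/sum(w)); it is an assumption that this mirrors the function's behaviour, not a statement about its implementation.

```r
# Made-up p-values and weights for one cluster; the third test is
# upweighted, e.g., because its window contains a motif of interest.
p <- c(0.001, 0.02, 0.003, 0.2, 0.5)
w <- c(1, 1, 4, 1, 1)

# Without weights, the best test is the one with the smallest p-value.
unweighted.best <- which.min(p)
unweighted.adj <- min(1, length(p) * min(p))

# With weights, tests are ranked by p/w, and the adjusted p-value of the
# top-ranked test is (p/w) * sum(w), capped at 1.
weighted.best <- which.min(p / w)
weighted.adj <- min(1, (p / w)[weighted.best] * sum(w))

c(unweighted.best, weighted.best)
c(unweighted.adj, weighted.adj)
```

Here the upweighted third test displaces the first test as the best test, even though its raw p-value is larger.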
Note the difference between this function and combineTests.
The latter presents evidence for any rejections within a cluster.
This function specifies the exact location of the rejection in the cluster, which may be more useful in some cases but comes at the cost of being more conservative.
In both cases, clustering procedures such as mergeWindows can be used to identify the cluster.
If by.pval=FALSE, the best test is defined as that with the highest log-CPM value.
This should be independent of the p-value so no adjustment is necessary. Weights are not applied here.
This mode may be useful when abundance is correlated with rejection under the alternative hypothesis, e.g., picking high-abundance regions that are more likely to contain peaks.
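The abundance-based selection can be sketched in base R as follows. The ids and logCPM vectors are hypothetical toy data, and the helper logic only illustrates the idea of picking the highest log-CPM per cluster while skipping NA ids; it is not the function's own code.

```r
# Toy cluster IDs (one NA) and log-CPM values for seven tests.
ids <- c(1, 1, 2, 2, 2, NA, 3)
logCPM <- c(2.1, 3.5, 0.4, 1.9, 1.2, 5.0, -0.3)

# Drop tests without a cluster ID, then group row indices by cluster.
keep <- !is.na(ids)
by.cluster <- split(which(keep), ids[keep])

# For each cluster, take the row index of the test with the highest log-CPM.
best.idx <- vapply(by.cluster, function(i) i[which.max(logCPM[i])], integer(1))
best.idx
```

The result is a named integer vector with one row index per cluster, analogous in spirit to a representative test index.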
To obtain ids, a simple clustering approach for genomic windows is implemented in mergeWindows.
However, anything can be used so long as it is independent of the p-values and does not compromise type I error control, e.g., promoters, gene bodies, independently called peaks.
Any tests with NA values for ids will be ignored.
A DataFrame with one row per cluster and various fields:
An integer field num.tests, specifying the total number of tests in each cluster.
Two integer fields num.up.* and num.down.* for each log-FC column in tab, containing the number of tests with log-FCs significantly greater or less than 0, respectively. See ?"cluster-direction" for more details.
A numeric field containing the cluster-level p-value. If pval.col=NULL, this column is named PValue, otherwise its name is set to colnames(tab[,pval.col]).
A numeric field FDR, containing the BH-adjusted cluster-level p-value.
A character field direction (if fc.col is of length 1), specifying the dominant direction of change for tests in each cluster. See ?"cluster-direction" for more details.
One integer field rep.test containing the row index (for tab) of a representative test for each cluster. See ?"cluster-direction" for more details.
One numeric field rep.* for each log-FC column in tab, containing a representative log-fold change for the differential tests in the cluster. See ?"cluster-direction" for more details.
Each row is named according to the ID of the corresponding cluster.
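To illustrate how an FDR field relates to cluster-level p-values, the following base R sketch applies the BH adjustment across clusters. The cluster names and p-values are hypothetical, and the data.frame built here only mimics two of the fields listed above.

```r
# Made-up cluster-level p-values, named by cluster ID.
cluster.p <- c(A=0.005, B=0.04, C=0.6)

# Benjamini-Hochberg adjustment across clusters.
fdr <- p.adjust(cluster.p, method="BH")

# One row per cluster, with rows named by cluster ID.
data.frame(PValue=cluster.p, FDR=fdr)
```

Note that the adjustment is applied across clusters, so the FDR is controlled over clusters rather than over individual tests.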
Aaron Lun
combineTests and minimalTests, for other methods for obtaining cluster-level p-values.
mergeWindows, to generate ids.
glmQLFTest, for one method of generating tab.
library(csaw) # getBestTest() is assumed here to come from the csaw package
# Simulated cluster IDs for 100 tests.
ids <- round(runif(100, 1, 10))
tab <- data.frame(logFC=rnorm(100), logCPM=rnorm(100), PValue=rbeta(100, 1, 2))
best <- getBestTest(ids, tab)
head(best)
best <- getBestTest(ids, tab, cpm.col="logCPM", pval.col="PValue")
head(best)
# With window weighting.
w <- round(runif(100, 1, 5))
best <- getBestTest(ids, tab, weights=w)
head(best)
# By logCPM.
best <- getBestTest(ids, tab, by.pval=FALSE)
head(best)
best <- getBestTest(ids, tab, by.pval=FALSE, cpm.col=2, pval.col=3)
head(best)