cluster-direction: Reporting cluster-level direction in 'csaw'

cluster-direction    R Documentation

Reporting cluster-level direction in csaw

Description

An overview of the strategies used to obtain cluster-level summaries of the direction of change, based on the directionality information of individual tests. This is relevant to all functions that aggregate per-test statistics into a per-cluster summary, e.g., combineTests, minimalTests. It assumes that there are zero, one or many columns of log-fold changes in the data.frame of per-test statistics, typically specified using an fc.col argument.

Counting the per-test directions

For each cluster, we will report the number of tests that are up (positive values) or down (negative) for each column of log-fold change values listed in fc.col. This provides some indication of whether the change is generally positive or negative - or both - across tests in the cluster. If a cluster contains non-negligible numbers of both up and down tests, this indicates that there may be a complex differential event within that cluster (see comments in mixedTests).

To count up/down tests, we apply a multiple testing correction to the p-values within each cluster. Only the tests with adjusted p-values no greater than fc.threshold are counted as being up or down. We can interpret this as a test of conditional significance; assuming that the cluster is interesting (i.e., contains at least one true positive), what is the distribution of the signs of the changes within that cluster? Note that this procedure has no bearing on the p-value reported for the cluster itself.
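The counting step can be sketched in base R for a single cluster. This is an illustration of the idea only, not csaw's internal code: the data frame, its column names (logFC, PValue) and the values in it are invented for the example, and the Benjamini-Hochberg correction is used as the within-cluster correction.

```r
# Hypothetical per-test statistics for one cluster (values invented for illustration).
tab <- data.frame(
    logFC  = c(1.2, 0.8, -0.5, 2.1, -1.7, 0.3),
    PValue = c(0.001, 0.004, 0.300, 0.002, 0.008, 0.700)
)
fc.threshold <- 0.05

# Multiple testing correction applied within the cluster only.
adj <- p.adjust(tab$PValue, method = "BH")

# Only tests with adjusted p-values no greater than the threshold are counted.
keep <- adj <= fc.threshold
num.up   <- sum(keep & tab$logFC > 0)
num.down <- sum(keep & tab$logFC < 0)
```

Here the two tests with large p-values are excluded from both counts, regardless of the sign of their log-fold changes.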

The nature of the per-test correction within each cluster varies with each function. In most cases, there is a per-test correction that naturally accompanies the per-cluster p-value:

  • For combineTests, the Benjamini-Hochberg correction is used.

  • For minimalTests, the Holm correction is used.

  • For getBestTest with by.pval=TRUE, the Holm correction is used. We could also use the Bonferroni correction here, but Holm is uniformly more powerful so we use that instead.

  • For getBestTest with by.pval=FALSE, all tests bar the one with the highest abundance are simply ignored, which mimics the application of an independent filter. No correction is applied as only one test remains.

  • For mixedTests and empiricalFDR, the Benjamini-Hochberg correction is used, given that both functions just call combineTests on the one-sided p-values in each direction. Here, the number of up tests is obtained using the one-sided p-values for a positive change; similarly, the number of down tests is obtained using the one-sided p-values for a negative change.
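For the one-sided case, a minimal base R sketch might look as follows. It assumes that a one-sided p-value is obtained by halving the two-sided p-value when the log-fold change lies in the direction of interest, and by taking one minus that half otherwise; this conversion is an assumption made for the example rather than something stated above. The data frame and its column names are also invented.

```r
# Hypothetical per-test statistics for one cluster (values invented for illustration).
tab <- data.frame(
    logFC  = c(1.2, 0.8, -0.5, 2.1, -1.7, 0.3),
    PValue = c(0.001, 0.004, 0.300, 0.002, 0.008, 0.700)
)
fc.threshold <- 0.05

# Assumed conversion from two-sided to one-sided p-values in each direction.
half   <- tab$PValue / 2
p.up   <- ifelse(tab$logFC > 0, half, 1 - half)
p.down <- ifelse(tab$logFC < 0, half, 1 - half)

# Each direction is corrected and counted separately with BH.
num.up   <- sum(p.adjust(p.up,   method = "BH") <= fc.threshold)
num.down <- sum(p.adjust(p.down, method = "BH") <= fc.threshold)
```

Because each direction is corrected on its own set of one-sided p-values, a test changing in the opposite direction receives a p-value close to 1 and cannot be counted.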

Representative tests and their log-fold changes

For each combining procedure, we identify a representative test for the entire cluster. This is based on the observation that, in each method, there is often one test that is especially important for computing the cluster-level p-value.

  • For combineTests, the representative is the test with the lowest BH-adjusted p-value before enforcing monotonicity. This is because the p-value for this test is directly used as the combined p-value in Simes' method.

  • For minimalTests, the test with the xth-smallest p-value is used as the representative. See the function's documentation for the definition of x.

  • For getBestTest with by.pval=TRUE, the test with the lowest p-value is used.

  • For getBestTest with by.pval=FALSE, the test with the highest abundance is used.

  • For mixedTests, two representative tests are reported, one in each direction. The representative test in each direction is defined using combineTests as described above.

  • For empiricalFDR, the test is chosen in the same manner as described for combineTests after converting all p-values to their one-sided counterparts in the “desirable” direction, i.e., up tests when neg.down=TRUE and down tests otherwise.

The index of the associated test is reported in the output as the "rep.test" field along with its log-fold changes. For clusters with simple differences, the log-fold change for the representative is a good summary of the effect size for the cluster.
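To illustrate the Simes case, the sketch below picks the representative as the test attaining the smallest BH-adjusted p-value before monotonicity is enforced. It uses base R only, ignores weights, and the p-values are invented; it is meant to show the logic rather than reproduce the package's code.

```r
# Hypothetical unweighted p-values for the tests in one cluster.
p <- c(0.020, 0.010, 0.300, 0.011, 0.012)
n <- length(p)

# BH-adjusted p-values before enforcing monotonicity: p_(i) * n / i.
o <- order(p)
raw.adj <- p[o] * n / seq_len(n)

# Simes' combined p-value is the minimum of these values,
# and the representative is the test that attains it.
simes    <- min(raw.adj)
rep.test <- o[which.min(raw.adj)]
```

In this example the representative is the fifth test, even though the second test has the lowest raw p-value. This shows that the representative for Simes' method need not be the most significant test in the cluster.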

Determining the cluster-level direction

When only one log-fold change column is specified, we will try to determine which direction contributes to the combined p-value. This is done by tallying the directions of all tests with (weighted) p-values no greater than that of the representative test. If all of these tests have positive or negative log-fold changes, that cluster's direction is reported as "up" or "down" respectively; otherwise it is reported as "mixed". This is stored as the "direction" field in the returned data frame.

Assessing the contribution of per-test p-values to the cluster-level p-value is roughly equivalent to asking whether the latter would increase if all tests in one direction were assigned p-values of unity. If there is an increase, then tests changing in that direction must contribute to the combined p-value calculations. In this manner, clusters are labelled based on whether their combined p-values are driven by tests with only positive, negative or mixed log-fold changes. (Note that this interpretation is not completely correct for minimalTests due to equality effects from enforcing monotonicity in the Holm procedure, but this is of little practical consequence.)
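A small base R sketch of this reasoning for Simes' method is given below. It assumes unweighted p-values and tallies the tests with p-values up to and including that of the representative. The helper simes() and the data are hypothetical and written for this example only.

```r
# Hypothetical helper: unweighted Simes combined p-value.
simes <- function(p) min(sort(p) * length(p) / seq_along(p))

# Hypothetical per-test statistics for one cluster (values invented for illustration).
tab <- data.frame(
    logFC  = c(1.5, 2.0, 1.1, -0.9, -1.3),
    PValue = c(0.010, 0.011, 0.012, 0.200, 0.300)
)

# p-value of the representative test (smallest p_(i) * n / i).
o <- order(tab$PValue)
raw.adj <- tab$PValue[o] * nrow(tab) / seq_len(nrow(tab))
rep.p <- tab$PValue[o[which.min(raw.adj)]]

# Tally the signs of all tests at least as significant as the representative.
# A log-fold change of exactly zero would result in a "mixed" label here.
signs <- sign(tab$logFC[tab$PValue <= rep.p])
direction <- if (all(signs > 0)) "up" else if (all(signs < 0)) "down" else "mixed"

# Check the interpretation: does the combined p-value increase when all tests
# in one direction are assigned p-values of unity?
p.no.up   <- ifelse(tab$logFC > 0, 1, tab$PValue)
p.no.down <- ifelse(tab$logFC < 0, 1, tab$PValue)
up.matters   <- simes(p.no.up)   > simes(tab$PValue)
down.matters <- simes(p.no.down) > simes(tab$PValue)
```

In this example, removing the up tests raises the combined p-value while removing the down tests leaves it unchanged, which agrees with the "up" label obtained from the tally.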

Users should keep in mind that the label only describes the direction of change among the most significant tests in the cluster. Clusters with complex differences may still be labelled as changing in only one direction, if the tests changing in one direction have much lower p-values than the tests changing in the other direction (even if both sets of p-values are significant). More rigorous checks for mixed changes should be performed with mixedTests.

There are several functions for which the "direction" is set to a constant value:

  • For mixedTests, it is simply set to "mixed" for all clusters. This reflects the fact that the reported p-value represents the evidence for mixed directionality in this function; indeed, the field itself is simply reported for consistency, given that we already know we are looking for mixed clusters!

  • For empiricalFDR, it is set to "up" when neg.down=TRUE and "down" otherwise. This is because the empirical FDR describes the significance of changes in the desired direction only.

Author(s)

Aaron Lun

See Also

combineTests, minimalTests, getBestTest, empiricalFDR and mixedTests for the functions that do the work.


LTLA/csaw documentation built on Dec. 11, 2023, 5:11 a.m.