minimalTests: Require rejection of a minimal number of tests

View source: R/minimalTests.R

minimalTestsR Documentation

Require rejection of a minimal number of tests

Description

Compute a p-value for each cluster based around the rejection of a minimal number or proportion of tests from that cluster.

Usage

minimalTests(
  ids,
  tab,
  min.sig.n = 3,
  min.sig.prop = 0.4,
  weights = NULL,
  pval.col = NULL,
  fc.col = NULL,
  fc.threshold = 0.05
)

Arguments

ids

An integer vector or factor containing the cluster ID for each test.

tab

A data.frame of results with PValue and at least one logFC field for each test.

min.sig.n

Integer scalar containing the minimum number of significant barcodes when method="holm-min".

min.sig.prop

Numeric scalar containing the minimum proportion of significant barcodes when method="holm-min".

weights

A numeric vector of weights for each test. Defaults to 1 for all tests.

pval.col

An integer scalar or string specifying the column of tab containing the p-values. Defaults to "PValue".

fc.col

An integer or character vector specifying the columns of tab containing the log-fold changes. Defaults to all columns in tab starting with "logFC".

fc.threshold

A numeric scalar specifying the FDR threshold to use within each cluster for counting tests changing in each direction, see ?"cluster-direction" for more details.

Details

All tests with the same value of ids are used to define a single cluster. For each cluster, this function applies the Holm-Bonferroni correction to the p-values from all of its tests. It then chooses the xth-smallest adjusted p-value as the cluster-level p-value, where x is defined from the larger of min.sig.n and the product of min.sig.prop and the number of tests. (If x is larger than the total number of tests, the largest per-test p-value is used instead.)

Here, a cluster can only achieve a low p-value if at least x tests also have low p-values. This favors clusters that exhibit consistent changes across all tests, which is useful for detecting, e.g., systematic increases in binding across a broad genomic region spanning many windows. By comparison, combineTests will detect a strong change in a small subinterval of a large region, which may not be of interest in some circumstances.

The importance of each test within a cluster can be adjusted by supplying different relative weights values. This may be useful for downweighting low-confidence tests, e.g., those in repeat regions. In the weighted Holm procedure, weights are used to downscale the per-test p-values, effectively adjusting the distribution of per-test errors that contribute to family-wise errors. Note that these weights have no effect between clusters.

To obtain ids, a simple clustering approach for genomic windows is implemented in mergeWindows. However, anything can be used so long as it is independent of the p-values and does not compromise type I error control, e.g., promoters, gene bodies, independently called peaks. Any tests with NA values for ids will be ignored.

Value

A DataFrame with one row per cluster and various fields:

  • An integer field num.tests, specifying the total number of tests in each cluster.

  • Two integer fields num.up.* and num.down.* for each log-FC column in tab, containing the number of tests with log-FCs significantly greater or less than 0, respectively. See ?"cluster-direction" for more details.

  • A numeric field containing the cluster-level p-value. If pval.col=NULL, this column is named PValue, otherwise its name is set to colnames(tab[,pval.col]).

  • A numeric field FDR, containing the BH-adjusted cluster-level p-value.

  • A character field direction (if fc.col is of length 1), specifying the dominant direction of change for tests in each cluster. See ?"cluster-direction" for more details.

  • One integer field rep.test containing the row index (for tab) of a representative test for each cluster. See ?"cluster-direction" for more details.

  • One numeric field rep.* for each log-FC column in tab, containing a representative log-fold change for the differential tests in the cluster. See ?"cluster-direction" for more details.

Each row is named according to the ID of the corresponding cluster.

Author(s)

Aaron Lun

References

Holm S (1979). A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65-70.

See Also

groupedHolmMin, which does the heavy lifting.

combineTests and getBestTest, for another method of combining p-values for each cluster.

mergeWindows, for one method of generating ids.

glmQLFTest, for one method of generating tab.

Examples

ids <- round(runif(100, 1, 10))
tab <- data.frame(logFC=rnorm(100), logCPM=rnorm(100), PValue=rbeta(100, 1, 2))
minimal <- minimalTests(ids, tab)
head(minimal)


LTLA/csaw documentation built on Dec. 11, 2023, 5:11 a.m.