minimalTests: Require rejection of a minimal number of tests

Description Usage Arguments Details Value Author(s) References Examples

View source: R/minimalTests.R

Description

Compute a p-value for each cluster based around the rejection of a minimal number or proportion of tests from that cluster.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
minimalTests(
  ids,
  tab,
  min.sig.n = 3,
  min.sig.prop = 0.4,
  weights = NULL,
  pval.col = NULL,
  fc.col = NULL,
  fc.threshold = 0.05
)

Arguments

ids

An integer vector or factor containing the cluster ID for each test.

tab

A data.frame of results with PValue and at least one logFC field for each test.

min.sig.n

Integer scalar containing the minimum number of significant barcodes when method="holm-min".

min.sig.prop

Numeric scalar containing the minimum proportion of significant barcodes when method="holm-min".

weights

A numeric vector of weights for each test. Defaults to 1 for all tests.

pval.col

An integer scalar or string specifying the column of tab containing the p-values. Defaults to "PValue".

fc.col

An integer or character vector specifying the columns of tab containing the log-fold changes. Defaults to all columns in tab starting with "logFC".

fc.threshold

A numeric scalar specifying the FDR threshold to use within each cluster for counting tests changing in each direction, see ?"cluster-direction" for more details.

Details

All tests with the same value of ids are used to define a single cluster. For each cluster, this function applies the Holm-Bonferroni correction to the p-values from all of its tests. It then chooses the xth-smallest adjusted p-value as the cluster-level p-value, where x is defined from the larger of min.sig.n and the product of min.sig.prop and the number of tests. (If x is larger than the total number of tests, the largest per-test p-value is used instead.)

Here, a cluster can only achieve a low p-value if at least x tests also have low p-values. This favors clusters that exhibit consistent changes across all tests, which is useful for detecting, e.g., systematic increases in binding across a broad genomic region spanning many windows. By comparison, combineTests will detect a strong change in a small subinterval of a large region, which may not be of interest in some circumstances.

The importance of each test within a cluster can be adjusted by supplying different relative weights values. This may be useful for downweighting low-confidence tests, e.g., those in repeat regions. In the weighted Holm procedure, weights are used to downscale the per-test p-values, effectively adjusting the distribution of per-test errors that contribute to family-wise errors. Note that these weights have no effect between clusters.

To obtain ids, a simple clustering approach for genomic windows is implemented in mergeWindows. However, anything can be used so long as it is independent of the p-values and does not compromise type I error control, e.g., promoters, gene bodies, independently called peaks. Any tests with NA values for ids will be ignored.

Value

A DataFrame with one row per cluster and various fields:

Each row is named according to the ID of the corresponding cluster.

Author(s)

Aaron Lun

References

Holm S (1979). A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65-70.

Examples

1
2
3
4
ids <- round(runif(100, 1, 10))
tab <- data.frame(logFC=rnorm(100), logCPM=rnorm(100), PValue=rbeta(100, 1, 2))
minimal <- minimalTests(ids, tab)
head(minimal)

csaw documentation built on Nov. 12, 2020, 2:03 a.m.