clusterWindows: Cluster DB windows into clusters

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/clusterWindows.R

Description

Clusters significant windows into clusters while controlling the cluster-level FDR.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
clusterWindows(
  ranges,
  tab,
  target,
  pval.col = NULL,
  fc.col = NULL,
  signs = FALSE,
  tol,
  ...,
  weights = NULL,
  grid.length = 21,
  iterations = 4
)

Arguments

ranges

A GRanges or RangedSummarizedExperiment object containing window coordinates.

tab

A data.frame of results with a PValue field for each window.

target

A numeric scalar indicating the desired cluster-level FDR.

pval.col

A string or integer scalar specifying the column of tab with the p-values. Defaults to "PValue".

fc.col

A string or integer scalar specifying the column of tab with the log-fold changes. Defaults to "logFC".

signs

A logical scalar indicating whether the sign of the log-fold change (specified by fc.col) should be used in mergeWindows.

tol, ...

Arguments to be passed to mergeWindows.

weights, grid.length, iterations

Arguments to be passed to controlClusterFDR.

Details

In this function, windows are identified as significantly DB based on the BH-adjusted p-values in tab. Only those windows are then used directly for clustering via mergeWindows, which subsequently yields DB regions consisting solely of DB windows. If tol is not specified, it is set to 100 bp by default and a warning is raised. If fc.col is used to specify the column of log-fold changes, clusters are formed according to the sign of the log-fold change in mergeWindows.

DB-based clustering is obviously not blind to the DB status, so standard methods for FDR control are not valid. Instead, post-hoc control of the cluster-level FDR is applied by using controlClusterFDR, which attempts to control the cluster-level FDR at target (which is set to 0.05 if not specified). Our aim here is to provide some interpretable results when DB-blind clustering is not appropriate, e.g., for diffuse marks involving long stretches of the genome. Reporting each marked stretch in its entirety would be cumbersome, so this method allows the DB subintervals to be identified directly.

The output stats DataFrame is generated by running combineTests on the ids and tab for only the significant windows. Here, the fc.threshold argument is set to the p-value threshold used to identify significant windows. We also remove the FDR field from the output as this has little meaning when the clusters are not blind to the clustering. Indeed, the p-value is only retained for purposes of ranking.

Value

A named list containing:

Author(s)

Aaron Lun

See Also

mergeWindows, controlClusterFDR

Examples

1
2
3
4
5
6
7
set.seed(10)
x <- round(runif(100, 100, 1000))
gr <- GRanges("chrA", IRanges(x, x+5))
tab <- data.frame(PValue=rbeta(length(x), 1, 50), logFC=rnorm(length(x)))

clusterWindows(gr, tab, target=0.05, tol=10)
clusterWindows(gr, tab, target=0.05, tol=10, fc.col="logFC")

csaw documentation built on Nov. 12, 2020, 2:03 a.m.