clusterWindows: Cluster DB windows into clusters

View source: R/clusterWindows.R

clusterWindowsR Documentation

Cluster DB windows into clusters

Description

Clusters significant windows into clusters while controlling the cluster-level FDR.

Usage

clusterWindows(
  ranges,
  tab,
  target,
  pval.col = NULL,
  fc.col = NULL,
  signs = FALSE,
  tol,
  ...,
  weights = NULL,
  grid.length = 21,
  iterations = 4
)

Arguments

ranges

A GRanges or RangedSummarizedExperiment object containing window coordinates.

tab

A data.frame of results with a PValue field for each window.

target

A numeric scalar indicating the desired cluster-level FDR.

pval.col

A string or integer scalar specifying the column of tab with the p-values. Defaults to "PValue".

fc.col

A string or integer scalar specifying the column of tab with the log-fold changes. Defaults to "logFC".

signs

A logical scalar indicating whether the sign of the log-fold change (specified by fc.col) should be used in mergeWindows.

tol, ...

Arguments to be passed to mergeWindows.

weights, grid.length, iterations

Arguments to be passed to controlClusterFDR.

Details

In this function, windows are identified as significantly DB based on the BH-adjusted p-values in tab. Only those windows are then used directly for clustering via mergeWindows, which subsequently yields DB regions consisting solely of DB windows. If tol is not specified, it is set to 100 bp by default and a warning is raised. If fc.col is used to specify the column of log-fold changes, clusters are formed according to the sign of the log-fold change in mergeWindows.

DB-based clustering is obviously not blind to the DB status, so standard methods for FDR control are not valid. Instead, post-hoc control of the cluster-level FDR is applied by using controlClusterFDR, which attempts to control the cluster-level FDR at target (which is set to 0.05 if not specified). Our aim here is to provide some interpretable results when DB-blind clustering is not appropriate, e.g., for diffuse marks involving long stretches of the genome. Reporting each marked stretch in its entirety would be cumbersome, so this method allows the DB subintervals to be identified directly.

The output stats DataFrame is generated by running combineTests on the ids and tab for only the significant windows. Here, the fc.threshold argument is set to the p-value threshold used to identify significant windows. We also remove the FDR field from the output as this has little meaning when the clusters are not blind to the clustering. Indeed, the p-value is only retained for purposes of ranking.

Value

A named list containing:

  • ids, an integer vector of cluster IDs for each window in ranges. Non-significant windows that were not used to construct regions are marked with NA values.

  • regions, a GRanges containing the coordinates for each cluster.

  • FDR, a numeric scalar containing the estimate of the cluster-level FDR for the returned regions.

  • threshold, a numeric scalar containing the per-window FDR threshold used to identify significant windows.

  • stats, a DataFrame containing some descriptive per-cluster statistics.

Author(s)

Aaron Lun

See Also

mergeWindows, controlClusterFDR

Examples

set.seed(10)
x <- round(runif(100, 100, 1000))
gr <- GRanges("chrA", IRanges(x, x+5))
tab <- data.frame(PValue=rbeta(length(x), 1, 50), logFC=rnorm(length(x)))

clusterWindows(gr, tab, target=0.05, tol=10)
clusterWindows(gr, tab, target=0.05, tol=10, fc.col="logFC")


LTLA/csaw documentation built on Dec. 11, 2023, 5:11 a.m.