View source: R/clusterWindows.R
clusterWindows | R Documentation |
Clusters significant windows into clusters while controlling the cluster-level FDR.
clusterWindows(
ranges,
tab,
target,
pval.col = NULL,
fc.col = NULL,
signs = FALSE,
tol,
...,
weights = NULL,
grid.length = 21,
iterations = 4
)
ranges |
A GRanges or RangedSummarizedExperiment object containing window coordinates. |
tab |
A data.frame of results with a |
target |
A numeric scalar indicating the desired cluster-level FDR. |
pval.col |
A string or integer scalar specifying the column of |
fc.col |
A string or integer scalar specifying the column of |
signs |
A logical scalar indicating whether the sign of the log-fold change (specified by |
tol, ... |
Arguments to be passed to |
weights, grid.length, iterations |
Arguments to be passed to |
In this function, windows are identified as significantly DB based on the BH-adjusted p-values in tab
.
Only those windows are then used directly for clustering via mergeWindows
,
which subsequently yields DB regions consisting solely of DB windows.
If tol
is not specified, it is set to 100 bp by default and a warning is raised.
If fc.col
is used to specify the column of log-fold changes, clusters are formed according to the sign of the log-fold change in mergeWindows
.
DB-based clustering is obviously not blind to the DB status, so standard methods for FDR control are not valid.
Instead, post-hoc control of the cluster-level FDR is applied by using controlClusterFDR
,
which attempts to control the cluster-level FDR at target
(which is set to 0.05 if not specified).
Our aim here is to provide some interpretable results when DB-blind clustering is not appropriate, e.g., for diffuse marks involving long stretches of the genome.
Reporting each marked stretch in its entirety would be cumbersome, so this method allows the DB subintervals to be identified directly.
The output stats
DataFrame is generated by running combineTests
on the ids
and tab
for only the significant windows.
Here, the fc.threshold
argument is set to the p-value threshold used to identify significant windows.
We also remove the FDR
field from the output as this has little meaning when the clusters are not blind to the clustering.
Indeed, the p-value is only retained for purposes of ranking.
A named list containing:
ids
, an integer vector of cluster IDs for each window in ranges
.
Non-significant windows that were not used to construct regions
are marked with NA
values.
regions
, a GRanges containing the coordinates for each cluster.
FDR
, a numeric scalar containing the estimate of the cluster-level FDR for the returned regions.
threshold
, a numeric scalar containing the per-window FDR threshold used to identify significant windows.
stats
, a DataFrame containing some descriptive per-cluster statistics.
Aaron Lun
mergeWindows
,
controlClusterFDR
set.seed(10)
x <- round(runif(100, 100, 1000))
gr <- GRanges("chrA", IRanges(x, x+5))
tab <- data.frame(PValue=rbeta(length(x), 1, 50), logFC=rnorm(length(x)))
clusterWindows(gr, tab, target=0.05, tol=10)
clusterWindows(gr, tab, target=0.05, tol=10, fc.col="logFC")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.