overlapStats: Combine statistics for overlap-based clusters

overlapStatsR Documentation

Combine statistics for overlap-based clusters

Description

Compute assorted statistics for overlaps between windows and pre-defined genomic regions in a Hits object.

Usage

combineOverlaps(overlaps, tab, o.weights = NULL, i.weights = NULL, ...)

getBestOverlaps(overlaps, tab, o.weights = NULL, i.weights = NULL, ...)

empiricalOverlaps(overlaps, tab, o.weights = NULL, i.weights = NULL, ...)

mixedOverlaps(overlaps, tab, o.weights = NULL, i.weights = NULL, ...)

summitOverlaps(overlaps, region.best, o.summit = NULL, i.summit = NULL)

Arguments

overlaps

A Hits object produced by findOverlaps, containing overlaps between regions (query) and windows (subject).

tab

A data.frame of (differential binding) statistics for each window.

o.weights

A numeric vector specifying weights for each overlapped window.

i.weights

A numeric vector specifying weights for each individual window.

...

Other arguments to be passed to the wrapped functions.

region.best

An integer vector specifying the window index that is the summit for each region.

o.summit

A logical vector specifying the overlapped windows that are summits, or a corresponding integer vector of indices for such windows.

i.summit

A logical vector specifying whether an individual window is a summit, or a corresponding integer vector of indices.

Details

These functions provide convenient wrappers around combineTests, getBestTest, empiricalFDR, mixedClusters and upweightSummit for handling overlaps between windows and arbitrary pre-specified regions. They accept Hits objects produced by running findOverlaps between regions (as the query) and windows (as the subject). Each set of windows overlapping a region is defined as a cluster to compute various statistics.

A wrapper is necessary as a window may overlap multiple regions. If so, the multiple instances of that window are defined as distinct “overlapped” windows, where each overlapped window is assigned to a different region. Each overlapped window is represented by a separate entry of overlaps. In contrast, the “individual” window just refers to the window itself, regardless of what it overlaps. This is represented by each row of the RangedSummarizedExperiment object and the tab derived from it.

The distinction between these two definitions is required to describe the weight arguments. The o.weights argument refers to the weights for each region-window relationship. This allows for different weights to be assigned to the same window in different regions. The i.weights argument is the weight of the window itself, and is the same regardless of the region. If both are specified, o.weights takes precedence.

For summitOverlaps, the region.best argument is designed to accept the rep.test field in the output of getBestOverlaps (run with by.pval=FALSE). This contains the index for the individual window that is the summit within each region. In contrast, the i.summit argument indicates whether an individual window is a summit, e.g., from findMaxima. The o.summit argument does the same for overlapped windows, though this has no obvious input within the csaw pipeline.

Value

For combineOverlaps, getBestOverlaps, empiricalOverlaps and mixedOverlaps, a DataFrame is returned from their respective wrapped functions. Each row of the DataFrame corresponds to a region, where regions without overlapped windows are assigned NA values.

For summitOverlaps, a numeric vector of weights is produced. This can be used as o.weight in the other two functions.

Author(s)

Aaron Lun

See Also

combineTests, getBestTest, empiricalFDR and upweightSummit, for the underlying functions.

findOverlaps, to generate the required Hits object.

Examples

bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
data <- windowCounts(bamFiles, width=1, filter=1)
of.interest <- GRanges(c('chrA', 'chrA', 'chrB', 'chrC'), 
    IRanges(c(1, 500, 100, 1000), c(200, 1000, 700, 1500)))

# Making some mock results.
N <- nrow(data)
mock <- data.frame(logFC=rnorm(N), PValue=runif(N), logCPM=rnorm(N))

olap <- findOverlaps(of.interest, rowRanges(data))
combineOverlaps(olap, mock)
getBestOverlaps(olap, mock)
empiricalOverlaps(olap, mock)

# See what happens when you don't get many overlaps.
getBestOverlaps(olap[1,], mock)
combineOverlaps(olap[2,], mock)
empiricalOverlaps(olap[1,], mock)

# Weighting example, with window-specific weights.
window.weights <- runif(N) 
comb <- combineOverlaps(olap, mock, i.weight=window.weights)
comb <- getBestOverlaps(olap, mock, i.weight=window.weights)
comb <- empiricalOverlaps(olap, mock, i.weight=window.weights)

# Weighting example, with relation-specific weights.
best.by.ave <- getBestOverlaps(olap, mock, by.pval=FALSE)
w <- summitOverlaps(olap, region.best=best.by.ave$rep.test)
head(w)
stopifnot(length(w)==length(olap))
combineOverlaps(olap, mock, o.weight=w)

# Running summitOverlaps for window-specific summits
# (output is still relation-specific weights, though).
is.summit <- findMaxima(rowRanges(data), range=100, metric=mock$logCPM)
w <- summitOverlaps(olap, i.summit=is.summit)
head(w)


LTLA/csaw documentation built on Dec. 21, 2024, 1:10 a.m.