filterPeaks: Filter bin pairs for likely peaks
In diffHic: Differential Analyis of Hi-C Data

Description Usage Arguments Details Value Author(s) See Also Examples

Identify bin pairs that are likely to represent punctate peaks in the interaction space.

1 2	filterPeaks(data, enrichment, assay.bp=1, assay.neighbors=NULL, get.enrich=FALSE, min.enrich=log2(1.5), min.count=5, min.diag=2L, ...)

`data`	an InteractionSet object produced by `enrichedPairs` or `neighborCounts`
`enrichment`	a numeric vector of enrichment values
`assay.bp`	a string or integer scalar specifying the assay containing bin pair counts
`assay.neighbors`	a character vector containing names for the neighborhood regions, see `enrichedPairs` for details
`get.enrich`	a logical scalar indicating whether enrichment values should be returned
`min.enrich`	a numeric scalar indicating the minimum enrichment score for a peak
`min.count`	a numeric scalar indicating the minimum average count for a peak
`min.diag`	an integer scalar specifying the minimum diagonal in the interaction space with which to consider a peak
`...`	other arguments to be passed to `aveLogCPM` for the average count filter

Filtering on the local enrichment scores identifies high-intensity islands in the interaction space. However, this alone is not sufficient to identify sensible peaks. Filtering on the absolute average counts prevents the calling of low-abundance bin pairs with high enrichment scores due to empty neighborhoods. Filtering on the diagonals prevents calling of high-abundance short-range interactions that are usually uninteresting. If either min.count or min.diag are NULL, no filtering will be performed on the average counts and diagonals, respectively.

To compute enrichment values, we assume that the number of read pairs in neighborhood areas have been counted using enrichedPairs or neighborCounts. For a given bin pair in data, this function computes the mean abundance across libraries for each surrounding neighborhood, scaled by the neighborhood area (i.e., the number of bin pairs it contains). The local background for the target bin pair is defined as the maximum of the mean abundances for all neighborhoods. The enrichment value is then defined as the the difference between the target bin pair's abundance and its local background. The idea is that bin pairs with high enrichments are likely to represent punctate interactions between clearly defined loci. Selecting for high enrichments can then select for these peak-like features in the interaction space.

The maximizing strategy is designed to mitigate the effects of structural features. Region B will capture the high interaction intensity within genomic domains like TADs, while the C and D will capture any bands in the interaction space. The abundance will be high for any neighborhood that captures a high-intensity feature, as the average counts will be large for all bin pairs within the features. This will then be chosen as the maximum during calculation of enrichment values. Otherwise, if only region A were used, the background abundance would be decreased by low-intensity bin pairs outside of the features. This results in spuriously high enrichment values for target bin pairs on the feature boundaries.

If get.enrich=TRUE, a numeric vector of enrichment values for each bin pair. Otherwise, a logical vector indicating whether or not each bin pair is to be considered as a peak.

Aaron Lun

squareCounts, enrichedPairs, neighborCounts

# Setting up the object.
a <- 10
b <- 20
regions <- GRanges(rep(c("chrA", "chrB"), c(a, b)), IRanges(c(1:a, 1:b), c(1:a, 1:b)))

set.seed(23943)
all.anchor1 <- sample(length(regions), 50, replace=TRUE)
all.anchor2 <- as.integer(runif(50, 1, all.anchor1+1))
data <- InteractionSet(
    list(counts=matrix(as.integer(rnbinom(200, mu=10, size=10)), 50, 4)), 
    GInteractions(anchor1=all.anchor1, anchor2=all.anchor2, regions=regions, mode="reverse"), 
    colData=DataFrame(totals=runif(4, 1e6, 2e6)), metadata=List(width=1))

# Getting peaks.
enrichment <- enrichedPairs(data)
summary(filterPeaks(enrichment, min.enrich=0.5))
summary(filterPeaks(enrichment, min.enrich=0.5, min.count=10))
summary(filterPeaks(enrichment, min.enrich=0.5, min.diag=NULL))