filterPeaks | R Documentation |
Identify bin pairs that are likely to represent punctate peaks in the interaction space.
filterPeaks(data, enrichment, assay.bp=1, assay.neighbors=NULL, get.enrich=FALSE,
min.enrich=log2(1.5), min.count=5, min.diag=2L, ...)
data |
an InteractionSet object produced by |
enrichment |
a numeric vector of enrichment values |
assay.bp |
a string or integer scalar specifying the assay containing bin pair counts |
assay.neighbors |
a character vector containing names for the neighborhood regions, see |
get.enrich |
a logical scalar indicating whether enrichment values should be returned |
min.enrich |
a numeric scalar indicating the minimum enrichment score for a peak |
min.count |
a numeric scalar indicating the minimum average count for a peak |
min.diag |
an integer scalar specifying the minimum diagonal in the interaction space with which to consider a peak |
... |
other arguments to be passed to |
Filtering on the local enrichment scores identifies high-intensity islands in the interaction space.
However, this alone is not sufficient to identify sensible peaks.
Filtering on the absolute average counts prevents the calling of low-abundance bin pairs with high enrichment scores due to empty neighborhoods.
Filtering on the diagonals prevents calling of high-abundance short-range interactions that are usually uninteresting.
If either min.count
or min.diag
are NULL
, no filtering will be performed on the average counts and diagonals, respectively.
To compute enrichment values, we assume that the number of read pairs in neighborhood areas have been counted using enrichedPairs
or neighborCounts
.
For a given bin pair in data
, this function computes the mean abundance across libraries for each surrounding neighborhood, scaled by the neighborhood area (i.e., the number of bin pairs it contains).
The local background for the target bin pair is defined as the maximum of the mean abundances for all neighborhoods.
The enrichment value is then defined as the the difference between the target bin pair's abundance and its local background.
The idea is that bin pairs with high enrichments are likely to represent punctate interactions between clearly defined loci.
Selecting for high enrichments can then select for these peak-like features in the interaction space.
The maximizing strategy is designed to mitigate the effects of structural features. Region B will capture the high interaction intensity within genomic domains like TADs, while the C and D will capture any bands in the interaction space. The abundance will be high for any neighborhood that captures a high-intensity feature, as the average counts will be large for all bin pairs within the features. This will then be chosen as the maximum during calculation of enrichment values. Otherwise, if only region A were used, the background abundance would be decreased by low-intensity bin pairs outside of the features. This results in spuriously high enrichment values for target bin pairs on the feature boundaries.
If get.enrich=TRUE
, a numeric vector of enrichment values for each bin pair.
Otherwise, a logical vector indicating whether or not each bin pair is to be considered as a peak.
Aaron Lun
squareCounts
,
enrichedPairs
,
neighborCounts
# Setting up the object.
a <- 10
b <- 20
regions <- GRanges(rep(c("chrA", "chrB"), c(a, b)), IRanges(c(1:a, 1:b), c(1:a, 1:b)))
set.seed(23943)
all.anchor1 <- sample(length(regions), 50, replace=TRUE)
all.anchor2 <- as.integer(runif(50, 1, all.anchor1+1))
data <- InteractionSet(
list(counts=matrix(as.integer(rnbinom(200, mu=10, size=10)), 50, 4)),
GInteractions(anchor1=all.anchor1, anchor2=all.anchor2, regions=regions, mode="reverse"),
colData=DataFrame(totals=runif(4, 1e6, 2e6)), metadata=List(width=1))
# Getting peaks.
enrichment <- enrichedPairs(data)
summary(filterPeaks(enrichment, min.enrich=0.5))
summary(filterPeaks(enrichment, min.enrich=0.5, min.count=10))
summary(filterPeaks(enrichment, min.enrich=0.5, min.diag=NULL))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.