View source: R/enrichedPairs.R
enrichedPairs | R Documentation |
Determine the count for a variety of local neighborhoods around a bin pair, for use in computing peak enrichment statistics.
enrichedPairs(data, flank=5, exclude=0, assay.in=1, assay.out=NULL)
data |
an InteractionSet object containing bin pair counts, generated by |
flank |
an integer scalar, specifying the number of bins to consider as the local neighborhood |
exclude |
an integer scalar, specifying the number of bins to exclude from the neighborhood |
assay.in |
a string or integer scalar, specifying the assay containing bin pair counts in |
assay.out |
a character vector containing 4 unique names for the neighborhood regions A-D, see below |
An object of the same type as data
is returned, containing additional matrices in the assays
slot.
Each matrix contains the counts for one neighborhood for each bin pair in each library.
The area of each neighborhood is also returned in the mcols
and named with the "N."
prefix.
Consider the coordinates of the interaction space in terms of bins, and focus on any particular bin pair (named here as the target bin pair).
This target bin pair is characterized by four neighborhood regions, from A to D.
Region A (named "quadrant"
) is a square with side lengths equal to flank
, positioned such that the target bin pair lies at the corner furthest from the diagonal (only used for intra-chromosomal targets).
Region B (named "vertical"
) is a vertical rectangle with dimensions (1, flank*2+1)
, containing the target bin pair at the center.
Region C (named "horizontal"
) is the horizontal counterpart to B.
Region D (named "surrounding"
) is a square with side lengths equal to flank*2+1
, where the target bin pair is positioned in the center.
Obviously, the target bin pair itself is excluded in the definition of each neighborhood.
If exclude
is positive, additional bin pairs closest to the target will also be excluded.
For example, region A* is constructed with exclude
instead of flank
, and the resulting area is excluded from region A (and so on for all other regions).
This avoids problems where diffuse interactions are imperfectly captured by the target bin pair, such that genuine interactions spill over into the neighborhood.
Spill-over is undesirable as it will inflate the size of the neighborhood counts for genuine interactions.
Setting a larger exclude
ensures that this does not occur.
The size of flank
requires consideration, as it defines the size of each neighborhood region.
If the value is too large, other peaks may be included in the background such that the neighborhood count size is inflated.
On the other hand, if flank
is too small, there will not be enough neighborhood bin pairs to dilute the increase in counts from spill-over.
Both scenarios result in a decrease in enrichment values and loss of power to detect punctate events.
The default value of 5 seems to work well, though users may wish to test several values for themselves.
For each bin pair, the other bin pairs in data
that belong to its neighborhood are identified.
The sum of counts across these bin pairs is computed for each library and stored in a matrix.
This is repeated for each type of neighborhood (A-D), and the matrices are named based on assay.out
.
The area of each neighborhood is also computed in terms of the number of bin pairs contained by the neighborhood.
Note that the neighborhood area includes bin pairs that are missing from data
, as these are assumed to have a count of zero.
See filterPeaks
for how these neighborhood counts are used to assess the “peak-ness” of each bin pair.
Aaron Lun
Rao S et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 159, 1665-1690.
squareCounts
,
neighborCounts
,
filterPeaks
# Setting up the object.
a <- 10
b <- 20
regions <- GRanges(rep(c("chrA", "chrB"), c(a, b)), IRanges(c(1:a, 1:b), c(1:a, 1:b)))
set.seed(23943)
all.anchor1 <- sample(length(regions), 50, replace=TRUE)
all.anchor2 <- as.integer(runif(50, 1, all.anchor1+1))
data <- InteractionSet(matrix(as.integer(rnbinom(200, mu=10, size=10)), 50, 4),
GInteractions(anchor1=all.anchor1, anchor2=all.anchor2,
regions=regions, mode="reverse"),
colData=DataFrame(lib.size=1:4*1000), metadata=List(width=1))
data$totals <- colSums(assay(data))
# Getting peaks.
head(enrichedPairs(data))
head(enrichedPairs(data, flank=3))
head(enrichedPairs(data, flank=1))
head(enrichedPairs(data, exclude=1))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.