enrichedPairs: Collect local enrichment statistics for bin pairs
In LTLA/diffHic: Differential Analysis of Hi-C Data

enrichedPairs

R Documentation

Collect local enrichment statistics for bin pairs

Description

Determine the count for a variety of local neighborhoods around a bin pair, for use in computing peak enrichment statistics.

Usage

enrichedPairs(data, flank=5, exclude=0, assay.in=1, assay.out=NULL)

Arguments

`data`	an InteractionSet object containing bin pair counts, generated by `squareCounts`
`flank`	an integer scalar, specifying the number of bins to consider as the local neighborhood
`exclude`	an integer scalar, specifying the number of bins to exclude from the neighborhood
`assay.in`	a string or integer scalar, specifying the assay containing bin pair counts in `data`
`assay.out`	a character vector containing 4 unique names for the neighborhood regions A-D, see below

Value

An object of the same type as data is returned, containing additional matrices in the assays slot. Each matrix contains the counts for one neighborhood for each bin pair in each library. The area of each neighborhood is also returned in the mcols and named with the "N." prefix.

Definition of the neighborhoods

Consider the coordinates of the interaction space in terms of bins, and focus on any particular bin pair (named here as the target bin pair). This target bin pair is characterized by four neighborhood regions, from A to D. Region A (named "quadrant") is a square with side lengths equal to flank, positioned such that the target bin pair lies at the corner furthest from the diagonal (only used for intra-chromosomal targets). Region B (named "vertical") is a vertical rectangle with dimensions (1, flank*2+1), containing the target bin pair at the center. Region C (named "horizontal") is the horizontal counterpart to B. Region D (named "surrounding") is a square with side lengths equal to flank*2+1, where the target bin pair is positioned in the center.

Obviously, the target bin pair itself is excluded in the definition of each neighborhood. If exclude is positive, additional bin pairs closest to the target will also be excluded. For example, region A* is constructed with exclude instead of flank, and the resulting area is excluded from region A (and so on for all other regions). This avoids problems where diffuse interactions are imperfectly captured by the target bin pair, such that genuine interactions spill over into the neighborhood. Spill-over is undesirable as it will inflate the size of the neighborhood counts for genuine interactions. Setting a larger exclude ensures that this does not occur.

The size of flank requires consideration, as it defines the size of each neighborhood region. If the value is too large, other peaks may be included in the background such that the neighborhood count size is inflated. On the other hand, if flank is too small, there will not be enough neighborhood bin pairs to dilute the increase in counts from spill-over. Both scenarios result in a decrease in enrichment values and loss of power to detect punctate events. The default value of 5 seems to work well, though users may wish to test several values for themselves.

For each bin pair, the other bin pairs in data that belong to its neighborhood are identified. The sum of counts across these bin pairs is computed for each library and stored in a matrix. This is repeated for each type of neighborhood (A-D), and the matrices are named based on assay.out. The area of each neighborhood is also computed in terms of the number of bin pairs contained by the neighborhood. Note that the neighborhood area includes bin pairs that are missing from data, as these are assumed to have a count of zero. See filterPeaks for how these neighborhood counts are used to assess the “peak-ness” of each bin pair.

Author(s)

Aaron Lun

References

Rao S et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 159, 1665-1690.

Examples

# Setting up the object.
a <- 10
b <- 20
regions <- GRanges(rep(c("chrA", "chrB"), c(a, b)), IRanges(c(1:a, 1:b), c(1:a, 1:b)))

set.seed(23943)
all.anchor1 <- sample(length(regions), 50, replace=TRUE)
all.anchor2 <- as.integer(runif(50, 1, all.anchor1+1))
data <- InteractionSet(matrix(as.integer(rnbinom(200, mu=10, size=10)), 50, 4), 
    GInteractions(anchor1=all.anchor1, anchor2=all.anchor2, 
        regions=regions, mode="reverse"), 
    colData=DataFrame(lib.size=1:4*1000), metadata=List(width=1))
data$totals <- colSums(assay(data))

# Getting peaks.
head(enrichedPairs(data))
head(enrichedPairs(data, flank=3))
head(enrichedPairs(data, flank=1))
head(enrichedPairs(data, exclude=1))

LTLA/diffHic documentation built on Dec. 20, 2024, 7:06 p.m.