Count the number of read pairs connecting pairs of user-specified regions
1 2 3
a character vector containing the paths to the count file for each library
a GRanges object specifying the regions between which read pairs should be counted
an integer scalar specifying the minimum count for each interaction
a character string specifying how restriction fragments should be assigned to regions
a GRanges object containing the second regions of interest, or an integer scalar specifying the bin size
A logical scalar indicating whether the output regions should be limited to entries in
Interactions of interest are defined as those formed by pairs of elements in
The number of read pairs connecting each pair of elements can then be counted in each library.
This can be useful for quantifying/summarizing interactions between genomic features, e.g., promoters or gene bodies.
For a pair of intervals in
regions, the interaction count is defined as the number of read pairs with one read in each interval.
To save memory, pairs of intervals can be filtered to retain only those with a count sum across all libraries above
In each pair, the anchor interval is defined as that with the higher start position.
Note that the end position may not be higher if nested intervals are present in
For typical Hi-C experiments, mapping of read pairs into intervals is performed at the level of restriction fragments.
The value of
type feeds into
findOverlaps and controls the manner in which restriction fragments are assigned to each region.
By default, a restriction fragment is assigned to one or more regions if said fragment overlaps with any part of those regions.
This expands the effective boundaries of each entry of
regions to the nearest restriction site.
In contrast, setting
type="within" would contract each interval.
For DNase Hi-C experiments, the interval spanned by the alignment of each read is overlapped against the intervals in
This uses the
linkOverlaps function, which responds to any specification of
The boundaries of
regions are not modified as no restriction fragments are involved.
Counting will consider the values of
param - see
pairParam for more details.
In all cases, strandedness of the intervals is ignored in input and set to
"*" in the output object.
Any element metadata in the input
regions is also removed in the output.
An InteractionSet containing the number of read pairs in each library that are mapped between pairs of
regions, or between
Interacting regions are returned as a ReverseStrictGInteractions object containing the concatenated
second.regions argument allows specification of a second set of regions.
Interactions are only considered between one entry in
regions and one entry in
This differs from supplying all regions to
regions, which would consider all pairwise interactions between regions regardless of whether they belong in the first or second set.
Note that the sets are not parallel, and any pairing is considered if as long as it contains one region from the first set and another from the second set.
second.regions is useful for efficiently identifying interactions between two sets of regions.
For example, the first set can be set to several “viewpoint” regions of interest.
This is similar to the bait region in 4C-seq, or the captured regions in Capture Hi-C.
Interactions between these viewpoints and the rest of the genome can then be examined by setting
second.regions to some appropriate bin size.
If an integer scalar is supplied as
second.regions, this value is used as a width to partition the genome into bins.
These bins are then used as the set of second regions.
This is useful for 4C-like experiments where interactions between viewpoints and the rest of the genome are of interest.
Note that this function does not guarantee that the second set of regions will be treated as the second anchor region (or the first) for each interaction.
Those definitions are dependent on the sorting order of the coordinates for all regions.
Users should only use the
is.second field to identify the region from the second set in each interaction.
For standard Hi-C experiments, all supplied regions are expanded or contracted to the nearest restriction site.
These modified regions can be extracted from the
regions slot in the output InteractionSet object, which will be reordered according to the new start positions.
The ordering permutation can be recovered from the
original metadata field of the GRanges object.
Similarly, the number of restriction fragments assigned to each interval is stored in the
nfrags metadata field.
For DNase-C experiments, no expansion of the regions is performed, so the coordinates in the output
regions slot are the same as those in the input
However, reordering may still be necessary, in which case the
original field will specify the original index of each entry.
nfrags are set to zero as no restriction fragments are involved.
second.regions is specified, the output
regions slot will contain both the input
regions and the
second.regions (though not necessarily in that order).
Entries that were originally in
second.regions can be distinguished with the
is.second metadata field.
original index will also point towards the corresponding entry in the original
is.second=FALSE, the index will point towards the corresponding entry in the original
second.regions is an integer scalar, the entries in the output
regions slot will contain the coordinates for the resulting bins.
Note that the
original metadata field is set to
NA for these bins, as no original GRanges existed for these intervals.
param$restrict is not
only bins on the chromosomes in
param$restrict will be reported in the output
This avoids the overhead of constructing many bins when only a small subset of them are used.
restrict.regions=FALSE to ensure that the anchor IDs of the output object are directly comparable between different settings of
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
hic.file <- system.file("exdata", "hic_sort.bam", package="diffHic") cuts <- readRDS(system.file("exdata", "cuts.rds", package="diffHic")) param <- pairParam(cuts) # Setting up the parameters fout <- "output" invisible(preparePairs(hic.file, param, fout)) regions <- suppressWarnings(c( GRanges("chrA", IRanges(c(1, 100, 150), c(20, 140, 160))), GRanges("chrB", IRanges(50, 100)))) # Collating to count combinations. con <- connectCounts(fout, param, regions=regions, filter=1L) head(assay(con)) con <- connectCounts(fout, param, regions=regions, filter=1L, type="within") head(assay(con)) # Still works with restriction and other parameters. con <- connectCounts(fout, param=reform(param, restrict="chrA"), regions=regions, filter=1L) head(assay(con)) con <- connectCounts(fout, param=reform(param, discard=GRanges("chrA", IRanges(1, 50))), regions=regions, filter=1L) head(assay(con)) con <- connectCounts(fout, param=reform(param, cap=1), regions=regions, filter=1L) head(assay(con)) # Specifying a second region. regions2 <- suppressWarnings(c( GRanges("chrA", IRanges(c(50, 100), c(100, 200))), GRanges("chrB", IRanges(1, 50)))) con <- connectCounts(fout, param, regions=regions, filter=1L, second.region=regions2) head(anchors(con, type="first")) head(anchors(con, type="second")) con <- connectCounts(fout, param, regions=regions, filter=1L, second.region=50) head(anchors(con, type="first")) head(anchors(con, type="second"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.