Collate count combinations for interactions between pairs of bins across multiple Hi-C libraries.
a character vector containing paths to the index files generated from each Hi-C library
an integer scalar specifying the width of each bin in base pairs
an integer scalar specifying the minimum count for each square
A logical scalar indicating whether the output regions should be limited to entries in
The genome is first split into non-overlapping adjacent bins of size
In the two-dimensional space, squares are formed from pairs of bins and represent interactions between the corresponding intervals on the genome.
The number of read pairs between each pair of sides is counted for each library to obtain the count for the corresponding square.
For standard Hi-C data, bins are rounded to the nearest restriction site.
The number of restriction fragments in each bin is stored as
nfrags in the metadata of the output
For DNase Hi-C data, no rounding is performed as restriction fragments are irrelevant during DNase digestion.
Each read is placed into a bin based on the location of its 5' end, and
nfrags for all bins are set to zero.
Larger counts can be collected by increasing the value of
This can improve detection power by increasing the evidence for significant differences.
However, this comes at the cost of spatial resolution as adjacent events in the same bin or square can no longer be distinguished.
This may reduce detection power if counts for differential interactions are contaminated by counts for non-differential interactions.
Low-abundance squares with count sums below
filter are not reported.
This reduces memory usage for large datasets.
These squares are probably uninteresting as detection power will be poor for low counts.
Another option is to increase
width to reduce the total number of bins in the genome (and hence, the possible number of bin pairs).
param$restrict is not
only bins on the chromosomes in
param$restrict will be reported in the
regions slot of the output InteractionSet object.
This avoids the overhead of constructing many bins when only a small subset of them are used.
restrict.regions=FALSE to ensure that the anchor IDs of the output object are directly comparable between different settings of
e.g., for merging the results of multiple
Counting will consider the values of
pairParam for more details.
An InteractionSet object is returned containing the number of read pairs for each bin pair across all libraries. Bin pairs are stored as a ReverseStrictGInteractions object.
Imakaev M et al. (2012). Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999-1003.
Lieberman-Aiden E et al. (2009). Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science 326, 289-293.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
hic.file <- system.file("exdata", "hic_sort.bam", package="diffHic") cuts <- readRDS(system.file("exdata", "cuts.rds", package="diffHic")) param <- pairParam(fragments=cuts) # Setting up the parameters fout <- tempfile(fileext=".h5") invisible(preparePairs(hic.file, param, file=fout)) # Collating to count combinations. y <- squareCounts(fout, param) head(assay(y)) y <- squareCounts(fout, param, filter=1) head(assay(y)) y <- squareCounts(fout, param, width=50, filter=1) head(assay(y)) y <- squareCounts(fout, param, width=100, filter=1) head(assay(y)) # Attempting with other parameters. y <- squareCounts(fout, reform(param, restrict="chrA"), width=100, filter=1) head(assay(y)) y <- squareCounts(fout, filter=1, param=reform(param, restrict=cbind("chrA", "chrB"))) head(assay(y)) y <- squareCounts(fout, filter=1, param=reform(param, cap=1), width=100) head(assay(y)) y <- squareCounts(fout, width=100, filter=1, param=reform(param, discard=GRanges("chrA", IRanges(1, 50)))) head(assay(y))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.