squareCounts: Load Hi-C interaction counts

View source: R/squareCounts.R

squareCountsR Documentation

Load Hi-C interaction counts

Description

Collate count combinations for interactions between pairs of bins across multiple Hi-C libraries.

Usage

squareCounts(files, param, width=50000, filter=1L, restrict.regions=FALSE)

Arguments

files

a character vector containing paths to the index files generated from each Hi-C library

param

a pairParam object containing read extraction parameters

width

an integer scalar specifying the width of each bin in base pairs

filter

an integer scalar specifying the minimum count for each square

restrict.regions

A logical scalar indicating whether the output regions should be limited to entries in param$restrict.

Details

The genome is first split into non-overlapping adjacent bins of size width. In the two-dimensional space, squares are formed from pairs of bins and represent interactions between the corresponding intervals on the genome. The number of read pairs between each pair of sides is counted for each library to obtain the count for the corresponding square.

For standard Hi-C data, bins are rounded to the nearest restriction site. The number of restriction fragments in each bin is stored as nfrags in the metadata of the output region. For DNase Hi-C data, no rounding is performed as restriction fragments are irrelevant during DNase digestion. Each read is placed into a bin based on the location of its 5' end, and nfrags for all bins are set to zero.

Larger counts can be collected by increasing the value of width. This can improve detection power by increasing the evidence for significant differences. However, this comes at the cost of spatial resolution as adjacent events in the same bin or square can no longer be distinguished. This may reduce detection power if counts for differential interactions are contaminated by counts for non-differential interactions.

Low-abundance squares with count sums below filter are not reported. This reduces memory usage for large datasets. These squares are probably uninteresting as detection power will be poor for low counts. Another option is to increase width to reduce the total number of bins in the genome (and hence, the possible number of bin pairs).

If restrict.regions=TRUE and param$restrict is not NULL, only bins on the chromosomes in param$restrict will be reported in the regions slot of the output InteractionSet object. This avoids the overhead of constructing many bins when only a small subset of them are used. By default, restrict.regions=FALSE to ensure that the anchor IDs of the output object are directly comparable between different settings of param$restrict, e.g., for merging the results of multiple squareCounts calls.

Counting will consider the values of restrict, discard and cap in param. See pairParam for more details.

Value

An InteractionSet object is returned containing the number of read pairs for each bin pair across all libraries. Bin pairs are stored as a ReverseStrictGInteractions object.

Author(s)

Aaron Lun

References

Imakaev M et al. (2012). Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999-1003.

Lieberman-Aiden E et al. (2009). Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science 326, 289-293.

See Also

preparePairs, cutGenome, InteractionSet-class, ReverseStrictGInteractions-class

Examples

hic.file <- system.file("exdata", "hic_sort.bam", package="diffHic")
cuts <- readRDS(system.file("exdata", "cuts.rds", package="diffHic"))
param <- pairParam(fragments=cuts)

# Setting up the parameters
fout <- tempfile(fileext=".h5")
invisible(preparePairs(hic.file, param, file=fout))

# Collating to count combinations.
y <- squareCounts(fout, param)
head(assay(y))
y <- squareCounts(fout, param, filter=1)
head(assay(y))
y <- squareCounts(fout, param, width=50, filter=1)
head(assay(y))
y <- squareCounts(fout, param, width=100, filter=1)
head(assay(y))

# Attempting with other parameters.
y <- squareCounts(fout, reform(param, restrict="chrA"), width=100, filter=1)
head(assay(y))
y <- squareCounts(fout, filter=1,
    param=reform(param, restrict=cbind("chrA", "chrB"))) 
head(assay(y))
y <- squareCounts(fout, filter=1,
    param=reform(param, cap=1), width=100)
head(assay(y))
y <- squareCounts(fout, width=100, filter=1, 
    param=reform(param, discard=GRanges("chrA", IRanges(1, 50))))
head(assay(y))

LTLA/diffHic documentation built on April 1, 2024, 7:21 a.m.