Prune the read pairs that represent potential artifacts in a Hi-C library
a character string specifying the path to the index file produced by
a character string specifying a path to an output index file
an integer scalar specifying the maximum length of any sequenced DNA fragment
an integer scalar specifying the minimum distance between inward-facing reads on the same chromosome
an integer scalar specifying the minimum distance between outward-facing reads on the same chromosome
This function removes potential artifacts from the input index file, based on the coordinates of the reads in each pair. It will then produce a new HDF5 file containing only the retained read pairs.
NA values for
min.outward are designed to protect against dangling ends and self-circles, respectively.
This is particularly true when restriction digestion is incomplete, as said structures do not form within a single restriction fragment and cannot be identified earlier.
These can be removed by discarding inward- and outward-facing read pairs that are too close together.
A finite value for
max.frag also protects against non-specific cleavage.
This refers to the length of the actual DNA fragment used in sequencing and is computed from the distance between each read and its nearest downstream restriction site.
Off-target cleavage will result in larger distances than expected.
max.frag should not be set for DNase Hi-C experiments where there is no concept of non-specific cleavage.
Note the distinction between restriction fragments and sequencing fragments.
The former is generated by pre-ligation digestion, and is of concern when choosing
The latter is generated by post-ligation shearing and is of concern when choosing
Suitable values for each parameter can be obtained with the output of
For example, values for
min.inward can be obtained by setting a suitable lower bound on the distribution of non-
NA values for
prunePairs will now respect any settings of
cap in the
pairParam input object.
Reads will be correspondingly removed from the file if they lie outside of restricted chromosomes, within discarded regions or exceed the cap for a restriction fragment pair.
cap will be ignored for DNase-C experiments as this depends on an unknown bin size.
An integer vector is invisibly returned, containing
total, the total number of read pairs;
length, the number of read pairs with fragment lengths greater than
inward, the number of inward-facing read pairs with gap distances less than
outward, the number of outward-facing read pairs with gap distances less than
Multiple data frame objects are also produced within the specified
out file, for each corresponding data frame object in
For each object, the number of rows may be reduced due to the removal of read pairs corresponding to potential artifacts.
Jin F et al. (2013). A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature doi:10.1038/nature12644.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
hic.file <- system.file("exdata", "hic_sort.bam", package="diffHic") cuts <- readRDS(system.file("exdata", "cuts.rds", package="diffHic")) param <- pairParam(cuts) # Note: don't save to a temporary file for actual data. fout <- tempfile(fileext=".h5") fout2 <- tempfile(fileext=".h5") invisible(preparePairs(hic.file, param, fout)) x <- prunePairs(fout, param, fout2) require(rhdf5) h5read(fout2, "chrA/chrA") x <- prunePairs(fout, param, fout2, max.frag=50) h5read(fout2, "chrA/chrA") x <- prunePairs(fout, param, fout2, min.inward=50) h5read(fout2, "chrA/chrA") x <- prunePairs(fout, param, fout2, min.outward=50) h5read(fout2, "chrA/chrA")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.