prunePairs | R Documentation |
Prune the read pairs that represent potential artifacts in a Hi-C library
prunePairs(file.in, param, file.out=file.in, max.frag=NA, min.inward=NA, min.outward=NA)
file.in |
a character string specifying the path to the index file produced by |
param |
a |
file.out |
a character string specifying a path to an output index file |
max.frag |
an integer scalar specifying the maximum length of any sequenced DNA fragment |
min.inward |
an integer scalar specifying the minimum distance between inward-facing reads on the same chromosome |
min.outward |
an integer scalar specifying the minimum distance between outward-facing reads on the same chromosome |
This function removes potential artifacts from the input index file, based on the coordinates of the reads in each pair. It will then produce a new HDF5 file containing only the retained read pairs.
Non-NA
values for min.inward
and min.outward
are designed to protect against dangling ends and self-circles, respectively.
This is particularly true when restriction digestion is incomplete, as said structures do not form within a single restriction fragment and cannot be identified earlier.
These can be removed by discarding inward- and outward-facing read pairs that are too close together.
A finite value for max.frag
also protects against non-specific cleavage.
This refers to the length of the actual DNA fragment used in sequencing and is computed from the distance between each read and its nearest downstream restriction site.
Off-target cleavage will result in larger distances than expected.
However, max.frag
should not be set for DNase Hi-C experiments where there is no concept of non-specific cleavage.
Note the distinction between restriction fragments and sequencing fragments.
The former is generated by pre-ligation digestion, and is of concern when choosing min.inward
and min.outward
.
The latter is generated by post-ligation shearing and is of concern when choosing max.frag
.
Suitable values for each parameter can be obtained with the output of getPairData
.
For example, values for min.inward
can be obtained by setting a suitable lower bound on the distribution of non-NA
values for insert
with orientation==1
.
prunePairs
will now respect any settings of restrict
, discard
and cap
in the pairParam
input object.
Reads will be correspondingly removed from the file if they lie outside of restricted chromosomes, within discarded regions or exceed the cap for a restriction fragment pair.
Note that cap
will be ignored for DNase-C experiments as this depends on an unknown bin size.
An integer vector is invisibly returned, containing total
, the total number of read pairs; length
, the number of read pairs with fragment lengths greater than max.frag
; inward
, the number of inward-facing read pairs with gap distances less than min.inward
; and outward
, the number of outward-facing read pairs with gap distances less than min.outward
.
Multiple data frame objects are also produced within the specified out
file, for each corresponding data frame object in file.in
.
For each object, the number of rows may be reduced due to the removal of read pairs corresponding to potential artifacts.
Aaron Lun
Jin F et al. (2013). A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature doi:10.1038/nature12644.
preparePairs
, getPairData
, squareCounts
hic.file <- system.file("exdata", "hic_sort.bam", package="diffHic")
cuts <- readRDS(system.file("exdata", "cuts.rds", package="diffHic"))
param <- pairParam(cuts)
# Note: don't save to a temporary file for actual data.
fout <- tempfile(fileext=".h5")
fout2 <- tempfile(fileext=".h5")
invisible(preparePairs(hic.file, param, fout))
x <- prunePairs(fout, param, fout2)
require(rhdf5)
h5read(fout2, "chrA/chrA")
x <- prunePairs(fout, param, fout2, max.frag=50)
h5read(fout2, "chrA/chrA")
x <- prunePairs(fout, param, fout2, min.inward=50)
h5read(fout2, "chrA/chrA")
x <- prunePairs(fout, param, fout2, min.outward=50)
h5read(fout2, "chrA/chrA")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.