prunePairs: Prune read pairs

View source: R/prunePairs.R

prunePairsR Documentation

Prune read pairs

Description

Prune the read pairs that represent potential artifacts in a Hi-C library

Usage

prunePairs(file.in, param, file.out=file.in, max.frag=NA, min.inward=NA, min.outward=NA)

Arguments

file.in

a character string specifying the path to the index file produced by preparePairs

param

a pairParam object containing read extraction parameters

file.out

a character string specifying a path to an output index file

max.frag

an integer scalar specifying the maximum length of any sequenced DNA fragment

min.inward

an integer scalar specifying the minimum distance between inward-facing reads on the same chromosome

min.outward

an integer scalar specifying the minimum distance between outward-facing reads on the same chromosome

Details

This function removes potential artifacts from the input index file, based on the coordinates of the reads in each pair. It will then produce a new HDF5 file containing only the retained read pairs.

Non-NA values for min.inward and min.outward are designed to protect against dangling ends and self-circles, respectively. This is particularly true when restriction digestion is incomplete, as said structures do not form within a single restriction fragment and cannot be identified earlier. These can be removed by discarding inward- and outward-facing read pairs that are too close together.

A finite value for max.frag also protects against non-specific cleavage. This refers to the length of the actual DNA fragment used in sequencing and is computed from the distance between each read and its nearest downstream restriction site. Off-target cleavage will result in larger distances than expected. However, max.frag should not be set for DNase Hi-C experiments where there is no concept of non-specific cleavage.

Note the distinction between restriction fragments and sequencing fragments. The former is generated by pre-ligation digestion, and is of concern when choosing min.inward and min.outward. The latter is generated by post-ligation shearing and is of concern when choosing max.frag.

Suitable values for each parameter can be obtained with the output of getPairData. For example, values for min.inward can be obtained by setting a suitable lower bound on the distribution of non-NA values for insert with orientation==1.

prunePairs will now respect any settings of restrict, discard and cap in the pairParam input object. Reads will be correspondingly removed from the file if they lie outside of restricted chromosomes, within discarded regions or exceed the cap for a restriction fragment pair. Note that cap will be ignored for DNase-C experiments as this depends on an unknown bin size.

Value

An integer vector is invisibly returned, containing total, the total number of read pairs; length, the number of read pairs with fragment lengths greater than max.frag; inward, the number of inward-facing read pairs with gap distances less than min.inward; and outward, the number of outward-facing read pairs with gap distances less than min.outward.

Multiple data frame objects are also produced within the specified out file, for each corresponding data frame object in file.in. For each object, the number of rows may be reduced due to the removal of read pairs corresponding to potential artifacts.

Author(s)

Aaron Lun

References

Jin F et al. (2013). A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature doi:10.1038/nature12644.

See Also

preparePairs, getPairData, squareCounts

Examples

hic.file <- system.file("exdata", "hic_sort.bam", package="diffHic")
cuts <- readRDS(system.file("exdata", "cuts.rds", package="diffHic"))
param <- pairParam(cuts)

# Note: don't save to a temporary file for actual data.
fout <- tempfile(fileext=".h5")
fout2 <- tempfile(fileext=".h5")
invisible(preparePairs(hic.file, param, fout))
x <- prunePairs(fout, param, fout2)

require(rhdf5)
h5read(fout2, "chrA/chrA")

x <- prunePairs(fout, param, fout2, max.frag=50)
h5read(fout2, "chrA/chrA")

x <- prunePairs(fout, param, fout2, min.inward=50)
h5read(fout2, "chrA/chrA")

x <- prunePairs(fout, param, fout2, min.outward=50)
h5read(fout2, "chrA/chrA")

LTLA/diffHic documentation built on April 1, 2024, 7:21 a.m.