getPairData: Get read pair data
In diffHic: Differential Analysis of Hi-C Data


Extract diagnostics for each read pair from an index file.

getPairData(file, param)

`file`	character string, specifying the path to the index file produced by `preparePairs`
`param`	a `pairParam` object containing read extraction parameters

This is a convenience function to extract read pair diagnostics from an index file, generated from a Hi-C library with preparePairs. The aim is to examine the distribution of each returned value to determine the appropriate cutoffs for prunePairs.
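As a rough sketch of how the returned diagnostics might be examined, the snippet below summarises a small made-up data frame with the same three columns that `getPairData` returns. The object `diags` and every number in it are invented for illustration. In practice `diags` would be the output of `getPairData(file, param)`, and the choice of cutoff remains a judgment call based on the observed distributions.

```r
# Hypothetical stand-in for the output of getPairData(); values are made up.
diags <- data.frame(
    length = c(150L, 320L, 410L, 980L, NA),
    orientation = c(1L, 2L, 0L, 3L, 1L),
    insert = c(200L, 15000L, NA, 480L, 90L)
)

# Median and upper tail of the fragment lengths, ignoring NA entries.
len.q <- quantile(diags$length, probs = c(0.5, 0.95), na.rm = TRUE)

# Number of read pairs in each orientation class.
ori.tab <- table(diags$orientation)

# Fraction of read pairs with a defined insert size
# (NA is reported for interchromosomal pairs).
frac.intra <- mean(!is.na(diags$insert))
```

Plotting `diags$length` or `diags$insert` with `hist` is another way to inspect these distributions before choosing cutoffs.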

The length refers to the length of the DNA fragment used in sequencing. It is computed for each read pair by adding the distance of each read to the closest restriction site in the direction of the read. This will be set to NA if the fragment IDs are non-positive, e.g., for DNase Hi-C data (where the concept of fragments is irrelevant anyway).

The insert simply refers to the insert size for each read pair. This is defined as the distance between the extremes of each read on the same chromosome. Values for interchromosomal pairs are set to NA.

The orientation is a bitwise flag: the 0x1 or 0x2 bit is set when the read mapped to the first or second anchor fragment, respectively, is on the reverse strand. For intrachromosomal read pairs, an orientation value of 1 represents inward-facing reads whereas a value of 2 represents outward-facing reads.
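The flag description above can be illustrated with base R's `bitwAnd`. This is only a sketch: the vector `orientation` is a made-up example of the four possible values, and the labels in `facing` are illustrative names, not something the package returns.

```r
# The four possible orientation values (made-up example input).
orientation <- c(0L, 1L, 2L, 3L)

# 0x1 set: the read in the first anchor fragment is on the reverse strand.
first.rev <- bitwAnd(orientation, 1L) != 0L
# 0x2 set: the read in the second anchor fragment is on the reverse strand.
second.rev <- bitwAnd(orientation, 2L) != 0L

# For intrachromosomal pairs, 1 is inward-facing and 2 is outward-facing;
# for 0 and 3, both reads are on the same strand.
facing <- c("same-strand", "inward", "outward", "same-strand")[orientation + 1L]
```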

getPairData respects any settings of restrict, discard or cap in the input pairParam object. Statistics are not reported for read pairs that lie outside of restricted chromosomes, lie within discarded regions, or exceed the cap for a restriction fragment pair. Note that cap is ignored for DNase Hi-C experiments, as it depends on an unknown bin size.

A data frame is returned containing the integer fields `length`, `orientation` and `insert` for each read pair.

Aaron Lun

preparePairs, prunePairs

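library(diffHic)

# Example BAM file and restriction fragment coordinates shipped with diffHic.
hic.file <- system.file("exdata", "hic_sort.bam", package="diffHic")
cuts <- readRDS(system.file("exdata", "cuts.rds", package="diffHic"))
param <- pairParam(cuts)

# Build the index file with preparePairs, then extract the diagnostics.
tmpf <- tempfile(fileext=".h5")
invisible(preparePairs(hic.file, param, tmpf))
getPairData(tmpf, param)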