savePairs | R Documentation |
Save a dataframe of read pairs into a directory structure for rapid chromosomal access.
savePairs(x, file, param)
x |
A dataframe with integer fields |
file |
A character string specifying the path for the output index file. |
param |
A |
This function facilitates the input of processed Hi-C data from other sources into the current pipeline.
Each row of x
corresponds to a read pair, and each entry in x$anchor1.id
and x$anchor2.id
contains an index for param$fragments
.
Thus, the pair of indices for each row denotes the the interacting regions for each read pair.
These regions are generally expected to be restriction fragments in conventional Hi-C experiments.
Obviously, the coordinates of the restriction fragment boundaries in param$fragments
should correspond to the genome to which the reads were aligned.
These can be generated using the cutGenome
function from any given BSgenome
object or the FASTA files used for alignment.
Values of param$discard
and param$restrict
will not be used here and can be ignored.
Any additional fields in x
will also be saved to file.
Users are recommended to put in anchor1.pos
, anchor1.len
, anchor2.pos
and anchor2.len
fields.
These should mimic the output of preparePairs
:
anchorY.pos
:Integer field, containing the 1-based genomic position of the left-most aligned base of read Y.
anchorY.len
:Integer field, containing the length of the alignment of read Y on the reference sequence. This should be multiplied by -1 if the alignment was on the negative strand.
These fields enable the use of more diffHic functions, e.g., removal of reads in param$discard
during counting with squareCounts
,
correct calculation of statistics with getPairData
, quality control with prunePairs
.
For storing DNase Hi-C data, param$fragments
should be empty but the seqinfo
should contain the lengths and names of all chromosomes.
Here, the input anchor1.id
and anchor2.id
should contain indices of the seqlengths
.
This specifies the chromosome to which each read is aligned, e.g., an anchor1.id
of 2 means that read 1 is aligned to the second chromosome in seqlengths
.
Note that, for this type of data, it is essential to store the position and length fields mentioned above.
When constructing the output file, x
will be resorted by anchor1.id
, then anchor2.id
.
If necessary, anchor1 and anchor2 indices will be switched such that the former is never less than the latter.
For DNase Hi-C data, both of these fields will ultimately be set to zero - see prepPseudoPairs
for more details.
An index file is produced at the specified file
location, containing the interaction data.
A NULL
value is invisibly returned.
Aaron Lun
preparePairs
,
cutGenome
hic.file <- system.file("exdata", "hic_sort.bam", package="diffHic")
cuts <-readRDS(system.file("exdata", "cuts.rds", package="diffHic"))
param <- pairParam(cuts)
n <- 1000
all.a <- as.integer(runif(n, 1L, length(cuts)))
all.t <- as.integer(runif(n, 1L, length(cuts)))
x <- data.frame(anchor1.id=all.a, anchor2.id=all.t,
anchor1.pos=runif(1:100), anchor1.len=10,
anchor2.pos=runif(1:100), anchor2.len=-10)
# Note: don't save to a temporary file for actual data.
fout <- tempfile(fileext=".h5")
savePairs(x, fout, param)
require(rhdf5)
head(h5read(fout, "chrA/chrA"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.