Perform iterative correction on Hi-C interaction counts to remove biases between fragments.
data: an InteractionSet object produced by squareCounts

iterations: an integer scalar specifying the number of correction iterations

exclude.local: an integer scalar, indicating the distance off the diagonal under which bin pairs are excluded

ignore.low: a numeric scalar, indicating the proportion of low-abundance bins to ignore

winsor.high: a numeric scalar, indicating the proportion of high-abundance bin pairs to winsorize

average: a logical scalar specifying whether counts should be averaged across libraries

dist.correct: a logical scalar indicating whether to correct for distance effects

assay: a string or integer scalar specifying the matrix to use from the assays of data
This function implements the iterative correction procedure described by Imakaev et al. (2012). Briefly, this aims to factorize the count for each bin pair into the biases of its two anchor bins and the true interaction probability. The bias represents the ease of sequencing, mapping or other processing for the genome sequence in each bin.
The data argument should be generated by taking the output of squareCounts after setting filter=1, so that no bin pairs are removed.
Filtering should be avoided as counts in low-abundance bin pairs may be informative upon summation for each bin.
For example, a large count sum for a bin may be formed from many bin pairs with low counts.
Removal of those bin pairs would result in loss of information.
If average=TRUE and multiple libraries are used to generate data, an average count will be computed for each bin pair across all libraries. The average count will then be used for correction.
Otherwise, correction will be performed on the counts for each library separately.
The maximum step size in the output can be used as a measure of convergence. Ideally, the step size should approach 1 as iterations pass. This indicates that the correction procedure is converging to a single solution, as the maximum change to the computed biases is decreasing.
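The core of the procedure and the convergence check can be sketched in a few lines of R. This is an illustrative, dense-matrix version of Imakaev-style balancing, not the implementation used by correctedContact; the dummy matrix, iteration count and update rule are assumptions for demonstration.

```r
# Illustrative Imakaev-style iterative correction on a dense, symmetric
# dummy contact matrix (not the actual implementation of this function).
set.seed(100)
n <- 20
mat <- matrix(rpois(n * n, lambda=10), n, n)
mat <- mat + t(mat)

truth <- mat            # converges towards the "true" interaction matrix
bias <- rep(1, n)       # multiplicative bias per bin
max.step <- numeric(0)
for (it in seq_len(50)) {
    step <- rowSums(truth) / mean(rowSums(truth))  # relative bias update
    truth <- truth / outer(step, step)
    bias <- bias * step
    max.step <- c(max.step, max(c(step, 1/step)))  # largest fold change
}
tail(max.step)
```

Here max.step plays the role of the max component in the output: when the largest fold change in the biases is close to 1, further iterations barely alter the solution.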
A list with several components:

truth: a numeric vector containing the true interaction probabilities for each bin pair

bias: a numeric vector of biases for all bins

max: a numeric vector containing the maximum fold change in the biases at each iteration

trend: a numeric vector containing the fitted values for the distance-dependent trend, returned only if dist.correct=TRUE

If average=FALSE, each component is a numeric matrix instead. Each column of the matrix contains the specified information for each library in data.
Some robustness is provided by winsorizing out strong interactions with winsor.high, to ensure that they do not overly influence the computed biases. This is useful for removing spikes around repeat regions or due to PCR duplication.
Low-abundance bins can also be removed with ignore.low to avoid instability during correction, though this will result in NA values in the output.
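Both trimming steps can be sketched with quantile cutoffs. The thresholds below mirror the roles of winsor.high and ignore.low, but the counts, bin assignments and exact cutoff computation are assumptions for illustration.

```r
# Sketch of winsorizing and low-abundance removal using quantile cutoffs
# (illustrative only; dummy counts and bin assignments).
set.seed(101)
counts <- rpois(5000, lambda=runif(5000, 1, 50))
bins <- sample(100, 5000, replace=TRUE)

winsor.high <- 0.02
hi.cut <- quantile(counts, 1 - winsor.high)
counts.w <- pmin(counts, hi.cut)   # cap the top 2% of bin-pair counts

ignore.low <- 0.02
bin.sums <- tapply(counts.w, bins, sum)
lo.cut <- quantile(bin.sums, ignore.low)
ignored <- bin.sums <= lo.cut      # these bins would end up with NA biases
```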
Local bin pairs can be excluded as these are typically irrelevant to long-range interactions.
They are also typically very high-abundance and may have excessive weight during correction, if not removed.
This can be done by removing all bin pairs where the difference between the first and second anchor indices is less than exclude.local. Setting exclude.local=NA will only use inter-chromosomal bin pairs for correction.
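As a sketch, exclusion by anchor index difference looks like this; the anchor indices are hypothetical values for intra-chromosomal bin pairs.

```r
# Hypothetical anchor indices for five intra-chromosomal bin pairs.
a1 <- c(1, 5, 10, 11, 20)
a2 <- c(1, 3, 9, 2, 19)
exclude.local <- 2
keep <- abs(a1 - a2) >= exclude.local
keep  # near-diagonal pairs (index difference below 2) are dropped
```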
If dist.correct=TRUE, abundances will be adjusted for distance-dependent effects. This is done by computing residuals from the fitted distance-abundance trend. These residuals are then used for iterative correction, such that local interactions will not always have higher contact probabilities.
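The trend adjustment can be sketched by fitting a smooth curve of abundance against log-distance and taking residuals. The loess fit and simulated distance decay below are stand-ins for the function's actual trend fitter, purely for illustration.

```r
# Illustrative distance-trend correction on simulated data.
set.seed(102)
gap <- runif(1000, 1e4, 1e7)             # genomic distance between anchors
ab <- 10 - log2(gap)/2 + rnorm(1000)     # dummy distance-decaying abundance
fit <- loess(ab ~ log2(gap))
resid.ab <- ab - fitted(fit)             # residuals replace raw abundances
```

After this adjustment, a pair's abundance is measured relative to other pairs at similar distances, so short-range pairs no longer dominate the correction.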
Ideally, the probability sums to unity across all bin pairs for a given bin (ignoring NA values). This is complicated by winsorizing of high-abundance interactions and removal of local interactions. These interactions are not involved in correction, but are still reported in the output truth. As a result, the sum may not equal unity, i.e., the values are not strictly interpretable as probabilities.
Imakaev M et al. (2012). Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999-1003.
# Dummying up some data.
set.seed(3423746)
npts <- 100
npairs <- 5000
nlibs <- 4
anchor1 <- sample(npts, npairs, replace=TRUE)
anchor2 <- sample(npts, npairs, replace=TRUE)
data <- InteractionSet(
    list(counts=matrix(rpois(npairs*nlibs, runif(npairs, 10, 100)), nrow=npairs)),
    GInteractions(anchor1=anchor1, anchor2=anchor2,
        regions=GRanges("chrA", IRanges(1:npts, 1:npts)),
        mode="reverse"),
    colData=DataFrame(totals=runif(nlibs, 1e6, 2e6)))

# Correcting.
stuff <- correctedContact(data)
head(stuff$truth)
head(stuff$bias)
plot(stuff$max)

# Different behavior with average=FALSE.
stuff <- correctedContact(data, average=FALSE)
head(stuff$truth)
head(stuff$bias)
head(stuff$max)

# Creating an offset matrix, for use in glmFit.
anchor1.bias <- stuff$bias[anchors(data, type="first", id=TRUE),]
anchor2.bias <- stuff$bias[anchors(data, type="second", id=TRUE),]
offsets <- log(anchor1.bias * anchor2.bias)

# Adjusting for distance, and computing offsets with trend correction.
stuff <- correctedContact(data, average=FALSE, dist.correct=TRUE)
head(stuff$truth)
head(stuff$trend)
offsets <- log(stuff$bias[anchors(data, type="first", id=TRUE),]) +
    log(stuff$bias[anchors(data, type="second", id=TRUE),])