Description Usage Arguments Details Value Author(s) References See Also Examples
This method calculates the cross correlation, i.e. the Pearson correlation between the coverages of the positive and negative strand from a DNA sequencing experiment. The cross correlation can be used as a quality measure in ChIP-seq experiments (Kharchenko et al. 2008). Cross correlation can also be used to estimate the fragment size by determining the shift (given in base pairs) that maximizes the cross correlation.
1 | calculateCrossCorrelation(object, shift=c(200,250,300), bin=10, mode="none", minReads=10000, chrs=NA, mc.cores=1)
|
object |
An |
shift |
The number of bases that the negative strand is shifted towards its three prime end. This can be a vector, if the correlation should be calculated for different shifts. |
bin |
If bin is larger than one, the coverage is calculated for bins of size
|
mode |
|
minReads |
If not at least |
chrs |
A |
mc.cores |
Number of cores to be used. |
Only 5 prime start positions of reads are used for calculating
the coverage. Therefore, after removing duplicates in a single end sequencing
experiment, the coverage can not be larger than one, if the bin size is
set to one. (In this setting, mode both
is meaningless.)
If bin is larger than one, the coverage within a bin is aggregated.
Then, the correlation is calculated for each shift. A shift
(given in basepairs) should be multiple of the bin size
(given in basepairs, too). If not, the binnend coverage is shifted by
round(shift/bin) elements.
The different modes define whether regions without coverage or with
only one covered strand should used. The original implementation in
the package "spp" does not make use of regions without
coverage. However, this seems to be a loss of information, since no
coverage has also a biological meaning in a ChIP-seq experiment. If
the fragment size is approximately 500bp, setting shift=seq(200, 800, 10)
,
bin=10
and mode="none"
should be a good setting.
After the cross correlation was calculated for each chromosome, the weighted mean correlation across all chromosomes is calculated. The weight for a specific chromosome equals the fraction of all reads that were aligned to that chromosome.
A numeric vector with the cross correlation for each shift. The names of the vector correspond to the shifts.
Hans-Ulrich Klein (hklein@broadinstitute.org)
Kharchenko PV, Tolstorukov MY and Park PJ. Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 2008, 26(12):1351-9
Landt SG et al., ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 2012, 22(9):1813-31
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 | triangularKernel <- function(x, pos, h) {
res <- 1 - (abs(x - pos) / h)
res[res < 0] <- 0
return(res)
}
covPos <- round(triangularKernel(1:100, 60, 50) * 100)
covNeg <- round(triangularKernel(1:100, 65, 50) * 100)
reads <- GRanges(IRanges(start=c(rep(seq_along(covPos), covPos), rep(seq_along(covNeg), covNeg) - 9),
width=10),
strand=c(rep("+", sum(covPos)), rep("-", sum(covNeg))),
seqnames=rep("1", sum(covPos)+sum(covNeg)))
calculateCrossCorrelation(reads, shift=c(0,10), bin=1, mode="none", minReads=1)
cor(covPos, covNeg)
cor(covPos[1:(length(covPos)-10)], covNeg[11:length(covNeg)])
calculateCrossCorrelation(reads, shift=c(0,10), bin=1, mode="one", minReads=1)
cor(covPos[covPos != 0 | covNeg != 0], covNeg[covPos != 0 | covNeg != 0])
calculateCrossCorrelation(reads, shift=c(0,10), bin=1, mode="both", minReads=1)
cor(covPos[covPos != 0 & covNeg != 0], covNeg[covPos != 0 & covNeg != 0])
covPos2 <- round(triangularKernel(1:100, 60, 50) * 50)
covNeg2 <- round(triangularKernel(1:100, 68, 50) * 50)
reads2 <- GRanges(IRanges(start=c(rep(seq_along(covPos2), covPos2), rep(seq_along(covNeg2), covNeg2) - 9),
width=10),
strand=c(rep("+", sum(covPos2)), rep("-", sum(covNeg2))),
seqnames=rep("2", sum(covPos2)+sum(covNeg2)))
seqlevels(reads2) <- c("1", "2")
allReads <- c(reads, reads2)
calculateCrossCorrelation(allReads, shift=5, minReads=1, bin=1, mode="none")
cor1 <- cor(covPos[1:(length(covPos)-5)], covNeg[6:length(covNeg)])
cor2 <- cor(covPos2[1:(length(covPos2)-5)], covNeg2[6:length(covNeg2)])
cor1 * (sum(c(covPos, covNeg))/length(allReads)) +
cor2 * (sum(c(covPos2, covNeg2))/length(allReads))
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.