Estimate summaries of the distribution of fragment lengths in a short-read experiment. The methods are designed for ChIP-Seq experiments and may not work well on data without peaks.

Description

estimate.mean.fraglen implements three methods for estimating mean fragment length. The other functions are related helper functions implementing various methods, but may be useful by themselves for diagnostic purposes. Many of these operations are potentially slow.

sparse.density is intended to be similar to density, but returns the results in a run-length encoded form. This is useful when long stretches of the range of the data have zero density.
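
The benefit of a run-length encoded result can be illustrated with a small base R sketch. This is not the package's implementation: the positions, the grid, and the hand-written Epanechnikov kernel below are assumptions made for illustration, and base `rle` stands in for the "Rle" class that sparse.density returns.

```r
# Toy illustration in base R; NOT the implementation used by sparse.density.
x <- c(100, 105, 110, 5000, 5005)   # hypothetical read start positions
grid <- seq(1, 6000)                # evaluate the density at every base
half_width <- 50                    # kernel is nonzero only within 50 bases

# Unnormalized Epanechnikov kernel summed over all points
dens <- numeric(length(grid))
for (xi in x) {
  u <- (grid - xi) / half_width
  dens <- dens + ifelse(abs(u) < 1, 0.75 * (1 - u^2), 0)
}

# Run-length encode the density: each long zero stretch becomes a single run
runs <- rle(dens)
zero_runs <- runs$lengths[runs$values == 0]
print(zero_runs)                    # lengths of the zero-density stretches
```

In this toy example most of the 6000 grid positions fall into three zero-valued runs, which is the situation where a run-length encoding saves the most space.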

Usage

estimate.mean.fraglen(x, method = c("SISSR", "coverage", "correlation"),
                      ...)

basesCovered(x, shift = seq(5, 300, 5), seqLen = 100, verbose = FALSE)

densityCorr(x, shift = seq(0, 500, 5), center = FALSE,
            width = seqLen * 2L, seqLen = 100L, maxDist = 500L, ...)

sparse.density(x, width = 50, kernel = "epanechnikov",
               from = start(rix)[1] - 10L,
               to = end(rix)[length(rix)] + 10L)

Arguments

x

For estimate.mean.fraglen, typically an AlignedRead or a GRanges object. Also supported but deprecated, as they do not carry formal strand information: RangedData (with a "strand" column), or a list-like object with elements "+" and "-" representing the locations of reads aligned to the positive and negative strands (the values should be integers denoting the location where the first sequenced base matched). Supported (but, again, deprecated) list types include RangesList, IntegerList, and an ordinary R list.

For basesCovered and densityCorr, a list with elements "+" and "-" representing the locations of reads aligned to the positive and negative strands (the values should be integers denoting the location where the first sequenced base matched). densityCorr also supports GRanges input directly.

For sparse.density, a numeric or integer vector for which density is to be computed.

method

Character string giving the method to be used. method = "SISSR" implements the method described in Jothi et al. (see References below). method = "correlation" implements the method described in Kharchenko et al. (see References below), where the idea is to compute the density of tag start positions separately for each strand, and then determine the amount of shift that maximizes the correlation between these two densities. method = "coverage" computes the optimal shift for which the number of bases covered by any read is minimized.
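
The idea behind the "correlation" method can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used by densityCorr: the read positions are invented, the strand offset of 120 is built into the toy data, and stats::density with a fixed bandwidth stands in for the package's density computation. The helper name strand_density is hypothetical.

```r
# Toy sketch of the strand cross-correlation idea; NOT the package's code.
plus  <- c(1000, 1010, 1025, 3000, 3012, 3030, 7000, 7005)  # hypothetical + strand starts
minus <- plus + 120                                          # - strand starts, offset by 120

# Density of start positions on a fixed grid (hypothetical helper)
strand_density <- function(pos, from, to) {
  density(pos, bw = 20, kernel = "epanechnikov",
          from = from, to = to, n = 2048)$y
}

shifts <- seq(0, 300, 5)
from <- min(plus) - 100
to   <- max(minus) + 100
d_plus <- strand_density(plus, from, to)

# Correlation between the + strand density and the shifted - strand density
corr <- sapply(shifts, function(s) {
  cor(d_plus, strand_density(minus - s, from, to))
})
best_shift <- shifts[which.max(corr)]
print(best_shift)
```

Because the toy data contain an exact offset, the correlation peaks at the built-in shift; real data give a broader peak.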

shift

Integer vector giving the shifts to be tried when optimizing. The current algorithm simply evaluates all supplied values and reports the one giving minimum coverage or maximum correlation.
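
The exhaustive search over shift values can be sketched for the coverage criterion as follows. This is an illustrative base R sketch, not the implementation of basesCovered: the fragment positions, the read length of 36, and the choice to shift only the positive-strand reads are all assumptions, and covered_bases is a hypothetical helper.

```r
# Toy grid search over shifts for the coverage criterion; NOT the package's code.
read_len <- 36                          # assumed read length (plays the role of seqLen)
frag_len <- 156                         # true fragment length in this toy data
frag_starts <- c(1000, 5000, 9000)      # hypothetical fragment start positions
plus  <- frag_starts                    # first sequenced base, + strand
minus <- frag_starts + frag_len - 1     # first sequenced base, - strand

# Number of distinct bases covered after moving + strand reads right by `shift`
covered_bases <- function(shift) {
  plus_cov  <- unlist(lapply(plus + shift, function(s) s:(s + read_len - 1)))
  minus_cov <- unlist(lapply(minus, function(e) (e - read_len + 1):e))
  length(unique(c(plus_cov, minus_cov)))
}

shifts <- seq(5, 300, 5)
coverage <- sapply(shifts, covered_bases)   # evaluate every supplied shift
best_shift <- shifts[which.min(coverage)]   # report the one with minimum coverage
estimated_fraglen <- best_shift + read_len  # add the read length back
print(c(best_shift = best_shift, estimated_fraglen = estimated_fraglen))
```

Coverage is smallest when reads from the two strands of the same fragment overlap completely, which in this toy setup happens at a shift equal to the fragment length minus the read length.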

seqLen

For the "coverage" method, the assumed length of each read when computing the coverage, typically the read length. It is also added to the shift estimated by the "coverage" and "correlation" methods to obtain the actual fragment length.

verbose

Logical specifying whether progress information should be printed during execution.

center

For the "correlation" method, whether the calculations should incorporate centering by the mean density. The default is not to do so; as the density is zero over most of the genome, this slightly improves efficiency at negligible loss in accuracy.
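
The effect of centering can be illustrated with a short base R sketch. The formulas below are an assumption about what "centering by the mean density" means here (an ordinary Pearson correlation versus the same ratio without subtracting the means), and the vectors are invented; the package may compute the quantity differently.

```r
# Toy comparison of centered and uncentered correlation; an assumed formulation.
uncentered_corr <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
centered_corr   <- function(a, b) uncentered_corr(a - mean(a), b - mean(b))

# Two sparse "density" vectors: zero almost everywhere, one small bump each
a <- c(rep(0, 1000), 1, 3, 1, rep(0, 1000))
b <- c(rep(0, 1001), 1, 3, 1, rep(0, 999))

r_uncentered <- uncentered_corr(a, b)
r_centered   <- centered_corr(a, b)
print(c(uncentered = r_uncentered, centered = r_centered))
```

When the vectors are zero over most of their length, their means are close to zero, so the two versions nearly coincide.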

width

Half-bandwidth used in the computation. This must be specified as an integer; data-driven rules are not supported.

kernel

A character string giving the density kernel.

from, to

Specify the range over which the density is to be computed.

maxDist

If distance to nearest neighbor is more than this, the position is discarded. This removes isolated points, which are not very informative.
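
A nearest-neighbor filter of this kind can be sketched in base R. This is a minimal illustration, not the package's code: the positions are invented, and whether neighbors are sought within one strand or across both is not stated above, so the sketch simply uses a single vector of positions.

```r
# Toy nearest-neighbor filter; NOT the package's implementation.
pos <- c(100, 150, 400, 2000, 5000, 5100)   # hypothetical read positions
max_dist <- 500                             # plays the role of maxDist

sorted <- sort(pos)
gaps <- diff(sorted)                        # distances between consecutive positions

# Distance to the nearest neighbor: the smaller of the gaps on either side
nearest <- pmin(c(Inf, gaps), c(gaps, Inf))

# Keep positions whose nearest neighbor is no farther than max_dist
kept <- sorted[nearest <= max_dist]
print(kept)
```

Here the position 2000 is dropped because both of its neighbors are more than 500 bases away.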

...

Extra arguments, passed on as appropriate to other functions.

Details

For the correlation method, the range over which densities are computed covers only the range of the reads; that is, the beginning and end of chromosomes are excluded.

Value

estimate.mean.fraglen gives an estimate of the mean fragment length.

basesCovered and densityCorr give a vector of the corresponding objective function evaluated at the supplied values of shift.

sparse.density returns an object of class "Rle".

Author(s)

Deepayan Sarkar, Michael Lawrence

References

R. Jothi, S. Cuddapah, A. Barski, K. Cui, and K. Zhao. Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Research, 36:5221–5231, 2008.

P. V. Kharchenko, M. Y. Tolstorukov, and P. J. Park. Design and analysis of ChIP experiments for DNA-binding proteins. Nature Biotechnology, 26:1351–1359, 2008.

Examples

data(cstest)
estimate.mean.fraglen(cstest[["ctcf"]], method = "coverage")