When using single-ended sequencing, the resulting partial sequences map to
only one strand, which biases the coverage profile if left uncorrected. The
only way to correct this is to know the average size of the real fragments.
nucleR uses this information when preprocessing single-ended sequences.
You can provide this value yourself (a length of 147 bp is usually a
good approximation) or you can use this method to automatically estimate the
size of the inserts.
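The strand bias described above can be illustrated with a small base R sketch. It is not nucleR code: the toy fragment positions, the 40 bp read length, and the helper `coverage_vec` are all assumptions made for illustration. It shows that raw single-end coverage leaves a gap in the middle of each fragment, and that extending every read to the fragment length restores the expected profile.

```r
# Toy data (hypothetical): 200 fragments of 147 bp starting near position 1000
set.seed(1)
frag.len <- 147
read.len <- 40
frag.start <- 1000 + sample(-10:10, 200, replace = TRUE)
is.plus <- rep(c(TRUE, FALSE), 100)

# Single-end reads: "+" reads cover the left end of the fragment,
# "-" reads cover the right end
read.start <- ifelse(is.plus, frag.start, frag.start + frag.len - read.len)

# Hypothetical helper: per-base coverage of intervals [start, start + width - 1]
coverage_vec <- function(starts, width, n) {
  cov <- numeric(n)
  for (s in starts) {
    idx <- s:(s + width - 1)
    cov[idx] <- cov[idx] + 1
  }
  cov
}

# Uncorrected coverage: two separate peaks, one per strand
raw <- coverage_vec(read.start, read.len, 2000)

# Correction: extend each read to the full fragment length,
# towards the interior of the fragment
ext.start <- ifelse(is.plus, read.start, read.start + read.len - frag.len)
corrected <- coverage_vec(ext.start, frag.len, 2000)

mid <- 1000 + (frag.len - 1) / 2  # approximate fragment centre
c(raw = raw[mid], corrected = corrected[mid])
```

The correction only works when `frag.len` is close to the real fragment size, which is why an estimate of that size is needed.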
fragmentLenDetect(reads, samples = 1000, window = 5000, min.shift = 1,
    max.shift = 100, mc.cores = 1, as.shift = FALSE)

## S4 method for signature 'AlignedRead'
fragmentLenDetect(reads, samples = 1000, window = 1000, min.shift = 1,
    max.shift = 100, mc.cores = 1, as.shift = FALSE)

## S4 method for signature 'GRanges'
fragmentLenDetect(reads, samples = 1000, window = 1000, min.shift = 1,
    max.shift = 100, mc.cores = 1, as.shift = FALSE)

## S4 method for signature 'RangedData'
fragmentLenDetect(reads, samples = 1000, window = 1000, min.shift = 1,
    max.shift = 100, mc.cores = 1, as.shift = FALSE)
reads: Raw single-end reads (ShortRead::AlignedRead or GenomicRanges::GRanges format).
samples: Number of samples used for the analysis (more is slower but more accurate).
window: Analysis window. Usually there is no need to change this parameter.
min.shift, max.shift: Minimum and maximum shift to apply to the strands to detect the optimal fragment size. If the range is too wide, performance decreases.
mc.cores: If multicore support is available, the maximum number of cores to use.
as.shift: If TRUE, returns the shift needed to align the middle of the reads on opposite strands. If FALSE, returns the mean inferred fragment length.
This function shifts one strand downstream one base at a time, from
min.shift to max.shift. At every step, the correlation between both
strands is checked at a random position of length window. The shift
giving the maximum correlation is recorded and averaged over the samples
repetitions.
The final returned length is the best shift detected plus the width of the
reads. You can increase the performance of this function by reducing the
samples value and/or narrowing the shift range. The window size has
almost no impact on performance, although a value that is too small can give
biased results.
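The shift-and-correlate idea can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the nucleR implementation: the toy data, the helpers `coverage_vec` and `detect_shift`, and the choice of `max.shift = 150` (so that the expected shift of 147 - 40 = 107 falls inside the search range) are all hypothetical.

```r
set.seed(42)
n <- 20000        # length of the toy chromosome
read.len <- 40    # read width
true.frag <- 147  # fragment length used to build the data

# Toy fragments clustered around 80 random nucleosome positions
nuc.start <- sample(200:(n - 400), 80)
frag.start <- sample(nuc.start, 4000, replace = TRUE) +
  sample(-5:5, 4000, replace = TRUE)
is.plus <- runif(4000) < 0.5

# Hypothetical helper: per-base coverage of intervals of a fixed width
coverage_vec <- function(starts, width, n) {
  delta <- tabulate(starts, nbins = n + 1) -
    tabulate(starts + width, nbins = n + 1)
  cumsum(delta)[seq_len(n)]
}

# "+" reads sit at the left end of each fragment, "-" reads at the right end
plus.cov <- coverage_vec(frag.start[is.plus], read.len, n)
minus.cov <- coverage_vec(frag.start[!is.plus] + true.frag - read.len,
                          read.len, n)

# For each sample: pick a random window, try every shift, keep the shift
# with the highest strand correlation; then average over all samples
detect_shift <- function(plus.cov, minus.cov, samples, window,
                         min.shift, max.shift) {
  shifts <- min.shift:max.shift
  n <- length(plus.cov)
  best <- vapply(seq_len(samples), function(i) {
    pos <- sample.int(n - window - max.shift, 1)
    idx <- pos:(pos + window - 1)
    cors <- vapply(shifts,
                   function(s) cor(plus.cov[idx], minus.cov[idx + s]),
                   numeric(1))
    as.numeric(shifts[which.max(cors)])
  }, numeric(1))
  mean(best)
}

shift <- detect_shift(plus.cov, minus.cov, samples = 20, window = 5000,
                      min.shift = 1, max.shift = 150)
frag.est <- round(shift) + read.len  # best shift plus read width
frag.est
```

On this toy data the estimate should land close to the 147 bp used to build it. The cost grows with `samples` times the number of shifts tried, which matches the advice to reduce `samples` or narrow the shift range when speed matters.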
Inferred mean length of the inserts by default, or the shift needed to
align the strands if as.shift = TRUE.
Oscar Flores
library(nucleR)
library(GenomicRanges)
library(IRanges)

# Create a synthetic dataset, simulating single-end reads, for positive and
# negative strands

# Positive strand reads
pos <- syntheticNucMap(nuc.len=40, lin.len=130)$syn.reads

# Negative strand (shifted 147bp)
neg <- IRanges(end=start(pos)+147, width=40)

sim <- GRanges(
    seqnames="chr1",
    ranges=c(pos, neg),
    strand=c(rep("+", length(pos)), rep("-", length(neg)))
)

# Detect fragment length (we know by construction it is really 147)
# Sampling is restricted (samples=50) to speed up the example
fragmentLenDetect(sim, samples=50)