Description Usage Arguments Details Value Author(s) See Also Examples
This function is inspired from ShortRead deprecated 'pileup' function (Martin Morgan & Simon Anders). Deprecated function usage is now tipically replaced by 'coverage' from GenomicRanges. A readaptation of this strategy had to be done in order to allow double values as weight (particularly useful for multireads).
Briefly, this function computes a numeric vector of pileup scores (coverage) emcompassing all the coordinates covered by blocks defined by arguments 'start', 'fragLenth', 'dir', and 'readLength'.
1 | pileupDouble(start, fragLength, dir, readLength, weight)
|
start |
Coordinates of leftmost end of reads on the chromosome |
fragLength |
Size of elongation in bp (DNA fragment length) |
dir |
Strand on which the read has been aligned on the reference sequence. IMPORTANT : see remark in description |
readLength |
Length of the sequenced read in bp |
weight |
Double value specifying the weight for each read (can be useful to put it <1 in some cases for reads aligned in multiple places) |
Arguments 'fragLength' 'dir' 'readLength' and 'weight' must have the same length than 'start'. If one of these argument has length==1, the value will be repeated for all reads.
This function re-encodes internally the strand (dir) to a factor with levels c('+', '-') to ensure consistency in results. All other values will be ignored with a warning.
Internally, the function iterates on each defined block/read. For blocks on positive strand ('dir'=='+'), the score incrementing region is defined from the 'start' coordinate to 'start'+'fragLength'. For blocks on negative strand ('dir'=='-'), the score incrementing region is defined from 'start'+'readLength'-'fragLength' to 'start'+'readLength' (it is important to keep in mind that 'start' always represent the leftmost coordinate of the read, whatever the strand is.)
Adaptations were made as compared to ShortRead (Martin Morgan & Simon Anders) pileup function :
R wrapper is in charge of generating vectors in memory
R wrapper converts the data types to C compatible formats (avoids R's conversion code in C function)
R wrapper ensures the arguments have the appropriate length
Resulting values are written in the vector passed as argument (pointer) to C (previously created in the wrapper)
The chromosome length is not needed anymore, the function returns a vector large enough to cover all reads
If a read extension generates out of bound coordinates, a warning message appears but the computation goes on
A (large) numeric vector.
Romain Fenouil
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | # generate artificial reads coordinates on a chromosome
nbReads <- 1000000
chrLength <- 10000000
startPositions <- sample(1:chrLength, nbReads, replace=TRUE)
fragmentsLength <- trunc(rnorm(n=nbReads, mean=146, sd= 40))
strandAligned <- factor(sample(c('+','-'),
nbReads,
replace=TRUE),
levels=c('+', '-'))
readLength <- 36
weight <- 1
# compute the piled vector
res <- pileupDouble(start=startPositions,
fragLength=fragmentsLength,
dir=strandAligned,
readLength=readLength,
weight=weight)
# plot distribution of scores on the chromosome
hist(res)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.