filterDuplReads
filters highly repeated sequences, i.e. with the same chromosome, start and
end positions.
As many such sequences are likely due to over-amplification artifacts, this
can be a useful pre-processing step for ultra high-throughput sequencing
data.
A false discovery rate is computed for each number of repeats, assessing
whether that number is unusually high. The reads with a higher false
discovery rate will be removed. For more information on the false discovery
rate calculation, please read the fdrEnrichment manual.
tabDuplReads
counts the number of reads with no duplications, duplicated once, twice, etc.
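To make the notion of a duplicate concrete, here is a minimal base R sketch (not the package's implementation) that tabulates repeats on a toy data frame. The column names `chr`, `start` and `end` and the toy values are assumptions for illustration. This sketch counts distinct locations per repeat level; whether the package counts locations or individual reads is not stated here.

```r
# Hypothetical toy reads; column names are illustrative, not the package's.
reads <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1"),
  start = c(100, 100, 100, 250, 250, 400),
  end   = c(138, 138, 138, 288, 288, 438)
)

# Two reads are duplicates when chromosome, start and end all match.
key <- paste(reads$chr, reads$start, reads$end, sep = ":")

# Number of times each distinct location occurs.
repeatsPerLocation <- table(key)

# Number of distinct locations seen once, twice, three times, ...
repeatTable <- table(as.integer(repeatsPerLocation))
print(repeatTable)
```

In this toy input one location appears once, one twice and one three times, so each repeat level has a count of one.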
filterDuplReads(x, maxRepeats, fdrOverAmp=0.01, negBinomUse=.999, components=0, mc.cores=1)
tabDuplReads(x, minRepeats=1, mc.cores=1)
x |
Object containing read locations.
Currently methods for |
maxRepeats |
Reads appearing |
fdrOverAmp |
Reads with false discovery rate of being
over-amplified greater than |
negBinomUse |
Proportion of counts used to compute the null distribution. For example, 1 - 1/1000 (i.e. 0.999) means that 99.9% of the reads are used. The reads with the highest numbers of repetitions are the ones excluded. |
components |
Number of negative binomials used to fit the null distribution. This value has to be between 0 and 4. If 0 is given, the optimal number of negative binomials is chosen using the Bayesian information criterion (BIC). The default shown in the usage is 0. |
mc.cores |
Number of cores to be used in parallel computing
(passed on to |
minRepeats |
The table is only produced for reads with at least
|
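The effect of negBinomUse can be illustrated with a small base R sketch. It assumes the argument acts as a quantile cut on the per-location repeat counts, which matches the description above but is not confirmed as the package's exact procedure. The simulated counts and the names `repeatCounts`, `fractionUsed`, `cutoff` and `usedForNull` are hypothetical.

```r
set.seed(1)

# Hypothetical repeat counts per location (at least 1 repeat each).
repeatCounts <- rnbinom(10000, size = 2, mu = 1.5) + 1

# Assumed role of negBinomUse: fraction of counts kept for the null fit.
fractionUsed <- 0.999

# type = 1 returns an observed value, so at least 99.9% of counts are <= cutoff.
cutoff <- quantile(repeatCounts, probs = fractionUsed, type = 1)

# Counts above the cutoff (the most repeated ones) are left out of the null fit.
usedForNull <- repeatCounts[repeatCounts <= cutoff]
cat("cutoff:", cutoff, " kept:", length(usedForNull), "of", length(repeatCounts), "\n")
```

Excluding the extreme tail this way keeps the suspected over-amplified locations from distorting the fitted null distribution.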
filterDuplReads
returns x
without highly
repetitive sequences, as determined by
maxRepeats
or fdrOverAmp.
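As a rough illustration of filtering by a repeat cap, the base R sketch below drops every read whose location occurs more than a chosen number of times. This is an assumption-laden sketch, not the package's code: the name `capRepeats`, the strict "greater than" rule, and the choice to drop all copies rather than keep one are all hypothetical.

```r
# Hypothetical toy reads; column names are illustrative, not the package's.
reads <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1"),
  start = c(100, 100, 100, 250, 250, 400),
  end   = c(138, 138, 138, 288, 288, 438)
)

# Duplicates share chromosome, start and end.
key <- paste(reads$chr, reads$start, reads$end, sep = ":")

# For each read, the number of reads sharing its location.
nRepeats <- ave(rep(1L, nrow(reads)), key, FUN = length)

# Assumed rule: drop reads whose location occurs more than capRepeats times.
capRepeats <- 2L
filtered <- reads[nRepeats <= capRepeats, ]
print(filtered)
```

Here the location repeated three times is removed and the three remaining reads are kept.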
tabDuplReads
returns a table counting the number of sequences
repeated once, twice, 3 times, etc.
Methods for filterDuplReads
and tabDuplReads
signature(x = "RangedData")
Two reads are duplicated if they have the same space, start and end position.
signature(x = "list")
The method is applied
separately to each RangedData
element in the list.
Evarist Planet, David Rossell, Oscar Flores
fdrEnrichedCounts
to compute the posterior probability
that a certain number of repeats is due to over-amplification.
library(htSeqTools)  # provides filterDuplReads and tabDuplReads
set.seed(1)
# Simulate 1000 reads of 39 bp on two chromosomes
st <- round(rnorm(1000,500,100))
strand <- rep(c('+','-'),each=500)
space <- sample(c('chr1','chr2'),size=length(st),replace=TRUE)
sample1 <- RangedData(IRanges(st,st+38),strand=strand,space=space)
# Add 20 artificial repeats at the same location
st <- rep(400,20)
repeats <- RangedData(IRanges(st,st+38),strand='+',space='chr1')
sample1 <- rbind(sample1,repeats)
# Remove highly repeated reads
filterDuplReads(sample1)