filterDuplReads: Detect and filter duplicated reads/sequences.


Description

filterDuplReads filters highly repeated sequences, i.e. reads with the same chromosome, start and end positions. Since many such sequences are likely due to over-amplification artifacts, this can be a useful pre-processing step for ultra high-throughput sequencing data. A false discovery rate is computed for each number of repeats being unusually high, and reads with a false discovery rate above the cutoff are removed. For more information on the false discovery rate calculation please read the fdrEnrichedCounts manual page.

tabDuplReads counts the number of reads with no duplications, duplicated once, twice, etc.
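A minimal sketch of the intended workflow, assuming reads is a hypothetical RangedData object of aligned read positions (a full runnable example is given in the Examples section):

tabDuplReads(reads)                                    # how often is each position repeated?
readsClean <- filterDuplReads(reads, fdrOverAmp=0.01)  # drop likely over-amplified reads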

Usage

filterDuplReads(x, maxRepeats, fdrOverAmp=0.01, negBinomUse=.999, components=0, mc.cores=1)

tabDuplReads(x,  minRepeats=1, mc.cores=1)

Arguments

x

Object containing the read locations. Currently methods are available for RangedData and list objects. Duplication is assessed based only on the space, start, end and x[['strand']] values, i.e. reads that differ only in other variables stored in values(x) are still considered duplicates, and only the first appearance is returned.

maxRepeats

Reads appearing maxRepeats or more times are excluded. If not specified, it is set automatically based on fdrOverAmp (see the sketch after this argument list).

fdrOverAmp

Reads whose false discovery rate of being over-amplified is greater than fdrOverAmp are excluded.

negBinomUse

Proportion of counts used to compute the null distribution. For example, using 1 - 1/1000 means that 99.9% of the reads are used; the reads with the highest numbers of repetitions are the ones excluded.

components

Number of negative binomials used to fit the null distribution. This value has to be between 0 and 4. If 0 is given (the default), the optimal number of negative binomials is chosen using the Bayesian Information Criterion (BIC).

mc.cores

Number of cores to be used in parallel computing (passed on to mclapply).

minRepeats

The table is only produced for reads with at least minRepeats repeats.
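As a sketch of the two filtering modes (again using a hypothetical RangedData object reads), the cutoff can either be given explicitly through maxRepeats or derived automatically from fdrOverAmp:

filterDuplReads(reads, maxRepeats=5)     # drop reads appearing 5 or more times
filterDuplReads(reads, fdrOverAmp=0.01)  # maxRepeats chosen automatically from the FDR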

Value

filterDuplReads returns x without the highly repetitive sequences, as determined by maxRepeats or fdrOverAmp.

tabDuplReads returns a table counting the number of sequences repeated once, twice, three times, etc.

Methods

Methods for filterDuplReads and tabDuplReads

signature(x = "RangedData")

Two reads are duplicated if they have the same space, start and end position.

signature(x = "list")

The method is applied separately to each RangedData element in the list.
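A minimal sketch of the list method, assuming sample1 and sample2 are RangedData objects (e.g. built as in the Examples section); each element is filtered separately:

samples <- list(s1=sample1, s2=sample2)
filtered <- filterDuplReads(samples, fdrOverAmp=0.01)
lapply(filtered, nrow)  # number of reads kept in each sample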

Author(s)

Evarist Planet, David Rossell, Oscar Flores

See Also

fdrEnrichedCounts to compute the posterior probability that a certain number of repeats is due to over-amplification.

Examples

library(htSeqTools)

#Simulate 1000 reads of length 39 on two chromosomes
set.seed(1)
st <- round(rnorm(1000,500,100))
strand <- rep(c('+','-'),each=500)
space <- sample(c('chr1','chr2'),size=length(st),replace=TRUE)
sample1 <- RangedData(IRanges(st,st+38),strand=strand,space=space)

#Add artificial repeats
st <- rep(400,20)
repeats <- RangedData(IRanges(st,st+38),strand='+',space='chr1')
sample1 <- rbind(sample1,repeats)

#Remove over-amplified reads
filterDuplReads(sample1)
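
#Possible follow-up (not part of the original example): inspect duplication
#counts and set the cutoff explicitly
tabDuplReads(sample1)
filterDuplReads(sample1, maxRepeats=5)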
