Description
Calculate expected distances between subsequences of the adaptor that should be identical across reads.
Usage
expectedDist(sequences, max.err=NA)
Arguments
sequences: A QualityScaledDNAStringSet of read subsequences corresponding to constant regions of the adaptor.
max.err: A numeric scalar specifying the maximum error probability above which bases will be masked.
Details
The aim is to estimate the distance expected between subsequences that should be identical, given that all reads should originate from molecules carrying the same adaptor. This yields an appropriate distance threshold for umiGroup that accounts for sequencing and amplification errors.
We suggest extracting a subsequence from the interval next to the UMI region. This ensures that the error rate in the extracted subsequence is as similar as possible to that of the UMI at its position on the read.
Pairwise Levenshtein distances are computed between all supplied sequences. Because the number of pairs grows quadratically with the number of sequences, this is computationally expensive, so it is advisable to supply only a random subset of the extracted sequences.
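For n sequences there are n(n-1)/2 pairs, so halving the input roughly quarters the work. The following is a minimal base-R sketch of subsampling before computing all pairwise distances. It uses utils::adist, which computes standard Levenshtein distances without the special handling of N described below. The helper subsampledPairDist and its n and seed arguments are hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of the package): randomly subsample up to `n`
# sequences, then compute the Levenshtein distance for every unordered pair.
subsampledPairDist <- function(seqs, n = 1000, seed = NULL) {
    if (!is.null(seed)) set.seed(seed)
    if (length(seqs) > n) {
        seqs <- sample(seqs, n)          # random subset without replacement
    }
    d <- utils::adist(seqs)              # symmetric matrix of pairwise distances
    d[upper.tri(d)]                      # keep each unordered pair exactly once
}

constants <- c("ACTAGGAGA", "ACTACGACCA", "ACTACGATA", "ACACGACA")
subsampledPairDist(constants)            # 4 sequences give 6 pairwise distances
```

Fixing the seed makes the subsample, and hence the resulting distance distribution, reproducible between runs.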
If sequences contains quality scores, bases with error probabilities above max.err are replaced with Ns.
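Masking relies on the usual Phred relation between a quality score Q and the error probability, p = 10^(-Q/10). The sketch below illustrates this in base R with plain character strings rather than a QualityScaledDNAStringSet. It assumes Phred+33-encoded quality strings of the same length as each read, and it assumes that max.err = NA disables masking. The helper maskLowQuality is hypothetical and is not the package's implementation.

```r
# Hypothetical helper (not part of the package). Assumes Phred+33 quality
# strings with one character per base; max.err = NA is assumed to mean
# "no masking".
maskLowQuality <- function(seqs, quals, max.err) {
    if (is.na(max.err)) return(seqs)
    mapply(function(s, q) {
        bases <- strsplit(s, "")[[1]]
        phred <- utf8ToInt(q) - 33L      # decode Phred+33 characters to Q scores
        err <- 10^(-phred / 10)          # convert Q to error probability
        bases[err > max.err] <- "N"      # mask unreliable bases
        paste(bases, collapse = "")
    }, seqs, quals, USE.NAMES = FALSE)
}

# "#" encodes Q = 2 (error probability of about 0.63), so that base is masked.
maskLowQuality("ACGT", "II#I", max.err = 0.01)
```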
Any Ns are treated as missing and contribute a mismatch score of 0.5, even when aligned to other Ns.
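The scoring rule for N can be illustrated with the standard dynamic-programming Levenshtein recurrence, where the substitution cost is 0 for identical non-N bases, 1 for different bases, and 0.5 whenever either base is N. This is a sketch under assumptions, not the package's code. Insertions and deletions are assumed to cost 1, and the helpers nAwareLevenshtein and pairDists are hypothetical.

```r
# Hypothetical sketch: Levenshtein distance where any comparison involving N
# (including N versus N) costs 0.5. Indel costs of 1 are an assumption.
nAwareLevenshtein <- function(a, b) {
    x <- strsplit(a, "")[[1]]
    y <- strsplit(b, "")[[1]]
    m <- length(x)
    n <- length(y)
    d <- matrix(0, m + 1, n + 1)
    d[, 1] <- 0:m                        # cost of deleting all of x[1..i]
    d[1, ] <- 0:n                        # cost of inserting all of y[1..j]
    for (i in seq_len(m)) {
        for (j in seq_len(n)) {
            sub <- if (x[i] == "N" || y[j] == "N") 0.5
                   else if (x[i] == y[j]) 0 else 1
            d[i + 1, j + 1] <- min(d[i, j + 1] + 1,    # deletion
                                   d[i + 1, j] + 1,    # insertion
                                   d[i, j] + sub)      # match or mismatch
        }
    }
    d[m + 1, n + 1]
}

# Hypothetical wrapper: distances for all unordered pairs of sequences.
pairDists <- function(seqs) {
    pairs <- combn(length(seqs), 2)
    apply(pairs, 2, function(p) nAwareLevenshtein(seqs[p[1]], seqs[p[2]]))
}

nAwareLevenshtein("ANGT", "ANGT")        # N versus N still costs 0.5
```

For sequences without any N, this recurrence gives the same result as utils::adist with its default unit costs.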
Value
A numeric vector of pairwise distances between sequences that should be identical.
Author(s)
Florian Bieberich, with modifications by Aaron Lun
See Also
extractSubseq, to extract a subsequence.
Examples
library(sarlacc)
# Constant-region subsequences that should be identical across reads
constants <- c("ACTAGGAGA",
               "ACTACGACCA",
               "ACTACGATA",
               "ACACGACA")
expectedDist(constants)