expectedDist: Calculate expected distances

Description

Calculate expected distances between subsequences of the adaptor that should be identical across reads.

Usage

expectedDist(sequences, max.err=NA)

Arguments

sequences

A QualityScaledDNAStringSet of read subsequences corresponding to constant regions of the adaptor.

max.err

A numeric scalar specifying the maximum error probability above which bases will be masked.
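As an illustration of what masking by error probability means, the following base R sketch converts Phred qualities to error probabilities and replaces low-confidence bases with N. The function name mask_low_quality is hypothetical, the Phred+33 encoding is an assumption, and treating max.err=NA as "no masking" is a guess based on the default shown in Usage. It is not the package's internal code.

```r
# Hypothetical helper (not part of sarlacc): mask bases whose error
# probability exceeds 'max.err'. Assumes Phred+33 encoded quality strings.
mask_low_quality <- function(bases, quals, max.err = NA) {
    # Assumption: an NA threshold means that nothing is masked.
    if (is.na(max.err)) {
        return(bases)
    }
    chars <- strsplit(bases, "")[[1]]
    phred <- utf8ToInt(quals) - 33L   # decode Phred+33 to integer scores
    err <- 10^(-phred / 10)           # standard Phred-to-probability conversion
    chars[err > max.err] <- "N"       # mask the unreliable positions
    paste(chars, collapse = "")
}

# 'I' encodes Q40 (error 1e-4) and '#' encodes Q2 (error ~0.63).
masked <- mask_low_quality("ACGT", "II#I", max.err = 0.01)
masked  # "ACNT"
```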

Details

The aim is to provide an expectation for the distance for identical subsequences, given that all reads should originate from molecules with the same adaptor. In this manner, we can obtain an appropriate threshold for umiGroup that accounts for sequencing and amplification errors. We suggest extracting a subsequence from the interval next to the UMI region. This ensures that the error rate in the extracted subsequence is as similar as possible to the UMI at that position on the read.
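One possible way to turn the expected distances into a threshold is sketched below. The values in toy.dists are made up to stand in for the output of expectedDist, and the choice of the 90th percentile is purely an assumption for illustration, not a recommendation from the authors. How the result is supplied to umiGroup is not shown, as its arguments are not described on this page.

```r
# Toy distances standing in for the output of expectedDist(); not real data.
toy.dists <- c(0, 0, 1, 1, 1, 2, 2, 3, 4, 6)

# Hypothetical rule: accept distances up to an upper quantile of those
# observed between subsequences that should be identical.
threshold <- unname(quantile(toy.dists, probs = 0.9))
threshold
```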

Pairwise Levenshtein distances are computed between all extracted sequences. This is quite computationally expensive, as the number of pairs grows quadratically with the number of sequences, so it is advisable to supply only a random subset of the sequences. If sequences contains quality scores, bases with error probabilities above max.err are replaced with Ns. Any Ns are treated as missing and will contribute a mismatch score of 0.5, even for matches to other Ns.
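The scoring described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the function name masked_lev is hypothetical, and a cost of 1 for each insertion or deletion is an assumption, since only the N mismatch score of 0.5 is stated here. The sequences are the ones used in Examples.

```r
# Hypothetical helper (not part of sarlacc): Levenshtein distance in which
# any substitution involving an N costs 0.5, even for N against N.
# Assumption: insertions and deletions each cost 1.
masked_lev <- function(a, b) {
    x <- strsplit(a, "")[[1]]
    y <- strsplit(b, "")[[1]]
    n <- length(x)
    m <- length(y)
    d <- matrix(0, n + 1, m + 1)
    d[, 1] <- 0:n   # cost of deleting every base of 'a'
    d[1, ] <- 0:m   # cost of inserting every base of 'b'
    for (i in seq_len(n)) {
        for (j in seq_len(m)) {
            if (x[i] == "N" || y[j] == "N") {
                cost <- 0.5
            } else if (x[i] == y[j]) {
                cost <- 0
            } else {
                cost <- 1
            }
            d[i + 1, j + 1] <- min(d[i, j + 1] + 1,   # deletion
                                   d[i + 1, j] + 1,   # insertion
                                   d[i, j] + cost)    # match or mismatch
        }
    }
    d[n + 1, m + 1]
}

seqs <- c("ACTAGGAGA", "ACTACGACCA", "ACTACGATA", "ACACGACA")

# Take a random subset to limit the number of pairs (trivial for four sequences).
set.seed(1)
chosen <- sample(seqs, min(length(seqs), 100))

# One distance per unordered pair of sequences.
pair.dists <- combn(chosen, 2, function(p) masked_lev(p[1], p[2]))
```

With k sequences this computes k(k-1)/2 distances, which is why subsetting matters for large read sets.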

Value

A numeric vector of pairwise distances between sequences that should be identical.

Author(s)

Florian Bieberich, with modifications by Aaron Lun

See Also

extractSubseq to extract a subsequence.

Examples

constants <- c("ACTAGGAGA",
               "ACTACGACCA",
               "ACTACGATA",
               "ACACGACA")
expectedDist(constants)

MarioniLab/sarlacc documentation built on May 13, 2019, 12:51 p.m.