srdistance: Edit distances between reads and a small number of short...
In ShortRead: FASTQ input and manipulation


srdistance calculates the edit distance from each read in pattern to each read in subject. The underlying algorithm, pairwiseAlignment, is efficient only when the reads in both pattern and subject are short and the number of subject reads is small.

srdistance(pattern, subject, ...)

`pattern`	An object of class `DNAStringSet` containing reads whose edit distance is desired.
`subject`	A short `character` vector, `DNAString` or (small) `DNAStringSet` to serve as reference.
`...`	Additional arguments, unused.

The underlying algorithm performs pairwise alignment from each read in pattern to each sequence in subject. The return value is a list of numeric vectors of distances, one list element for each sequence in subject. The vector in each list element contains, for each read in pattern, the edit distance from that read to the corresponding subject. The weight matrix and gap penalties used to calculate the distance are structured to weight base substitutions and single-base insertions/deletions equally. Edit distances between known and ambiguous (e.g., N) nucleotides, or between ambiguous nucleotides, are weighted as though each possible nucleotide in the ambiguity were equally likely.
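One way to read the ambiguity weighting described above is as an expected mismatch cost. The base-R sketch below is an illustration only: `iupac` and `expected_cost` are hypothetical names that are not part of ShortRead, the table covers just a subset of IUPAC codes, and the match = 0 / mismatch = 1 costs are assumed rather than taken from the package source.

```r
# Hypothetical helper, not part of ShortRead: expected substitution cost
# between two IUPAC codes when every base an ambiguity code allows is
# treated as equally likely (assumed costs: match 0, mismatch 1).
iupac <- list(A = "A", C = "C", G = "G", T = "T",
              R = c("A", "G"), Y = c("C", "T"),
              N = c("A", "C", "G", "T"))  # subset of the IUPAC table

expected_cost <- function(x, y) {
  # Fraction of (base from x, base from y) pairs that disagree
  mean(outer(iupac[[x]], iupac[[y]], "!="))
}

expected_cost("A", "A")  # known vs. identical known base
expected_cost("A", "N")  # known vs. fully ambiguous base
expected_cost("R", "A")  # two-way ambiguity vs. known base
expected_cost("N", "N")  # ambiguous vs. ambiguous
```

Under this reading, a known base aligned against N is cheaper than a certain mismatch but more expensive than a certain match, which is why reads containing N can yield non-integer distances.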

A list of length equal to that of subject. Each element is a numeric vector of length equal to that of pattern, with values corresponding to the minimum distance between the corresponding pattern and subject sequences.
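The shape of the return value can be sketched without ShortRead installed. The example below uses `adist()` from base R, which computes plain Levenshtein distance and does not apply the ambiguity-aware weighting of srdistance; the reads and the names `s1` and `s2` are made up for illustration.

```r
# Base-R sketch of the return layout only; utils::adist() stands in for
# the alignment-based distance and ignores ambiguity weighting.
pattern <- c("ACGT", "ACGA", "TTTT")    # hypothetical reads
subject <- c(s1 = "ACGT", s2 = "AAAA")  # hypothetical references

# adist() returns a length(pattern) x length(subject) matrix
m <- adist(pattern, subject)

# Reshape to the documented layout: one numeric vector per subject
d <- lapply(seq_along(subject), function(j) as.numeric(m[, j]))
names(d) <- names(subject)

length(d)   # one element per subject
lengths(d)  # each element has one value per pattern read
```

Indexing the result by subject first, as in `d$s1` or `d[[1]]`, gives the distances of every read to that one reference.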

Martin Morgan <mtmorgan@fhcrc.org>

pairwiseAlignment

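library(ShortRead)

## Read the example alignments shipped with the package
sp <- SolexaPath(system.file("extdata", package="ShortRead"))
aln <- readAligned(sp, "s_2_export.txt")
## 35-base homopolymer references
polyA <- polyn("A", 35)
polyT <- polyn("T", 35)

## clean() drops reads containing ambiguous bases such as N
d1 <- srdistance(clean(sread(aln)), polyA)
## Same subject, keeping reads with ambiguous bases
d2 <- srdistance(sread(aln), polyA)
## Two subjects: the result is a list of length 2
d3 <- srdistance(sread(aln), c(polyA, polyT))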