groupSimilarSequences | R Documentation |
Create a conversion table for collapsing similar sequences
groupSimilarSequences(
  seqs,
  scores,
  collapseMaxDist,
  collapseMinScore,
  collapseMinRatio,
  verbose
)
seqs |
Character vector with nucleotide sequences (or pairs of sequences concatenated with "_") to be collapsed. The sequences must all be of the same length. |
scores |
Numeric vector of "scores" for the sequences. Typically the total read/UMI count. A higher score will be preferred when deciding which sequence to use as the representative for a group of collapsed sequences. |
collapseMaxDist |
Numeric scalar defining the tolerance for collapsing similar sequences. If the value is in [0, 1), it defines the maximal Hamming distance in terms of a fraction of sequence length: ( |
collapseMinScore |
Numeric scalar, indicating the minimum score required for a sequence to be considered as a representative for a group of similar sequences (i.e., to allow other sequences to be collapsed into it). |
collapseMinRatio |
Numeric scalar. During collapsing of similar sequences, a low-frequency sequence will be collapsed with a higher-frequency sequence only if the ratio between the high-frequency and the low-frequency scores is at least this high. A value of 0 indicates that no such check is performed. |
verbose |
Logical scalar, whether to print progress messages. |
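The two score-based arguments can be read as a pair of conditions on a candidate representative and the sequence that would be merged into it. The following is a minimal sketch of that reading, not the function's actual implementation: `canCollapseInto` is a hypothetical helper, the comparisons are assumed to be inclusive ("at least"), and scores are assumed to be strictly positive.

```r
# Hypothetical helper (not part of the documented function): checks the two
# score-based conditions described above for one candidate pair.
# Assumes strictly positive scores and inclusive thresholds.
canCollapseInto <- function(repScore, otherScore,
                            collapseMinScore = 0, collapseMinRatio = 0) {
    # The representative must reach the minimum score.
    if (repScore < collapseMinScore) {
        return(FALSE)
    }
    # A ratio threshold of 0 disables the ratio check.
    if (collapseMinRatio > 0 && repScore / otherScore < collapseMinRatio) {
        return(FALSE)
    }
    TRUE
}

canCollapseInto(repScore = 5, otherScore = 1)                         # TRUE
canCollapseInto(repScore = 5, otherScore = 1, collapseMinRatio = 10)  # FALSE
canCollapseInto(repScore = 5, otherScore = 1, collapseMinScore = 6)   # FALSE
```

The sketch covers only the score conditions; the distance tolerance set by collapseMaxDist applies in addition.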
A data.frame with two columns, containing the input sequences and the representatives for the groups resulting from grouping similar sequences, respectively.
Michael Stadler, Charlotte Soneson
seqs <- c("AACGTAGCA", "ACCGTAGCA", "AACGGAGCA", "ATCGGAGCA", "TGAGGCATA")
scores <- c(5, 1, 3, 1, 8)
groupSimilarSequences(seqs = seqs, scores = scores,
                      collapseMaxDist = 1, collapseMinScore = 0,
                      collapseMinRatio = 0, verbose = FALSE)
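To see which of the example sequences fall within the tolerance used above, it can help to look at the pairwise Hamming distances directly. The sketch below uses base R only; `hammingDist` is a hypothetical helper written for this illustration and is not claimed to be how the function computes distances internally.

```r
# Hypothetical helper: Hamming distance between two equal-length strings.
hammingDist <- function(a, b) {
    stopifnot(nchar(a) == nchar(b))
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

seqs <- c("AACGTAGCA", "ACCGTAGCA", "AACGGAGCA", "ATCGGAGCA", "TGAGGCATA")

# Pairwise distance matrix for the example input.
d <- outer(seq_along(seqs), seq_along(seqs),
           Vectorize(function(i, j) hammingDist(seqs[i], seqs[j])))
dimnames(d) <- list(seqs, seqs)
d

# Pairs of distinct sequences within a Hamming distance of 1.
which(d <= 1 & upper.tri(d), arr.ind = TRUE)
```

In this input, the first sequence is one mismatch away from the second and the third, and the third is one mismatch away from the fourth. The fifth sequence differs from every other sequence at six or seven positions, so it is far outside a tolerance of 1.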