This algorithm assumes that the sequences "should be" identical except for amplification and sequencing errors. Its main purpose is to calculate a consensus sequence for an amplicon that is too long to use in DADA2 directly, but which has been clustered based on sequence variant identity in one subregion.
cluster_consensus(seq, nread = 1, ..., ncpus = 1, simplify = TRUE)
## S3 method for class 'character'
cluster_consensus(
  seq,
  nread = 1,
  names = base::names(seq),
  dna2rna = TRUE,
  ...,
  ncpus = 1,
  simplify = TRUE
)
## S3 method for class 'XStringSet'
cluster_consensus(seq, nread = 1, ..., ncpus = 1, simplify = TRUE)
seq |
( |
nread |
( |
... |
passed to methods |
ncpus |
( |
simplify |
( |
names |
( |
dna2rna |
(logical) whether to convert |
The sequences are first aligned using AlignSeqs. Sequences that are "outliers" in the alignment are then removed by odseq. If the input sequences were clustered based on DADA2 sequence variants of a variable region, and the sequences were appropriately quality filtered prior to running dada, then outliers should mostly be chimeras.
After outlier removal, sites with greater than 50% gaps are removed, and the most frequent letter (ignoring gaps) is chosen at all other sites. If no letter has greater than 50% representation at a position, then an IUPAC ambiguous base representing at least 50% of the reads at that position is chosen for nucleotide sequences, or "X" for amino acids.
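The per-site rule described above can be illustrated with a small base R sketch. This is not the package's implementation: the helper names consensus_site and consensus_from_alignment are hypothetical, only nucleotides are handled (no "X" for amino acids), read counts (nread) are not used as weights, and two details are assumptions made here for illustration: letter frequencies are computed over non-gap reads, and the ambiguity code covers the smallest set of most frequent letters reaching 50%, with ties at the cutoff included.

```r
# IUPAC nucleotide codes and the bases each one represents.
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  M = c("A", "C"), R = c("A", "G"), W = c("A", "T"),
  S = c("C", "G"), Y = c("C", "T"), K = c("G", "T"),
  V = c("A", "C", "G"), H = c("A", "C", "T"),
  D = c("A", "G", "T"), B = c("C", "G", "T"),
  N = c("A", "C", "G", "T")
)

# Consensus for one alignment column (hypothetical helper).
# Returns "" when the site is dropped because more than 50% of reads are gaps.
consensus_site <- function(column, gap = "-") {
  if (mean(column == gap) > 0.5) return("")
  tab <- table(column[column != gap])
  # Plain named integer vector of letter counts, most frequent first.
  counts <- sort(setNames(as.vector(tab), names(tab)), decreasing = TRUE)
  total <- sum(counts)
  # A letter with more than 50% of the non-gap reads wins outright.
  if (2 * counts[1] > total) return(names(counts)[1])
  # Assumption: take the most frequent letters until they cover at least 50%,
  # and include every letter tied with the last one taken.
  n_keep <- which(2 * cumsum(counts) >= total)[1]
  keep <- names(counts)[counts >= counts[n_keep]]
  for (code in names(iupac)) {
    if (setequal(iupac[[code]], keep)) return(code)
  }
  "N"  # fallback for letters outside the table above
}

# Consensus for a character vector of equal-length aligned reads
# (hypothetical helper).
consensus_from_alignment <- function(aligned, gap = "-") {
  mat <- do.call(rbind, strsplit(aligned, ""))  # reads x sites matrix
  paste(apply(mat, 2, consensus_site, gap = gap), collapse = "")
}

aligned <- c("ACG-TA", "ACG-TC", "ATA-TG", "ACAGCT")
consensus_from_alignment(aligned)
# "ACRTN": site 4 is dropped (75% gaps), site 3 is an A/G tie (R),
# and site 6 has four equally frequent letters (N).
```

Including tied letters keeps the result independent of the order in which reads are supplied; a different tie rule would give a different ambiguity code at sites 3 and 6.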
An XStringSet-class object representing the consensus sequence.