brendanf/tzara: Cluster long amplicons using dada2 denoising on variable regions

To reduce computational complexity, dada2 uses only non-singletons as seeds for denoising. For this strategy to work, each true sequence must be represented by at least two identical reads. Especially with long amplicons, the probability that two reads carry exactly the same errors is much lower than the probability that both are error-free, so in practice each true sequence must have at least two error-free reads. This becomes a problem for rare sequences in long amplicon libraries. An alternative is to use hidden Markov models to cut out the most variable section of the targeted region, denoise only that section with dada2, and then build a consensus sequence from all the full-length reads whose variable section matches each denoised sequence. Tzara (named after Tristan Tzara, a central figure in the Dada art movement) applies this method to rDNA sequences by cutting out the variable ITS2 region using rITSx.
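
The argument about error-free reads can be made concrete with a rough calculation. This is only a sketch under simplifying assumptions that are not part of the package description: errors are taken to be independent across bases with a uniform per-base rate $\varepsilon$, and the values of $\varepsilon$, the read lengths $L$, and the read count $n$ are illustrative, not measured.

```latex
% Assumption: independent, uniform per-base error rate \varepsilon.
% L = read length in bases, n = number of reads from one true sequence.
\begin{align*}
  P(\text{read is error-free}) &= (1 - \varepsilon)^{L} \\
  E[\text{error-free reads}]   &= n \, (1 - \varepsilon)^{L} \\[4pt]
  % Illustrative values only: \varepsilon = 0.001
  L = 1500 \ (\text{full amplicon}):\quad   & (0.999)^{1500} \approx 0.22 \\
  L = 300  \ (\text{variable region}):\quad & (0.999)^{300}  \approx 0.74
\end{align*}
```

Under these assumed numbers, a rare sequence represented by $n = 5$ reads would be expected to yield about 1.1 error-free full-length reads, which is too few to reliably seed denoising, but about 3.7 error-free reads when only the shorter variable region is considered.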


Package details
License	GPL-3
Version	0.0.11
URL	https://www.github.com/brendanf/tzara
Installation	Install the latest version of this package by entering the following in R: `install.packages("remotes")`, then `remotes::install_github("brendanf/tzara")`