brendanf/tzara: Cluster long amplicons using dada2 denoising on variable regions

To reduce computational complexity, dada2 uses only non-singletons as seeds for denoising. For this strategy to work, each true sequence must be represented by at least two identical reads. Especially with long amplicons, two reads are far less likely to share exactly the same errors than to both be error-free, so in practice each true sequence must have at least two error-free reads. This becomes a problem for rare sequences in long amplicon libraries.

An alternative is to use hidden Markov models to cut out the most variable section of the targeted region, use dada2 to denoise only that section, and then find a consensus sequence for all reads that match each denoised variable region. Tzara (named after Tristan Tzara, a central figure in the Dada art movement) applies this method to rDNA sequences by cutting out the variable ITS2 region using rITSx.
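The argument about identical reads can be made concrete with a rough calculation. This is a sketch under simplifying assumptions that are not part of the package description: errors are independent across bases, occur at a constant per-base rate $\varepsilon$, are substitutions only, and each of the three wrong bases is equally likely. $L$ is the amplicon length.

```latex
% Assumptions (illustrative only): independent per-base errors at rate
% \varepsilon, substitutions only, each wrong base equally likely.
\begin{align}
P(\text{one read error-free}) &= (1-\varepsilon)^{L} \\
P(\text{two reads both error-free}) &= (1-\varepsilon)^{2L} \\
P(\text{two reads identical}) &= \left[(1-\varepsilon)^{2} + \tfrac{\varepsilon^{2}}{3}\right]^{L} \\
\frac{P(\text{identical, with errors})}{P(\text{both error-free})}
  &= \left[1 + \frac{\varepsilon^{2}}{3(1-\varepsilon)^{2}}\right]^{L} - 1
  \approx \frac{L\,\varepsilon^{2}}{3}
\end{align}
```

With illustrative values $L = 1500$ and $\varepsilon = 0.001$, a single read is error-free with probability of about 0.22, while the ratio in the last line is about $5 \times 10^{-4}$. Under these assumptions, nearly every pair of identical reads is a pair of error-free reads, and the chance of being error-free shrinks exponentially as $L$ grows.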

Package details

Package repository: brendanf/tzara on GitHub
Installation: Install the latest version of this package by entering the following in R:
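A minimal sketch of the installation, assuming the `remotes` package is used to install from GitHub. The use of `remotes` is an assumption here; `brendanf/tzara` is the repository named in the title.

```r
# Assumption: the package is installed from GitHub with the 'remotes' package.
install.packages("remotes")

# Install the latest version of tzara from the brendanf/tzara repository.
remotes::install_github("brendanf/tzara")
```

Dependencies that are not on CRAN may need to be installed separately before this succeeds.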
brendanf/tzara documentation built on Nov. 19, 2020, 8:13 a.m.