map_or_consensus: Assign consensus sequences to unmapped reads
In brendanf/tzara: Cluster long amplicons using dada2 denoising on variable regions


This function is intended to be run on reads from a single sub-region/domain, which have been clustered using a linked sub-region/domain; for instance, ITS1 reads clustered based on identity/similarity of the linked ITS2 reads.

map_or_consensus(
  asvs,
  raw,
  maxdist = 10,
  allow_map = TRUE,
  allow_consensus = TRUE,
  allow_raw = FALSE,
  ...
)

`asvs`	(`character` vector) ASV sequences mapped to a set of reads. Should be `NA_character_` for reads which did not map to an ASV.
`raw`	(`character` vector) Raw read sequences for the same set of reads as `asvs`. May be `NA_character_`.
`maxdist`	(`numeric` scalar) Maximum Levenshtein distance between a raw read and an ASV for the read to be mapped to the ASV.
`allow_map`	(`logical` scalar) If `TRUE` and if `asvs` contains non-missing values, attempt to map each raw read without a corresponding ASV to the nearest ASV.
`allow_consensus`	(`logical` scalar) If `TRUE` and if `allow_map` is `FALSE` or there are no non-missing values in `asvs`, then attempt to make a consensus of all raw reads.
`allow_raw`	(`logical` scalar) If `TRUE`, then after mapping and/or consensus building, remaining raw reads are taken as they are. If `FALSE`, the corresponding results will be `NA`.
`...`	passed to `cluster_consensus.character`
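The way the three `allow_*` flags interact, as described in the argument list above, can be summarized in a small sketch. `choose_strategy` is a hypothetical helper written only for illustration; it is not part of tzara, and it reports the strategy implied by the descriptions rather than reproducing the package's internal control flow.

```r
# Hypothetical helper, NOT part of tzara: reports which strategy the
# argument descriptions imply for one group of reads.
choose_strategy <- function(asvs,
                            allow_map = TRUE,
                            allow_consensus = TRUE,
                            allow_raw = FALSE) {
  has_asv <- any(!is.na(asvs))
  primary <- if (allow_map && has_asv) {
    "map"        # map unassigned reads to the nearest existing ASV
  } else if (allow_consensus) {
    "consensus"  # build one consensus from all raw reads
  } else {
    "none"
  }
  # Reads left unresolved by the primary step either keep their raw
  # sequence (allow_raw = TRUE) or become NA.
  fallback <- if (allow_raw) "raw" else "NA"
  c(primary = primary, fallback = fallback)
}

choose_strategy(c("ACGT", NA, NA))                       # some ASVs present
choose_strategy(c(NA, NA, NA))                           # no ASVs at all
choose_strategy(c("ACGT", NA), allow_map = FALSE, allow_raw = TRUE)
```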

If some of the target reads have been mapped to ASVs, then `map_or_consensus` attempts to map additional raw reads to the same ASVs using (potentially) more relaxed criteria than dada. This is implemented in `map_to_best_asv`.
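To make the mapping step concrete, here is a minimal sketch of nearest-ASV assignment under a Levenshtein threshold, using base R's `utils::adist()`. The function name `map_to_nearest_sketch` is hypothetical, and this is a simplified stand-in under the assumption that only edit distance matters; the actual `map_to_best_asv` may differ in how it computes distances and breaks ties.

```r
# Hypothetical helper, NOT tzara's map_to_best_asv(): assigns each unmapped
# raw read to its nearest ASV if the edit distance is within maxdist.
map_to_nearest_sketch <- function(asvs, raw, maxdist = 10) {
  stopifnot(length(asvs) == length(raw))
  out <- asvs
  candidates <- unique(asvs[!is.na(asvs)])
  todo <- which(is.na(asvs) & !is.na(raw))
  if (length(candidates) == 0L || length(todo) == 0L) return(out)
  # Reads-by-candidates matrix of Levenshtein distances
  d <- utils::adist(raw[todo], candidates)
  # Column index of the smallest distance in each row (first one on ties)
  best <- max.col(-d, ties.method = "first")
  best_dist <- d[cbind(seq_along(todo), best)]
  ok <- best_dist <= maxdist
  out[todo[ok]] <- candidates[best[ok]]
  out
}

asvs <- c("ACGTACGT", NA, NA, NA)
raw  <- c("ACGTACGT", "ACGTACGA", "TTTTTTTTTTTT", NA)
map_to_nearest_sketch(asvs, raw, maxdist = 2)
```

In this toy input the second read is one substitution away from the ASV and is mapped to it, while the third read is too distant and the fourth has no raw sequence, so both stay `NA`.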

If, on the other hand, none of the input sequences have been assigned to an ASV, then the entire group is taken to represent one cluster, and a consensus sequence for the cluster is determined using `cluster_consensus`. This process removes outliers (generally chimeric in origin) and assigns `NA_character_` to the associated reads, as well as to any reads which are already `NA` due to quality filtering, failed region extraction, etc.
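The shape of the consensus result (one shared sequence for inliers, `NA_character_` for outliers and for reads that were already missing) can be illustrated with a deliberately simplified sketch. It swaps in a medoid read (the read with the smallest total edit distance to the others) in place of a true alignment-based consensus, and flags outliers with a fixed distance threshold. Both `medoid_consensus_sketch` and its `max_outlier_dist` parameter are hypothetical; this is not how `cluster_consensus` is implemented.

```r
# Hypothetical helper, NOT tzara's cluster_consensus(): uses the medoid read
# as a stand-in for a consensus and a fixed threshold to flag outliers.
medoid_consensus_sketch <- function(raw, max_outlier_dist = 3) {
  out <- rep(NA_character_, length(raw))
  keep <- which(!is.na(raw))
  if (length(keep) == 0L) return(out)
  d <- utils::adist(raw[keep])      # pairwise Levenshtein distances
  medoid <- which.min(rowSums(d))   # read closest to all the others
  inlier <- d[medoid, ] <= max_outlier_dist
  # Inliers share the medoid sequence; outliers and missing reads stay NA
  out[keep[inlier]] <- raw[keep[medoid]]
  out
}

raw <- c("ACGTACGT", "ACGTACGA", "ACGTACGT", "GGGGGGGGGGGGGGGG", NA)
medoid_consensus_sketch(raw)
```

Here the first three reads receive the same sequence, the divergent fourth read is treated as an outlier, and the missing fifth read remains `NA`, so the output has the same length as the input.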

A `character` vector the same length as `asvs`. If any ASVs are non-missing, it holds the closest ASV for each read; otherwise it holds the cluster consensus for raw reads which were non-missing and not outliers.

brendanf/tzara documentation built on March 11, 2021, 5:40 a.m.