map_or_consensus: Assign consensus sequences to unmapped reads


Description

This function is intended to be run on reads from a single sub-region/domain, which have been clustered using a linked sub-region/domain; for instance, ITS1 reads clustered based on identity/similarity of the linked ITS2 reads.

Usage

map_or_consensus(
  asvs,
  raw,
  maxdist = 10,
  allow_map = TRUE,
  allow_consensus = TRUE,
  allow_raw = FALSE,
  ...
)

Arguments

asvs

(character vector) ASV sequences mapped to a set of reads. Should be NA_character_ for reads which did not map to an ASV.

raw

(character vector) Raw read sequences for the same set of reads as asvs. May be NA_character_.

maxdist

(numeric scalar) Maximum Levenshtein distance between a raw read and an ASV for the read to be mapped to the ASV.

allow_map

(logical scalar) If TRUE and if asvs contains non-missing values, attempt to map each raw read without a corresponding ASV to the nearest ASV.

allow_consensus

(logical scalar) If TRUE and if allow_map is FALSE or there are no non-missing values in asvs, then attempt to make a consensus of all raw reads.

allow_raw

(logical scalar) If TRUE, then after mapping and/or consensus building, remaining raw reads are taken as they are. If FALSE, the corresponding results will be NA.

...

Additional arguments passed to cluster_consensus.character.
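
The way the three allow_* flags interact, as described in the argument list above, can be sketched roughly as follows. This is only an illustration of the documented conditions, not code from the package; choose_strategy and fill_raw are hypothetical helper names introduced here.

```r
# Hypothetical helper (not part of tzara): reports which strategy the
# documented flag descriptions would select for a group of reads.
choose_strategy <- function(asvs, allow_map = TRUE, allow_consensus = TRUE) {
  has_asv <- any(!is.na(asvs))
  if (allow_map && has_asv) {
    "map"        # some reads already have an ASV: map the rest to the nearest one
  } else if (allow_consensus) {
    "consensus"  # mapping disabled or no ASVs at all: build a consensus
  } else {
    "none"       # neither strategy applies
  }
}

# Hypothetical helper (not part of tzara): the allow_raw fallback, applied
# to whatever is still NA after mapping and/or consensus building.
fill_raw <- function(result, raw, allow_raw = FALSE) {
  if (allow_raw) {
    missing <- is.na(result)
    result[missing] <- raw[missing]
  }
  result
}

choose_strategy(c("ACGT", NA))                     # mapping is attempted
choose_strategy(c(NA_character_, NA_character_))   # consensus is attempted
fill_raw(c("ACGT", NA), c("ACGA", "TTTT"), allow_raw = TRUE)
```

With allow_raw = FALSE (the default), fill_raw returns its input unchanged, so unresolved reads stay NA.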

Details

If some of the target reads have been mapped to ASVs, then map_or_consensus attempts to map additional raw reads to the same ASVs using a (potentially) more relaxed criterion than dada. This is implemented in map_to_best_asv.
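
As a rough illustration of the mapping step, the sketch below assigns each unmapped raw read to its nearest ASV by Levenshtein distance, provided the distance does not exceed maxdist. It uses base R's utils::adist and is not the implementation of map_to_best_asv; the function name map_nearest_sketch is hypothetical, and tie-breaking by first match is an assumption.

```r
# Hypothetical sketch (not map_to_best_asv itself): nearest-ASV mapping
# with a Levenshtein distance cutoff.
map_nearest_sketch <- function(asvs, raw, maxdist = 10) {
  result <- asvs
  candidates <- unique(asvs[!is.na(asvs)])
  # Reads that have a raw sequence but no ASV yet.
  todo <- which(is.na(asvs) & !is.na(raw))
  if (length(candidates) == 0 || length(todo) == 0) return(result)
  # Rows are unmapped raw reads, columns are candidate ASVs.
  d <- utils::adist(raw[todo], candidates)
  # Index of the closest candidate for each read (first one wins on ties).
  best <- apply(d, 1, which.min)
  ok <- d[cbind(seq_along(todo), best)] <= maxdist
  result[todo[ok]] <- candidates[best[ok]]
  result
}

asvs <- c("ACGTACGT", NA, NA, NA)
raw  <- c("ACGTACGT", "ACGTACGA", "TTTTTTTTTTTT", NA)
# The second read is one substitution away from the ASV and is mapped;
# the third is too distant and the fourth has no raw sequence, so both stay NA.
map_nearest_sketch(asvs, raw, maxdist = 2)
```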

If, on the other hand, none of the input sequences have been assigned to an ASV, then the entire group is taken to represent one cluster, and a consensus sequence for the cluster is determined using cluster_consensus. This process will remove outliers (generally chimeric in origin) and assign NA_character_ to the associated reads, as well as any reads which are already NA due to quality filtering, failed region extraction, etc.
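
The shape of the consensus branch can be illustrated with a deliberately simplified stand-in. This page does not describe how cluster_consensus builds its consensus or detects outliers, so the sketch below substitutes a different, simpler technique: it picks the medoid read (the one with the smallest total Levenshtein distance to all others) in place of a true consensus, and treats reads farther than a cutoff from the medoid as outliers. The name consensus_sketch and the max_outlier_dist parameter are hypothetical.

```r
# Hypothetical stand-in (not cluster_consensus): medoid instead of a true
# consensus, with a simple distance cutoff for outliers.
consensus_sketch <- function(raw, max_outlier_dist = 10) {
  result <- rep(NA_character_, length(raw))
  present <- which(!is.na(raw))
  if (length(present) == 0) return(result)
  # Pairwise Levenshtein distances among the non-missing reads.
  d <- utils::adist(raw[present])
  # The medoid minimises the total distance to all other reads.
  medoid <- which.min(rowSums(d))
  # Reads close enough to the medoid receive it; outliers and missing reads stay NA.
  keep <- d[medoid, ] <= max_outlier_dist
  result[present[keep]] <- raw[present[medoid]]
  result
}

raw <- c("ACGTACGT", "ACGTACGT", "ACGTACGA", "GGGGGGGGGGGGGGGG", NA)
# The first three reads share one sequence; the divergent fourth read and
# the missing fifth read are NA.
consensus_sketch(raw, max_outlier_dist = 2)
```

As in the function documented here, the result has the same length as the input, with NA_character_ marking both outliers and reads that were already missing.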

Value

A character vector the same length as asvs. If any ASVs are non-missing, it contains the closest ASV for each read; otherwise, it contains the cluster consensus for the raw reads which were non-missing and not outliers.


brendanf/tzara documentation built on March 11, 2021, 5:40 a.m.