reconstruct: Reconstruct a longer region out of ASVs or consensus sequence...


View source: R/tzara.R

Description

The sequences from each denoised sub-region/domain are concatenated to create a denoised sequence for the long region. Additionally, de novo bimera detection is performed using isBimeraDenovo or isBimeraDenovoTable on sets of three consecutive sub-regions/domains; in the intended application, these sets will be variable–conserved–variable.
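The concatenation step can be pictured with the minimal base-R sketch below. It is not tzara's implementation: concat_regions() is a hypothetical helper, and it assumes, as described under Arguments, that each table has a read ID column and a denoised-sequence column, and that a read missing any sub-region gets NA for the concatenation (the real function instead tries to fill such gaps, see Details).

```r
# Hypothetical helper (not part of tzara): concatenate the denoised
# sequences of several sub-region tables, read by read, in `order`.
concat_regions <- function(tabs, order, read_column = "seq_id",
                           asv_column = "dada_seq") {
  # every read ID seen in any of the tables
  ids <- unique(unlist(lapply(tabs[order], `[[`, read_column),
                       use.names = FALSE))
  # one character vector per sub-region, aligned to `ids` by read ID
  parts <- lapply(order, function(r) {
    tab <- tabs[[r]]
    as.character(tab[[asv_column]])[match(ids, tab[[read_column]])]
  })
  out <- do.call(paste0, parts)
  # paste0() turns NA into "NA", so mask reads with any missing part
  out[Reduce(`|`, lapply(parts, is.na))] <- NA_character_
  res <- data.frame(ids, out, stringsAsFactors = FALSE)
  names(res) <- c(read_column, "concat")
  res
}

# toy variable/conserved/variable tables; rows need not be in the same order
seqtabs <- list(
  var1 = data.frame(seq_id = c("r1", "r2"), dada_seq = c("AAA", "CCC"),
                    stringsAsFactors = FALSE),
  cons = data.frame(seq_id = c("r1", "r2"), dada_seq = c("GGG", NA),
                    stringsAsFactors = FALSE),
  var2 = data.frame(seq_id = c("r2", "r1"), dada_seq = c("AAA", "TTT"),
                    stringsAsFactors = FALSE)
)
res <- concat_regions(seqtabs, order = c("var1", "cons", "var2"))
print(res)
```

Read r1 yields "AAAGGGTTT"; read r2 has no denoised conserved region, so its concatenation is NA in this sketch.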

Usage

reconstruct(
  seqtabs,
  regions = names(seqtabs),
  regions_regex = NULL,
  regions_replace = NULL,
  output = "concat",
  use_output = c("first", "second", "no"),
  order = regions,
  read_column = "seq_id",
  asv_column = "dada_seq",
  rawtabs = seqtabs,
  raw_column = NULL,
  raw_regions = names(rawtabs),
  sample_column = NULL,
  sample_regex = NULL,
  sample_replace = NULL,
  chimera_offset = 0,
  allow_map = TRUE,
  allow_consensus = TRUE,
  allow_raw = FALSE,
  ...
)

Arguments

seqtabs

(list of data.frame) with columns read_column, asv_column, and optionally sample_column. Any additional columns are ignored. read_column should give a unique ID for each sequencing read, and asv_column should give the denoised sequence for the read.
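As a rough illustration of this input format, the sketch below builds a one-table seqtabs list and checks the two stated requirements: the named columns exist and read IDs are unique within each table. check_seqtabs() is a hypothetical helper written for this example, not a tzara function, and it assumes seqtabs is a named list.

```r
# Hypothetical validator (not part of tzara) for the seqtabs format:
# each table needs read_column and asv_column, and read IDs must be
# unique within a table. Assumes `seqtabs` is a named list.
check_seqtabs <- function(seqtabs, read_column = "seq_id",
                          asv_column = "dada_seq") {
  for (nm in names(seqtabs)) {
    tab <- seqtabs[[nm]]
    missing_cols <- setdiff(c(read_column, asv_column), names(tab))
    if (length(missing_cols) > 0)
      stop("table '", nm, "' lacks column(s): ",
           paste(missing_cols, collapse = ", "))
    if (anyDuplicated(tab[[read_column]]) > 0)
      stop("table '", nm, "' has duplicated read IDs")
  }
  invisible(TRUE)
}

# NA in the ASV column is allowed: that read was not denoised
good <- list(var1 = data.frame(seq_id = c("r1", "r2"),
                               dada_seq = c("AAA", NA),
                               stringsAsFactors = FALSE))
bad <- list(var1 = data.frame(seq_id = c("r1", "r1"),
                              dada_seq = c("AAA", "CCC"),
                              stringsAsFactors = FALSE))
check_seqtabs(good)
```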

regions

(character vector with the same length as seqtabs) The names of the regions/domains represented by each of the tables in seqtabs. If not supplied, then seqtabs should be named by the regions.

regions_regex

(character scalar, or NULL) A regular expression. If regions_regex is given but regions_replace is not, then only the part of each entry in regions matching the regex is used to define the region names (using str_extract). If regions_replace is also given, then the matched part is instead replaced by regions_replace (using str_replace). NA_character_ is treated the same way as NULL.
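The extract-versus-replace behaviour described here (and, in the same way, for sample_regex and sample_replace) can be approximated in base R. This is a sketch under the assumption that first-match semantics are what matter: extract_first() and process_names() are hypothetical stand-ins for stringr's str_extract and str_replace, and regex dialects may differ slightly between PCRE and stringr's ICU engine.

```r
# Hypothetical base-R stand-in for stringr::str_extract(): the first
# match in each string, or NA when there is no match.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  len <- attr(m, "match.length")
  hit <- !is.na(m) & m > 0
  out <- rep(NA_character_, length(x))
  out[hit] <- substring(x[hit], m[hit], m[hit] + len[hit] - 1)
  out
}

# Hypothetical processing of region (or sample) names:
# regex only -> extract; regex + replacement -> replace first match.
process_names <- function(x, regex = NULL, replace = NULL) {
  # NULL and NA_character_ are treated alike
  if (is.null(regex) || is.na(regex)) return(x)
  if (is.null(replace) || is.na(replace)) {
    extract_first(x, regex)
  } else {
    sub(regex, replace, x, perl = TRUE)  # like str_replace(): first match
  }
}

x <- c("var1_run1", "cons_run1", "var2_run1")  # made-up region names
process_names(x, "^[a-z]+\\d?")      # extract the leading name
process_names(x, "_run\\d+$", "")    # strip the suffix instead
```

Both calls reduce the example names to "var1", "cons" and "var2"; which form is more convenient depends on whether it is easier to describe the part to keep or the part to drop.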

regions_replace

(character scalar, or NULL) Replacement string for regions_regex. NA_character_ is treated the same way as NULL.

output

(character scalar or named list of character vectors) If a character scalar, then the name to be used for the (single) output region, which will be the concatenation of all the regions in the order given by order. Alternatively, a named list, where the names are the names of the output regions and the values are character vectors giving the regions to be concatenated for each output region.
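One way to picture the two accepted forms of output is to normalise both to a named list, with each output's regions sorted by order (which, per its description, controls the concatenation order). normalise_output() is a hypothetical helper; the region names and toy sequences are made up for illustration.

```r
# Hypothetical normalisation of `output` (not part of tzara).
normalise_output <- function(output, order) {
  if (is.character(output) && length(output) == 1) {
    # scalar: one output region made of all regions
    output <- stats::setNames(list(order), output)
  }
  # concatenate each output's regions in the sequence given by `order`
  lapply(output, function(r) order[order %in% r])
}

order <- c("var1", "cons1", "var2", "cons2", "var3")
outs <- normalise_output(list(first = c("cons1", "var1", "var2"),
                              second = c("var3", "cons2")), order)

# toy sub-region sequences for one read
read <- c(var1 = "AA", cons1 = "CC", var2 = "GG", cons2 = "TT", var3 = "AC")
vapply(outs, function(r) paste0(read[r], collapse = ""), character(1))
```

The "first" output is assembled as var1, cons1, var2 even though the list gave cons1 first.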

use_output

(one of "first", "second", or "no") If one of the regions given by output is also present in seqtabs, then the seqtabs version is used preferentially if use_output == "first", as a backup value when one of the sub-regions/domains is missing if use_output == "second", or not at all if use_output == "no".
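A per-read sketch of the three modes, assuming `direct` holds the long-region sequence taken directly from seqtabs and `concat` holds the concatenation of sub-regions (NA when a sub-region is missing). choose_output() and both variable names are hypothetical; the fallback to the concatenation under "first" is an assumption about how a missing seqtabs value is handled.

```r
# Hypothetical illustration of use_output (not part of tzara).
choose_output <- function(direct, concat,
                          use_output = c("first", "second", "no")) {
  use_output <- match.arg(use_output)
  switch(use_output,
    # prefer the seqtabs version; assume concatenation fills its gaps
    first  = ifelse(is.na(direct), concat, direct),
    # prefer the concatenation; seqtabs version is the backup
    second = ifelse(is.na(concat), direct, concat),
    # ignore the seqtabs version entirely
    no     = concat
  )
}

direct <- c("D1", NA, "D3")
concat <- c("C1", "C2", NA)
choose_output(direct, concat, "first")
choose_output(direct, concat, "second")
```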

order

(character vector) The order in which the sub-regions/domains should be concatenated to produce the output(s).

read_column

(character scalar) Column name from the seqtabs which uniquely identifies each read. Different regions extracted from the same read should have the same ID.

asv_column

(character scalar) Column name from the seqtabs which gives the denoised sequences.

rawtabs

(list of data.frame) Data sources of the same format as seqtabs, with columns read_column and raw_column. There should be as many tables as in seqtabs, corresponding to the sub-regions/domains specified in regions. By default, raw_column is looked for in seqtabs.

raw_column

(character scalar, or NULL) Column name from the rawtabs which gives the raw sequences. If NULL or NA_character_, then consensus sequences will not be used as a backup when no denoised sequence is present.

raw_regions

(character vector with the same length as rawtabs) The names of the regions/domains represented by each of the tables in rawtabs. These will be processed using regions_regex and regions_replace, if given.

sample_column

(character scalar, or NULL) An optional column name from the seqtabs which identifies which sample each sequence is from. If given, this is used (after possible modification by sample_regex and sample_replace) to identify different samples for isBimeraDenovoTable. NA_character_ is treated the same way as NULL.

sample_regex

(character scalar, or NULL) A regular expression. If sample_regex is given but sample_replace is not, then only the part of each entry in sample_column matching the regex is used to define samples (using str_extract). If sample_replace is also given, then the matched part is instead replaced by sample_replace (using str_replace). NA_character_ is treated the same way as NULL.

sample_replace

(character scalar, or NULL) Replacement string for sample_regex. NA_character_ is treated the same way as NULL.

chimera_offset

(integer) By default, bimeras are checked for sub-regions/domains 1, 2, 3; 3, 4, 5; 5, 6, 7; etc. This is appropriate if the domains alternate variable, conserved, variable, etc. If the first domain is a conserved one, use chimera_offset = 1.
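The triplets listed above follow a simple pattern: windows of three consecutive sub-regions, stepping by two and starting at 1 + chimera_offset. The sketch below reproduces only that index pattern; chimera_triplets() is a hypothetical helper, not the code tzara uses internally.

```r
# Hypothetical helper: indices of the sub-region triplets checked for
# bimeras, given the number of sub-regions and chimera_offset.
chimera_triplets <- function(n_regions, chimera_offset = 0) {
  first <- 1 + chimera_offset
  if (n_regions - 2 < first) return(list())  # too few regions
  starts <- seq.int(first, n_regions - 2, by = 2)
  lapply(starts, function(s) s:(s + 2))
}

chimera_triplets(7)      # 1:3, 3:5, 5:7
chimera_triplets(7, 1)   # 2:4, 4:6
```

With seven sub-regions and chimera_offset = 1, the last region is not part of any triplet.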

allow_map

(logical scalar) If TRUE and if the denoised sequences (asv_column) contain non-missing values, attempt to map each raw read without a corresponding ASV to the nearest ASV.
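To illustrate what "map to the nearest ASV" means, the sketch below assigns each raw read to the ASV with the smallest Levenshtein distance, computed with utils::adist. The distance measure is an assumption chosen for the example; this page does not say which measure tzara uses, and map_to_nearest() is hypothetical.

```r
# Hypothetical nearest-ASV mapping by edit distance (utils::adist);
# the distance actually used by tzara may differ.
map_to_nearest <- function(raw, asvs) {
  asvs <- unique(asvs[!is.na(asvs)])
  # no non-missing ASVs: nothing to map to
  if (length(asvs) == 0) return(rep(NA_character_, length(raw)))
  d <- utils::adist(raw, asvs)       # length(raw) x length(asvs)
  asvs[apply(d, 1, which.min)]       # ties go to the first ASV
}

asvs <- c("ACGTACGT", "TTTTCCCC", NA)
raw  <- c("ACGTACGA", "TTTTCCCA")    # each one edit from an ASV
map_to_nearest(raw, asvs)
```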

allow_consensus

(logical scalar) If TRUE, and if allow_map is FALSE or there are no non-missing denoised sequences, then attempt to make a consensus of all raw reads.

allow_raw

(logical scalar) If TRUE, then after mapping and/or consensus building, remaining raw reads are taken as they are. If FALSE, the corresponding results will be NA.
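Taken together, allow_map, allow_consensus and allow_raw define a fallback order for a sub-region with no denoised sequence. The sketch below simply encodes that order as described in the three argument entries; fallback_steps() and has_asvs are hypothetical names, and an empty result stands for "the result will be NA".

```r
# Hypothetical summary of the fallback order for a read's sub-region
# that has no ASV; returns the strategies that would be tried, in order.
fallback_steps <- function(has_asvs, allow_map = TRUE,
                           allow_consensus = TRUE, allow_raw = FALSE) {
  steps <- character(0)
  if (allow_map && has_asvs) {
    steps <- c(steps, "map")            # nearest ASV
  } else if (allow_consensus) {
    steps <- c(steps, "consensus")      # consensus of all raw reads
  }
  if (allow_raw) steps <- c(steps, "raw")  # leftover raw reads as-is
  steps  # character(0): the result would be NA
}

fallback_steps(has_asvs = TRUE)
fallback_steps(has_asvs = TRUE, allow_map = FALSE, allow_raw = TRUE)
```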

...

Additional arguments passed to isBimeraDenovo or isBimeraDenovoTable.

Details

When some sub-regions/domains of a read have not been successfully denoised with DADA, the missing regions are constructed using cluster_consensus.

Value

a tibble with column "seq_id" and sample_column (if given), as well as one column for each value of regions and output, representing the sub-regions/domains and the concatenated full region.


brendanf/tzara documentation built on March 11, 2021, 5:40 a.m.