reconstruct: Reconstruct a longer region out of ASVs or consensus sequence...
In brendanf/tzara: Cluster long amplicons using dada2 denoising on variable regions


The sequences from each denoised sub-region/domain are concatenated to create a denoised sequence for the long region. In addition, de novo bimera detection is performed with dada2's `isBimeraDenovo` or `isBimeraDenovoTable` (the latter when samples are identified through `sample_column`) on sets of three consecutive sub-regions/domains. In the intended application, these sets will be variable–conserved–variable.
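The concatenation step can be pictured with a small base-R sketch. It assumes each table in `seqtabs` has the default `seq_id` and `dada_seq` columns, and it treats a read with any missing sub-region as unresolved (`NA`). The helper `concat_regions()` is hypothetical: it does not reproduce tzara's handling of `output`, `use_output`, bimera checking, or the consensus fallbacks.

```r
# Hypothetical helper, not part of tzara: join per-region ASV tables on the
# read ID and paste the ASVs together in `order`.
concat_regions <- function(seqtabs, order = names(seqtabs),
                           read_column = "seq_id", asv_column = "dada_seq") {
  # Keep only the ID and ASV columns; rename the ASV column to the region name
  tabs <- lapply(order, function(r) {
    t <- seqtabs[[r]][, c(read_column, asv_column)]
    names(t)[2] <- r
    t
  })
  # Full outer join on the read ID, so a read missing a region gets NA there
  wide <- Reduce(function(x, y) merge(x, y, by = read_column, all = TRUE), tabs)
  parts <- wide[, order, drop = FALSE]
  complete <- stats::complete.cases(parts)
  # Reads lacking any sub-region stay NA in this sketch
  wide$concat <- NA_character_
  wide$concat[complete] <- do.call(
    paste0, unname(as.list(parts[complete, , drop = FALSE]))
  )
  wide
}
```

In tzara itself, a missing sub-region may instead be filled from the raw reads (see the Details below), rather than left as `NA`.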

reconstruct(
  seqtabs,
  regions = names(seqtabs),
  regions_regex = NULL,
  regions_replace = NULL,
  output = "concat",
  use_output = c("first", "second", "no"),
  order = regions,
  read_column = "seq_id",
  asv_column = "dada_seq",
  rawtabs = seqtabs,
  raw_column = NULL,
  raw_regions = names(rawtabs),
  sample_column = NULL,
  sample_regex = NULL,
  sample_replace = NULL,
  chimera_offset = 0,
  allow_map = TRUE,
  allow_consensus = TRUE,
  allow_raw = FALSE,
  ...
)

`seqtabs`	(`list` of `data.frame`) with columns `read_column`, `asv_column`, and optionally `sample_column`. Any additional columns are ignored. `read_column` should give a unique ID for each sequencing read, and `asv_column` should give the denoised sequence for the read.
`regions`	(`character` vector with the same length as `seqtabs`) The names of the regions/domains represented by each of the tables in `seqtabs`. If not supplied, then `seqtabs` should be named by the regions.
`regions_regex`	(`character` scalar, or `NULL`) A regular expression. If `regions_regex` is given but `regions_replace` is not, then only the part of each entry in `regions` matching the regex is used to name the regions (using `str_extract`). If `regions_replace` is also given, then the matching part is instead replaced by `regions_replace` (using `str_replace`). `NA_character_` is treated the same way as `NULL`.
`regions_replace`	(`character` scalar, or `NULL`) Replacement string for `regions_regex`. `NA_character_` is treated the same way as `NULL`.
`output`	(`character` scalar or named list of `character` vectors) If a `character` scalar, then the name to be used for the (single) output region. In this case the region will be the concatenation of all the regions in `order`. Alternatively, a list where the names are the names of the output regions, and the values are `character` vectors giving the regions which should be concatenated for each output region.
`use_output`	(one of `"first"`, `"second"`, or `"no"`) If one of the regions given by `output` is also present in `seqtabs`, then the `seqtabs` version is used preferentially if `use_output == "first"`, as a backup when one of the sub-regions/domains is missing if `use_output == "second"`, or not at all if `use_output == "no"`.
`order`	(`character` vector) The order in which the sub-regions/domains should be concatenated to produce the output(s).
`read_column`	(`character` scalar) Column name from the `seqtabs` which uniquely identifies each read. Different regions extracted from the same read should have the same ID.
`asv_column`	(`character` scalar) Column name from the `seqtabs` which gives the denoised sequences.
`rawtabs`	(`list` of `data.frame`) Data sources in the same format as `seqtabs`, with columns `read_column` and `raw_column`. There should be as many tables as in `seqtabs`, corresponding to the sub-regions/domains specified in `regions`. The default is to look for `raw_column` in `seqtabs`.
`raw_column`	(`character` scalar, or `NULL`) Column name from the `rawtabs` which gives the raw sequences. If `NULL` or `NA_character_`, then consensus sequences will not be used as a backup when no denoised sequence is present.
`raw_regions`	(`character` vector with the same length as `rawtabs`) The names of the regions/domains represented by each of the tables in `rawtabs`. These will be processed using `regions_regex` and `regions_replace`, if given.
`sample_column`	(`character` scalar, or `NULL`) An optional column name from the `seqtabs` identifying the sample each sequence comes from. If given, this is used (after possible modification by `sample_regex` and `sample_replace`) to identify different samples for `isBimeraDenovoTable`. `NA_character_` is treated the same way as `NULL`.
`sample_regex`	(`character` scalar, or `NULL`) A regular expression. If `sample_regex` is given but `sample_replace` is not, then only the part of each entry in `sample_column` matching the regex is used to define samples (using `str_extract`). If `sample_replace` is also given, then the matching part is instead replaced by `sample_replace` (using `str_replace`). `NA_character_` is treated the same way as `NULL`.
`sample_replace`	(`character` scalar, or `NULL`) Replacement string for `sample_regex`. `NA_character_` is treated the same way as `NULL`.
`chimera_offset`	(`integer`) By default, bimeras are checked on sub-regions/domains 1, 2, 3; 3, 4, 5; 5, 6, 7; etc. This is appropriate if the domains alternate variable, conserved, variable, etc. If the first domain is a conserved one, use `chimera_offset = 1`.
`allow_map`	(`logical` scalar) If `TRUE` and if non-missing ASVs are available, attempt to map each raw read without a corresponding ASV to the nearest ASV.
`allow_consensus`	(`logical` scalar) If `TRUE`, and if either `allow_map` is `FALSE` or there are no non-missing ASVs, then attempt to build a consensus of all raw reads.
`allow_raw`	(`logical` scalar) If `TRUE`, then after mapping and/or consensus building, remaining raw reads are taken as they are. If `FALSE`, the corresponding results will be `NA`.
`...`	additional arguments passed to `isBimeraDenovo` or `isBimeraDenovoTable`.
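The grouping described for `chimera_offset` can be written out as index sets. This is a sketch of the documented pattern only (sets of three overlapping by one region, shifted by the offset); the helper `chimera_triplets()` is hypothetical, and in this sketch any trailing regions that do not complete a set of three are simply not checked, which may differ from tzara.

```r
# Hypothetical helper, not part of tzara: positions (within `order`) of the
# sub-regions/domains checked together for bimeras.
chimera_triplets <- function(n_regions, chimera_offset = 0) {
  first <- 1 + chimera_offset
  # Not enough regions left for a single set of three
  if (n_regions - first < 2) return(list())
  # Consecutive sets share one region, so each set starts 2 after the last
  starts <- seq(first, n_regions - 2, by = 2)
  lapply(starts, function(s) s:(s + 2))
}

# Usage with hypothetical region names alternating variable/conserved
regions <- c("V1", "C1", "V2", "C2", "V3")
lapply(chimera_triplets(length(regions)), function(i) regions[i])
# list(c("V1", "C1", "V2"), c("V2", "C2", "V3"))
```

With `chimera_offset = 1` the first set starts at the second region, so a leading conserved domain is never the first member of a set.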

When not all sub-regions/domains of a given read have been successfully denoised with DADA2, the missing regions are constructed from the raw reads using `cluster_consensus`, as controlled by `allow_map`, `allow_consensus`, and `allow_raw`.
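The mapping step enabled by `allow_map` (assigning a raw read that has no ASV to the nearest ASV) can be illustrated as below. This is a sketch under an assumption: "nearest" is measured here as Levenshtein distance with base R's `utils::adist()`, whereas `cluster_consensus` may use a different distance or alignment. The helper `map_to_nearest_asv()` is hypothetical.

```r
# Hypothetical helper, not tzara's cluster_consensus(): map each raw read to
# the closest non-missing ASV by edit distance (an assumed metric).
map_to_nearest_asv <- function(raw, asvs) {
  asvs <- unique(asvs[!is.na(asvs)])
  # No ASVs to map onto: leave every read unresolved
  if (length(asvs) == 0) return(rep(NA_character_, length(raw)))
  vapply(raw, function(r) {
    if (is.na(r)) return(NA_character_)
    d <- utils::adist(r, asvs)[1, ]  # edit distance from this read to each ASV
    asvs[which.min(d)]               # ties go to the first ASV listed
  }, character(1), USE.NAMES = FALSE)
}
```

Reads that cannot be mapped would, per the argument descriptions, fall through to consensus building (`allow_consensus`) or be kept as-is (`allow_raw`).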

A tibble with the column `"seq_id"` and `sample_column` (if given), as well as one column for each value of `regions` and `output`, representing the sub-regions/domains and the concatenated full region(s).

brendanf/tzara documentation built on March 11, 2021, 5:40 a.m.