isBimeraDenovoTable: Identify bimeras in a sequence table.

View source: R/chimeras.R

isBimeraDenovoTable R Documentation

Description

This function implements a table-specific version of de novo bimera detection. In short, bimeric sequences are flagged on a sample-by-sample basis. Then, a vote is performed for each sequence across all samples in which it appeared. If the sequence is flagged in a sufficiently high fraction of samples, it is identified as a bimera. A logical vector is returned, with an entry for each sequence in the table indicating whether it was identified as bimeric by this consensus procedure.
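The consensus step described above can be sketched in base R. This is an illustration only, not the package's implementation: the matrices `flagged` and `present` and the helper `consensus_vote` are hypothetical, the per-sample flags are made up, and the use of `>=` for the threshold comparison is an assumption. The sketch also leaves out the ignoreNNegatives adjustment.

```r
# Hypothetical per-sample results: rows are samples, columns are sequences.
# present[i, j] is TRUE if sequence j was observed in sample i;
# flagged[i, j] is TRUE if it was flagged as bimeric in that sample.
present <- matrix(c(TRUE,  TRUE,  TRUE,
                    TRUE,  TRUE,  FALSE,
                    TRUE,  FALSE, FALSE),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("sam", 1:3), c("seqA", "seqB", "seqC")))
flagged <- matrix(c(FALSE, TRUE,  TRUE,
                    FALSE, TRUE,  FALSE,
                    FALSE, FALSE, FALSE),
                  nrow = 3, byrow = TRUE, dimnames = dimnames(present))

# Hypothetical helper: vote across the samples in which each sequence appeared.
consensus_vote <- function(flagged, present, minSampleFraction = 0.9) {
  n_flag <- colSums(flagged & present)  # samples in which the sequence was flagged
  n_sam  <- colSums(present)            # samples in which the sequence appeared
  n_sam > 0 & n_flag >= minSampleFraction * n_sam
}

vote <- consensus_vote(flagged, present)
vote  # one logical entry per sequence (column) of the table
```

Here seqA is never flagged, while seqB and seqC are flagged in every sample in which they appear, so only the latter two pass the vote.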

Usage

isBimeraDenovoTable(
  seqtab,
  minSampleFraction = 0.9,
  ignoreNNegatives = 1,
  minFoldParentOverAbundance = 1.5,
  minParentAbundance = 2,
  allowOneOff = FALSE,
  minOneOffParentDistance = 4,
  maxShift = 16,
  multithread = FALSE,
  verbose = FALSE
)

Arguments

seqtab

(Required). A sequence table. That is, an integer matrix with colnames corresponding to DNA sequences.

minSampleFraction

(Optional). Default is 0.9. The fraction of samples in which a sequence must be flagged as bimeric in order for it to be classified as a bimera.

ignoreNNegatives

(Optional). Default is 1. The number of unflagged samples to ignore when evaluating whether the fraction of samples in which a sequence was flagged as a bimera exceeds minSampleFraction. The purpose of this parameter is to lower the threshold at which sequences found in few samples are flagged as bimeras.
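The effect of ignoreNNegatives can be illustrated with a little arithmetic. The rule below is an assumption based on the parameter descriptions on this page (a sequence must be flagged at least once, and in at least minSampleFraction of the samples that remain after ignoring ignoreNNegatives unflagged ones); the helper `is_consensus_bimera` is hypothetical and may differ from what the package does internally.

```r
# Hypothetical helper implementing the assumed voting rule.
# n_flag: number of samples in which the sequence was flagged as bimeric
# n_sam:  number of samples in which the sequence appeared
is_consensus_bimera <- function(n_flag, n_sam,
                                minSampleFraction = 0.9,
                                ignoreNNegatives = 1) {
  n_flag > 0 & n_flag >= (n_sam - ignoreNNegatives) * minSampleFraction
}

# A sequence seen in 3 samples and flagged in 2 of them:
strict  <- is_consensus_bimera(2, 3, ignoreNNegatives = 0)  # 2 >= 3 * 0.9 is FALSE
relaxed <- is_consensus_bimera(2, 3, ignoreNNegatives = 1)  # 2 >= 2 * 0.9 is TRUE
```

Under this assumed rule, a single unflagged sample would veto the call for any sequence present in fewer than ten samples when ignoreNNegatives is 0, which is the situation the parameter is meant to relax.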

minFoldParentOverAbundance

(Optional). Default is 1.5. A sequence can be a "parent" of another sequence only if it is more than this many times as abundant as that sequence. Evaluated on a per-sample basis.

minParentAbundance

(Optional). Default is 2. Only sequences at least this abundant can be "parents". Evaluated on a per-sample basis.
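The two abundance screens (minFoldParentOverAbundance and minParentAbundance) can be pictured as a filter applied within a single sample. The sketch below is a hedged illustration: the abundance vector and the helper `candidate_parents` are hypothetical, and the comparisons (`>` for the fold threshold, `>=` for the minimum abundance) follow the wording of the descriptions above rather than the package source.

```r
# Hypothetical abundances of four sequences within one sample.
abund <- c(seqA = 100L, seqB = 40L, seqC = 20L, seqD = 1L)

# Hypothetical helper: which other sequences are abundant enough to be
# considered as possible parents of `query` in this sample?
candidate_parents <- function(abund, query,
                              minFoldParentOverAbundance = 1.5,
                              minParentAbundance = 2) {
  others <- abund[names(abund) != query]
  keep <- others > minFoldParentOverAbundance * abund[[query]] &
    others >= minParentAbundance
  names(others)[keep]
}

parents_of_C <- candidate_parents(abund, "seqC")  # threshold 1.5 * 20 = 30
parents_of_B <- candidate_parents(abund, "seqB")  # threshold 1.5 * 40 = 60
```

With these made-up abundances, seqA and seqB qualify as candidate parents of seqC, while only seqA qualifies for seqB.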

allowOneOff

(Optional). Default is FALSE. If TRUE, sequences that have one mismatch or indel to an exact bimera are also flagged as bimeric.

minOneOffParentDistance

(Optional). Default is 4. Only sequences with at least this many mismatches to the potential bimeric sequence are considered as possible "parents" when flagging one-off bimeras. There is no such screen when considering exact bimeras.

maxShift

(Optional). Default is 16. Maximum shift allowed when aligning sequences to potential "parents".

multithread

(Optional). Default is FALSE. If TRUE, multithreading is enabled. NOT YET IMPLEMENTED.

verbose

(Optional). Default is FALSE. If TRUE, print verbose text output.

Value

A logical vector of length equal to the number of sequences in the input table. An entry is TRUE if the corresponding sequence is identified as a bimera, and FALSE otherwise.
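Because the result has one entry per column of the table, it can be used directly as a column index. The sketch below shows this with a toy table and a made-up result vector standing in for the output of isBimeraDenovoTable; the names `seqtab_toy`, `bim` and `seqtab_nochim` are hypothetical.

```r
# Toy sequence table: rows are samples, column names are DNA sequences.
seqtab_toy <- matrix(c(10L, 5L, 2L,
                        8L, 0L, 3L),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("sam1", "sam2"),
                                     c("ACGT", "ACGA", "TCGA")))

# Stand-in for the result of isBimeraDenovoTable(seqtab_toy);
# these values are made up for illustration.
bim <- c(FALSE, TRUE, FALSE)

# Keep only the columns not identified as bimeric.
seqtab_nochim <- seqtab_toy[, !bim, drop = FALSE]

# Fraction of sequences, and of reads, that were flagged.
frac_seqs  <- mean(bim)
frac_reads <- sum(seqtab_toy[, bim, drop = FALSE]) / sum(seqtab_toy)
```

Using `drop = FALSE` keeps the result a matrix even when only one sequence remains.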

See Also

isBimera, removeBimeraDenovo

Examples

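# Dereplicate the two example forward-read files shipped with dada2
derep1 <- derepFastq(system.file("extdata", "sam1F.fastq.gz", package="dada2"))
derep2 <- derepFastq(system.file("extdata", "sam2F.fastq.gz", package="dada2"))
# Infer sample composition, estimating error rates by self-consistency
dd <- dada(list(derep1, derep2), err=NULL, errorEstimationFunction=loessErrfun, selfConsist=TRUE)
seqtab <- makeSequenceTable(dd)
# Consensus bimera calls with the defaults, then with relaxed settings
isBimeraDenovoTable(seqtab)
isBimeraDenovoTable(seqtab, allowOneOff=TRUE, minSampleFraction=0.5)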


benjjneb/dada2 documentation built on Feb. 1, 2024, 10:50 p.m.