setDadaOpt: Set DADA options

Description Usage Arguments Details Value See Also Examples

View source: R/dada.R


setDadaOpt sets the default options used by the dada(...) function for your current session, much like par sets the session default plotting parameters. However, all dada options can be set as part of the dada(...) function call itself by including a DADA_OPTION_NAME=VALUE argument.





(Required). The DADA options to set, along with their new value.



OMEGA_A: This parameter sets the threshold for when DADA2 calls unique sequences significantly overabundant, and therefore creates a new partition with that sequence as the center. Default is 1e-40, which is a conservative setting to avoid making false positive inferences, but which comes at the cost of reducing the ability to identify some rare variants.

OMEGA_P: The threshold for unique sequences with prior evidence of existence (see 'priors' argument). Default is 1e-4.

OMEGA_C: The threshold at which unique sequences inferred to contain errors are corrected in the final output. The probability that each unique sequence is generated at its observed abundance from the center of its final partition is evaluated, and compared to OMEGA_C. If that probability is >= OMEGA_C, it is "corrected", i.e. replaced by the partition center sequence. The special value of 0 corresponds to correcting all input sequences, and any value > 1 corresponds to performing no correction on sequences found to contain errors. Default is 1e-40 (same as OMEGA_A).


MATCH: The score of a match in the Needleman-Wunsch alignment. Default is 4.

MISMATCH: The score of a mismatch in the Needleman-Wunsch alignment. Default is -5.

GAP_PENALTY: The cost of gaps in the Needleman-Wunsch alignment. Default is -8.

HOMOPOLYMER_GAP_PENALTY: The cost of gaps in homopolymer regions (>=3 repeated bases). Default is NULL, which causes homopolymer gaps to be treated as normal gaps.

BAND_SIZE: When set, banded Needleman-Wunsch alignments are performed. Banding restricts the net cumulative number of insertion of one sequence relative to the other. The default value of BAND_SIZE is 16. If DADA is applied to sequencing technologies with high rates of indels, such as 454 sequencing, the BAND_SIZE parameter should be increased. Setting BAND_SIZE to a negative number turns off banding (i.e. full Needleman-Wunsch).

**Sequence Comparison Heuristics**

USE_KMERS: If TRUE, a 5-mer distance screen is performed prior to performing each pairwise alignment, and if the 5mer-distance is greater than KDIST_CUTOFF, no alignment is performed. Default is TRUE.

KDIST_CUTOFF: The default value of 0.42 was chosen to screen pairs of sequences that differ by >10%, and was calibrated on Illumina sequenced 16S amplicon data. The assumption is that sequences that differ by such a large amount cannot be linked by amplicon errors (i.e. if you sequence one, you won't get a read of other) and so careful (and costly) alignment is unnecessary.

GAPLESS: If TRUE, the ordered kmer identity between pairs of sequences is compared to their unordered overlap. If equal, the optimal alignment is assumed to be gapless. Default is TRUE. Only relevant if USE_KMERS is TRUE.

GREEDY: The DADA2 algorithm is not greedy, but a very restricted form of greediness can be turned on via this option. If TRUE, unique sequences with reads less than those expected to be generated by resequencing just the central unique in their partition are "locked" to that partition. Modest (~30

**New Partition Conditions**

MIN_FOLD: The minimum fold-overabundance for sequences to form new partitions. Default value is 1, which means this criteria is ignored.

MIN_HAMMING: The minimum hamming-separation for sequences to form new partitions. Default value is 1, which means this criteria is ignored.

MIN_ABUNDANCE: The minimum abundance for unique sequences form new partitions. Default value is 1, which means this criteria is ignored.

MAX_CLUST: The maximum number of partitions. Once this many partitions have been created, the algorithm terminates regardless of whether the statistical model suggests more real sequence variants exist. If set to 0 this argument is ignored. Default value is 0.

**Self Consistency**

MAX_CONSIST: The maximum number of steps when selfConsist=TRUE. If convergence is not reached in MAX_CONSIST steps, the algorithm will terminate with a warning message. Default value is 10.

**Pseudo-pooling Behavior**

PSEUDO_PREVALENCE: When performing pseudo-pooling, all sequence variants found in at least two samples are used as priors for a subsequent round of sample inference. Only relevant if 'pool="pseudo"'. Default is 2.

PSEUDO_ABUNDANCE: When performing pseudo-pooling, all denoised sequence variants with total abundance (over all samples) greater than this are used as priors for a subsequent round of sample inference. Only relevant if 'pool="pseudo"'. Default is Inf (i.e. abundance ignored for this purpose).

**Error Model**

USE_QUALS: If TRUE, the dada(...) error model takes into account the consensus quality score of the dereplicated unique sequences. If FALSE, quality scores are ignored. Default is TRUE.


SSE: Controls the level of explicit SSE vectorization for kmer calculations. Default 2. Maintained for development reasons, should have no impact on output.



See Also



setDadaOpt(OMEGA_A = 1e-20)

dada2 documentation built on May 2, 2018, 4:25 a.m.