dada | R Documentation |
The dada function takes as input dereplicated amplicon sequencing reads and returns the inferred composition of the sample (or samples). Put another way, dada removes all sequencing errors to reveal the members of the sequenced community.
If dada is run in selfConsist=TRUE mode, the algorithm will infer both the sample composition and the parameters of its error model from the data.
dada(
derep,
err,
errorEstimationFunction = loessErrfun,
selfConsist = FALSE,
pool = FALSE,
priors = character(0),
multithread = FALSE,
verbose = TRUE,
...
)
derep |
(Required). |
err |
(Required). 16xN numeric matrix, or an object coercible by The matrix of estimated rates for each possible nucleotide transition (from sample nucleotide to read nucleotide). Rows correspond to the 16 possible transitions (t_ij) indexed such that 1:A->A, 2:A->C, ..., 16:T->T Columns correspond to quality scores. Each entry must be between 0 and 1. Typically there are 41 columns for the quality scores 0-40. However, if USE_QUALS=FALSE, the matrix must have only one column. If selfConsist = TRUE, |
errorEstimationFunction |
(Optional). Function. Default If USE_QUALS = TRUE, If USE_QUALS = FALSE, this argument is ignored, and transition rates are estimated by maximum likelihood (t_ij = n_ij/n_i). |
selfConsist |
(Optional). If selfConsist = TRUE, the algorithm will alternate between sample inference and error rate estimation
until convergence. Error rate estimation is performed by If selfConsist=FALSE the algorithm performs one round of sample inference based on the provided |
pool |
(Optional). If pool = TRUE, the algorithm will pool together all samples prior to sample inference. If pool = FALSE, sample inference is performed on each sample individually. If pool = "pseudo", the algorithm will perform pseudo-pooling between individually processed samples. This argument has no effect if only 1 sample is provided, and |
priors |
(Optional). The priors argument provides a set of sequences for which there is prior information suggesting they may truly exist, i.e. are not errors. The abundance p-value of dereplicated sequences that exactly match one of the priors are calculated without conditioning on presence, allowing singletons to be detected, and are compared to a reduced threshold 'OMEGA_P' when forming new partitions. |
multithread |
(Optional). Default is FALSE.
If TRUE, multithreading is enabled and the number of available threads is automatically determined.
If an integer is provided, the number of threads to use is set by passing the argument on to
|
verbose |
(Optional). Default TRUE. Print verbose text output. More fine-grained control is available by providing an integer argument.
|
... |
(Optional). All dada_opts can be passed in as arguments to the dada() function.
See |
Briefly, dada
implements a statistical test for the notion that a specific sequence was seen too many times
to have been caused by amplicon errors from currently inferred sample sequences. Overly-abundant
sequences are used as the seeds of new partitions of sequencing reads, and the final set of partitions
is taken to represent the denoised composition of the sample. A more detailed explanation of the algorithm
is found in two publications:
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes SP (2016). DADA2: High resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581-3.
Rosen MJ, Callahan BJ, Fisher DS, Holmes SP (2012). Denoising PCR-amplified metagenome data. BMC bioinformatics, 13(1), 283.
dada
depends on a parametric error model of substitutions. Thus the quality of its sample inference is affected
by the accuracy of the estimated error rates. selfConsist
mode allows these error rates to be inferred
from the data.
All comparisons between sequences performed by dada
depend on pairwise alignments. This step is the most
computationally intensive part of the algorithm, and two alignment heuristics have been implemented for speed:
A kmer-distance screen and banded Needleman-Wunsch alignmemt. See setDadaOpt
.
A dada-class
object or list of such objects if a list of dereps was provided.
derepFastq
, setDadaOpt
fn1 <- system.file("extdata", "sam1F.fastq.gz", package="dada2")
fn2 <- system.file("extdata", "sam2F.fastq.gz", package="dada2")
derep1 = derepFastq(fn1)
derep2 = derepFastq(fn2)
dada(fn1, err=tperr1)
dada(list(sam1=derep1, sam2=derep2), err=tperr1, selfConsist=TRUE)
dada(derep1, err=inflateErr(tperr1,3), BAND_SIZE=32, OMEGA_A=1e-20)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.