derepFastq: Read in and dereplicate a fastq file.

Description Usage Arguments Value Examples

View source: R/sequenceIO.R

Description

A custom interface to FastqStreamer for dereplicating amplicon sequences from fastq or compressed fastq files, while also controlling peak memory requirement to support large files.

Usage

1
derepFastq(fls, n = 1e+06, verbose = FALSE, qualityType = "Auto")

Arguments

fls

(Required). character. The file path(s) to the fastq file(s), or a directory containing fastq file(s). Compressed file formats such as .fastq.gz and .fastq.bz2 are supported.

n

(Optional). numeric(1). The maximum number of records (reads) to parse and dereplicate at any one time. This controls the peak memory requirement so that large fastq files are supported. Default is 1e6, one-million reads. See FastqStreamer for details on this parameter, which is passed on.

verbose

(Optional). Default FALSE. If TRUE, throw standard R messages on the intermittent and final status of the dereplication.

qualityType

(Optional). character(1). The quality encoding of the fastq file(s). "Auto" (the default) means to attempt to auto-detect the encoding. This may fail for PacBio files with uniformly high quality scores, in which case use "FastqQuality". This parameter is passed on to readFastq; see information there for details.

Value

A derep-class object or list of such objects.

Examples

1
2
3
4
5
# Test that chunk-size, `n`, does not affect the result.
testFastq = system.file("extdata", "sam1F.fastq.gz", package="dada2")
derep1 = derepFastq(testFastq, verbose = TRUE)
derep1.35 = derepFastq(testFastq, n = 35, verbose = TRUE)
all.equal(getUniques(derep1), getUniques(derep1.35)[names(getUniques(derep1))])

dada2 documentation built on Nov. 8, 2020, 6:48 p.m.