fastqFilter: Filter and trim a fastq file.
In benjjneb/dada2: Accurate, high-resolution sample inference from amplicon sequencing data

fastqFilter

R Documentation

Filter and trim a fastq file.

Description

fastqFilter takes an input fastq file (can be compressed), filters it based on several user-definable criteria, and outputs those reads which pass the filter to a new fastq file (also can be compressed). Several functions in the ShortRead package are leveraged to do this filtering.

Usage

fastqFilter(
  fn,
  fout,
  truncQ = 2,
  truncLen = 0,
  maxLen = Inf,
  minLen = 20,
  trimLeft = 0,
  trimRight = 0,
  maxN = 0,
  minQ = 0,
  maxEE = Inf,
  rm.phix = TRUE,
  rm.lowcomplex = 0,
  orient.fwd = NULL,
  n = 1e+06,
  OMP = TRUE,
  qualityType = "Auto",
  compress = TRUE,
  verbose = FALSE,
  ...
)

Arguments

`fn`	(Required). The path to the input fastq file.
`fout`	(Required). The path to the output file. Note that by default (`compress=TRUE`) the output fastq file is gzipped.
`truncQ`	(Optional). Default 2. Truncate reads at the first instance of a quality score less than or equal to `truncQ`.
`truncLen`	(Optional). Default 0 (no truncation). Truncate reads after `truncLen` bases. Reads shorter than this are discarded.
`maxLen`	(Optional). Default Inf (no maximum). Remove reads with length greater than `maxLen`. `maxLen` is enforced on the raw reads.
`minLen`	(Optional). Default 20. Remove reads with length less than `minLen`. `minLen` is enforced after all other trimming and truncation.
`trimLeft`	(Optional). Default 0. The number of nucleotides to remove from the start of each read. If both `truncLen` and `trimLeft` are provided, filtered reads will have length `truncLen-trimLeft`.
`trimRight`	(Optional). Default 0. The number of nucleotides to remove from the end of each read. If both `truncLen` and `trimRight` are provided, truncation will be performed after `trimRight` is enforced.
`maxN`	(Optional). Default 0. After truncation, sequences with more than `maxN` Ns will be discarded. Note that `dada` currently does not allow Ns.
`minQ`	(Optional). Default 0. After truncation, reads that contain a quality score below `minQ` will be discarded.
`maxEE`	(Optional). Default `Inf` (no EE filtering). After truncation, reads with more than `maxEE` "expected errors" will be discarded. Expected errors are calculated from the nominal definition of the quality score: EE = sum(10^(-Q/10)).
`rm.phix`	(Optional). Default TRUE. If TRUE, discard reads that match against the phiX genome, as determined by `isPhiX`.
`rm.lowcomplex`	(Optional). Default 0. If greater than 0, reads with an effective number of kmers less than this value will be removed. The effective number of kmers is determined by `seqComplexity` using a Shannon information approximation. The default kmer-size is 2, and therefore perfectly random sequences will approach an effective kmer number of 16 = 4 (nucleotides) ^ 2 (kmer size).
`orient.fwd`	(Optional). Default NULL. A character string present at the start of valid reads. Only unambiguous nucleotides are allowed. This string is compared to the start of each read, and to the start of the reverse complement of each read. If it exactly matches the start of the read, the read is kept. If it exactly matches the start of the reverse-complemented read, the read is reverse-complemented and kept. Otherwise the read is filtered out. The primary use of this parameter is to unify the orientation of amplicon sequencing libraries that are a mixture of forward and reverse orientations, and that include the forward primer on the reads.
`n`	(Optional). Default `1e6` (one million reads). The number of records (reads) to read in and filter at any one time. This controls the peak memory requirement so that very large fastq files are supported. See `FastqStreamer` for details.
`OMP`	(Optional). Default TRUE. Whether or not to use OMP multithreading when calling `FastqStreamer`. Set this to FALSE if calling this function within a parallelized chunk of code (e.g. within `mclapply`).
`qualityType`	(Optional). `character(1)`. The quality encoding of the fastq file(s). "Auto" (the default) means to attempt to auto-detect the encoding. This may fail for PacBio files with uniformly high quality scores, in which case use "FastqQuality". This parameter is passed on to `readFastq`; see information there for details.
`compress`	(Optional). Default TRUE. Whether the output fastq file should be gzip compressed.
`verbose`	(Optional). Default FALSE. Whether to output status messages.
`...`	(Optional). Arguments passed on to `isPhiX`.
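
As a rough illustration of the `truncQ` and `maxEE` descriptions above, the following base-R sketch applies both to a single quality string, outside of dada2. The helper names (`phred33`, `trunc_at_q`, `expected_errors`) are hypothetical, a Phred+33 encoding is assumed, and it is assumed that the base that triggers `truncQ` is itself removed along with everything after it. This is not the package's implementation, which works on whole fastq chunks via ShortRead.

```r
# Hypothetical helper: decode a Phred+33 quality string into integer scores.
phred33 <- function(qual) utf8ToInt(qual) - 33L

# Hypothetical helper: truncate at the first score <= truncQ.
# Assumption: the triggering position itself is dropped.
trunc_at_q <- function(q, truncQ = 2) {
  hit <- which(q <= truncQ)
  if (length(hit) == 0) return(q)
  q[seq_len(hit[1] - 1L)]
}

# Expected errors from the nominal definition: EE = sum(10^(-Q/10)).
expected_errors <- function(q) sum(10^(-q / 10))

# Toy quality string: 'I' = Q40, '5' = Q20, '+' = Q10, '#' = Q2.
q <- phred33("IIII5+#I")

# The Q2 score at position 7 triggers truncation, leaving 6 positions.
q_trunc <- trunc_at_q(q, truncQ = 2)

# EE of the truncated read: 4 * 1e-4 + 1e-2 + 1e-1 = 0.1104.
ee <- expected_errors(q_trunc)

# The read would pass an maxEE = 2 filter under these assumptions.
passes <- ee <= 2
```

Because each term is the error probability implied by one quality score, a single Q10 base contributes far more to EE than many Q40 bases, which is why truncating low-quality tails before applying `maxEE` tends to matter.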

Value

integer(2). The number of reads read in, and the number of reads that passed the filter and were output.
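
One way the returned counts might be used is to report the fraction of reads retained. The sketch below is only illustrative: `retained_fraction` is a hypothetical helper, the first set of counts is made up, and the real call is guarded so that it only runs when dada2 is installed. Elements are accessed by position, since the source only states that the value is `integer(2)`.

```r
# Hypothetical helper: fraction of reads retained, computed from the
# length-2 vector (reads read in, reads output) described under Value.
retained_fraction <- function(counts) {
  stopifnot(length(counts) == 2)
  if (counts[[1]] == 0) return(NA_real_)  # nothing was read in
  counts[[2]] / counts[[1]]
}

# Made-up counts, for illustration only.
example_fraction <- retained_fraction(c(1000L, 850L))

# With dada2 installed, the same helper applies to a real call.
if (requireNamespace("dada2", quietly = TRUE)) {
  fn <- system.file("extdata", "sam1F.fastq.gz", package = "dada2")
  fout <- tempfile(fileext = ".fastq.gz")
  counts <- dada2::fastqFilter(fn, fout, maxN = 0, maxEE = 2)
  message("Fraction of reads retained: ", retained_fraction(counts))
}
```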

Examples

testFastq <- system.file("extdata", "sam1F.fastq.gz", package="dada2")
filtFastq <- tempfile(fileext=".fastq.gz")
fastqFilter(testFastq, filtFastq, maxN=0, maxEE=2)
fastqFilter(testFastq, filtFastq, trimLeft=10, truncLen=200, maxEE=2, verbose=TRUE)
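
The second call combines `trimLeft=10` with `truncLen=200`, so, per the argument descriptions, surviving reads should have length `truncLen-trimLeft` = 190. The base-R sketch below mimics only that net effect on a toy string. `trim_and_truncate` is a hypothetical helper, not part of dada2, and it ignores `truncQ`, `minLen` and every other filter.

```r
# Hypothetical helper mimicking the net effect of trimLeft + truncLen on
# one read; returns NA_character_ when the read would be discarded.
trim_and_truncate <- function(read, trimLeft = 0, truncLen = 0) {
  # Reads shorter than truncLen are discarded (truncLen = 0 disables this).
  if (truncLen > 0 && nchar(read) < truncLen) return(NA_character_)
  end <- if (truncLen > 0) truncLen else nchar(read)
  # Drop the first trimLeft bases and keep up to position truncLen.
  substr(read, trimLeft + 1, end)
}

# Toy 240-nt read: kept, with final length truncLen - trimLeft.
read <- strrep("ACGT", 60)
kept <- trim_and_truncate(read, trimLeft = 10, truncLen = 200)
kept_length <- nchar(kept)

# Toy 160-nt read: shorter than truncLen, so it is discarded.
short <- trim_and_truncate(strrep("ACGT", 40), trimLeft = 10, truncLen = 200)
```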

benjjneb/dada2 documentation built on June 10, 2025, 10:43 p.m.
