qfilter: Quality filtering for amplicon sequences.


View source: R/filter.R

Description

This function applies several quality checks to sequences read from FASTQ files and removes any sequence that fails the specified thresholds. The checks cover the average quality score, sequence length (minimum and maximum), the number of occurrences of each sequence (singletons are removed by default, or a different minimum count can be set) and the number of ambiguous base calls.

Usage

qfilter(
  x,
  minqual = 30,
  maxambigs = 0,
  mincount = 2,
  minlength = 50,
  maxlength = 500
)

Arguments

x

a vector of concatenated strings representing DNA sequences (in upper case) or a DNAbin list object with quality attributes. This argument will usually be produced by readFASTQ.

minqual

integer, the minimum average quality score for a sequence to pass the filter. Defaults to 30.

maxambigs

integer, the maximum number of ambiguities for a sequence to pass the filter. Defaults to 0.

mincount

integer, the minimum acceptable number of occurrences of a sequence for it to pass the filter. Defaults to 2 (removes singletons).

minlength

integer, the minimum acceptable sequence length. Defaults to 50.

maxlength

integer, the maximum acceptable sequence length. Defaults to 500.
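
To make the interaction of these thresholds concrete, the following base-R sketch applies the same kinds of checks to a plain character vector. It is an illustration only, not the package's implementation: the helper name sketch_filter, the representation of quality scores as a list of integer vectors, the treatment of any character other than A, C, G or T as ambiguous, and the choice to count occurrences before any other check is applied are all assumptions made for this example.

```r
## Hypothetical helper (not part of the insect package): a base-R
## approximation of the checks controlled by the arguments above.
sketch_filter <- function(seqs, quals, minqual = 30, maxambigs = 0,
                          mincount = 2, minlength = 50, maxlength = 500) {
  ## seqs:  character vector of upper-case DNA strings
  ## quals: list of integer vectors of per-base quality scores (assumed layout)
  stopifnot(length(seqs) == length(quals))
  ## average quality score of each sequence
  meanqual <- vapply(quals, mean, numeric(1))
  ## sequence lengths
  lens <- nchar(seqs)
  ## number of characters that are not A, C, G or T (assumed to be ambiguities)
  nambigs <- nchar(gsub("[ACGT]", "", seqs))
  ## number of times each sequence occurs in the unfiltered input (assumption)
  counts <- ave(seq_along(seqs), seqs, FUN = length)
  ## a sequence is kept only if it passes every check
  keep <- meanqual >= minqual & nambigs <= maxambigs &
    counts >= mincount & lens >= minlength & lens <= maxlength
  seqs[keep]
}

## toy input: one ambiguous read, one short read, one low-quality read
seqs <- c("ACGTACGT", "ACGTACGT", "ACGTNCGT", "ACG", "GGGGCCCC", "GGGGCCCC")
quals <- list(rep(35L, 8), rep(32L, 8), rep(35L, 8),
              rep(35L, 3), rep(35L, 8), rep(20L, 8))
res <- sketch_filter(seqs, quals, minlength = 5, maxlength = 10)
```

In this toy input the third sequence is dropped for containing an N (and for occurring only once), the fourth for being too short, and the sixth for its low average quality. Whether qfilter itself counts occurrences before or after the other checks is not stated on this page, so the sketch should not be read as a statement about that ordering.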

Value

an object of the same type as the primary input argument (i.e. a "DNAbin" object if x is a "DNAbin" object, or a vector of concatenated character strings otherwise).

Author(s)

Shaun Wilkinson

Examples

  ## load the package that provides readFASTQ, trim and qfilter
  library(insect)
  ## download and extract example FASTQ file to temporary directory
  td <- tempdir()
  URL <- "https://www.dropbox.com/s/71ixehy8e51etdd/insect_tutorial1_files.zip?dl=1"
  dest <- file.path(td, "insect_tutorial1_files.zip")
  download.file(URL, destfile = dest, mode = "wb")
  unzip(dest, exdir = td)
  x <- readFASTQ(file.path(td, "COI_sample2.fastq"))
  ## trim primers from sequences
  mlCOIintF <- "GGWACWGGWTGAACWGTWTAYCCYCC"
  jgHCO2198 <- "TAIACYTCIGGRTGICCRAARAAYCA"
  x <- trim(x, up = mlCOIintF, down = jgHCO2198)
  ## filter sequences to remove singletons, low quality & short/long reads
  ## (minqual, maxambigs and mincount keep their defaults of 30, 0 and 2)
  x <- qfilter(x, minlength = 250, maxlength = 350)
 

shaunpwilkinson/insect documentation built on Aug. 9, 2021, 5 a.m.