removePrimers: Removes primers and orients reads in a consistent direction.

View source: R/filter.R

removePrimersR Documentation

Removes primers and orients reads in a consistent direction.

Description

Removes primer(s) and orients the reads in input fastq file(s) (can be compressed). Reads that do not contain the primer(s) are discarded. Intended for use with PacBio CCS data. Faster external solutions such as cutadapt or trimmomatic are recommended for short-read data.

Usage

removePrimers(
  fn,
  fout,
  primer.fwd,
  primer.rev = NULL,
  max.mismatch = 2,
  allow.indels = FALSE,
  trim.fwd = TRUE,
  trim.rev = TRUE,
  orient = TRUE,
  compress = TRUE,
  verbose = FALSE
)

Arguments

fn

(Required). character. The path(s) to the input fastq file(s). Can be compressed.

fout

(Required). character. The path(s) to the output fastq file(s) corresponding to the fwd input files. If directory containing the file does not exist, it will be created. Output files are gzip compressed by default.

primer.fwd

(Required). character. The forward primer sequence expected to be at the beginning of the sequenced amplicon. Can contain IUPAC ambiguous nucleotide codes.

primer.rev

(Optional). Default NULL. The reverse primer sequence expected to be at the end of the sequenced amplicon. Can contain IUPAC ambiguous nucleotide codes. NOTE: 'primer.rev' should be provided in the orientation that would appear in a DNA sequence starting at the forward primer and being read towards the reverse primer. Thus, it is often necessary to reverse-complement the reverse primer sequence before providing it to this function.

max.mismatch

(Optional). Default 2. The number of mismatches to tolerate when matching reads to primer sequences. See vmatchPattern for details.

allow.indels

(Optional). Default FALSE. If TRUE, indels ared allowed when matching the primer sequences to the read. If FALSE, no indels are allowed. Note that when 'allow.indels=TRUE', primer matching is significantly slower, currently about 4x slower.

trim.fwd

(Optional). Default TRUE. If TRUE, reads are trimmed to the end of the forward primer, i.e. the forward primer and any preceding sequence are trimmed off.

trim.rev

(Optional). Default TRUE. If TRUE, reads are trimmed to the beginning of the reverse primer, i.e. the reverse primer and any subsequent sequence are trimmed off.

orient

(Optional). Default TRUE. If TRUE, reads are re-oriented if the reverse complement of the read is a better match to the provided primer sequence(s). This is recommended for PacBio CCS reads, which come in a random mix of forward and reverse-complement orientations.

compress

(Optional). Default TRUE. If TRUE, the output fastq file(s) are gzipped.

verbose

(Optional). Default FALSE. Whether to output status messages.

Value

Integer matrix. Returned invisibly (i.e. only if assigned to something). Rows correspond to the input files, columns record the number of reads.in and reads.out after discarding reads that didn't match the provided primers.

Examples

F27 <- "AGRGTTYGATYMTGGCTCAG"
R1492 <- "RGYTACCTTGTTACGACTT"
fn <- system.file("extdata", "samPBprimers.fastq.gz", package="dada2")
fn.noprime <- tempfile(fileext=".fastq.gz")
removePrimers(fn, fn.noprime, primer.fwd=F27, primer.rev=rc(R1492), orient=TRUE, verbose=TRUE)


benjjneb/dada2 documentation built on Dec. 5, 2024, 4:02 p.m.