Trim ends of reads based on nucleotides or qualities

Description

These generic functions remove leading or trailing nucleotides or qualities. trimTails and trimTailw trim low-quality nucleotides from the right end of each read, using either a sliding window (trimTailw) or a tally of (optionally successive) nucleotides whose quality falls at or below a threshold (trimTails). trimEnds takes an alphabet of characters to remove from the left end, the right end, or both.

Usage

## S4 methods for 'ShortReadQ', 'FastqQuality', or 'SFastqQuality'
trimTailw(object, k, a, halfwidth, ..., ranges=FALSE)
trimTails(object, k, a, successive=FALSE, ..., ranges=FALSE)
trimEnds(object, a, left=TRUE, right=TRUE, relation=c("<=", "=="),
    ..., ranges=FALSE)

## S4 method for signature 'BStringSet'
trimTailw(object, k, a, halfwidth, ..., alphabet, ranges=FALSE)
## S4 method for signature 'BStringSet'
trimTails(object, k, a, successive=FALSE, ...,
    alphabet, ranges=FALSE)

## S4 method for signature 'character'
trimTailw(object, k, a, halfwidth, ..., destinations, ranges=FALSE)
## S4 method for signature 'character'
trimTails(object, k, a, successive=FALSE, ..., destinations, ranges=FALSE)
## S4 method for signature 'character'
trimEnds(object, a, left=TRUE, right=TRUE, relation=c("<=", "=="),
    ..., destinations, ranges=FALSE)

Arguments

object

An object (e.g., ShortReadQ and derived classes; use showMethods(), as in the Examples, to discover the available methods) or a character vector of fastq file(s) to be trimmed.

k

integer(1) describing the number of failing letters required to trigger trimming.

a

For trimTails and trimTailw, a character(1) with nchar(a) == 1L giving the letter at or below which a nucleotide is marked as failing.

For trimEnds, a character() whose elements all have nchar() == 1L, giving the letters at or below which (or, with relation="==", matching which) a nucleotide or quality score is marked for removal.

halfwidth

The half-width of the window in which qualities are assessed, i.e., the number of cycles before or after the current one; e.g., a half-width of 5 spans 5 + 1 + 5 = 11 cycles.

successive

logical(1) indicating whether failures can occur anywhere in the sequence, or must be successive. If successive=FALSE, the k'th failing letter and all subsequent letters are removed. If successive=TRUE, the first run of k successive failing letters and all subsequent letters are removed.

left, right

logical(1) indicating whether to trim from the left end, the right end, or both.

relation

character(1) selected from the argument values, i.e., “<=” or “==”, indicating whether all letters at or below a (in the order given by alphabet(object)) are to be removed, or only exact matches to a.

...

Additional arguments, perhaps used by methods.

destinations

For a character() object, an equal-length vector of destination files. The destination files must not already exist.

alphabet

character() of letters, ordered low to high, on which the quality scale is measured. Usually supplied internally, so users do not need to specify it. If missing, it is set to the ASCII characters 0-127.
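Because the default alphabet is the ASCII characters 0-127 ordered low to high, deciding whether a quality letter is "at or below" a threshold letter amounts to comparing character codes. The minimal base-R sketch below shows that comparison for one quality string; failing_letters is a hypothetical helper, not part of ShortRead, and it assumes the default ASCII alphabet.

```r
## Hypothetical helper (not a ShortRead function): flag the letters of one
## quality string that fall at or below the threshold letter `a`, assuming
## the default alphabet of ASCII characters 0-127 ordered low to high, so
## that alphabet order equals character-code order.
failing_letters <- function(quality, a) {
    stopifnot(is.character(quality), length(quality) == 1L,
              is.character(a), length(a) == 1L, nchar(a) == 1L)
    utf8ToInt(quality) <= utf8ToInt(a)
}

## 'I' (73) passes; '5' (53) and '#' (35) are at or below '5'
failing_letters("II5#", "5")   # returns c(FALSE, FALSE, TRUE, TRUE)
```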

ranges

logical(1) indicating whether to return the trimmed object (FALSE) or only the ranges satisfying the trimming condition (TRUE).

Details

trimTailw starts at the left-most nucleotide and, for each nucleotide, counts the cycles in a window of 2 * halfwidth + 1 cycles centred on it whose quality scores fall at or below a. The read is trimmed at the first nucleotide for which this count is >= k. The quality of the first or last nucleotide stands in for portions of the window that extend beyond the sequence.
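The windowed rule can be sketched in base R for a single quality string. This is an illustration of the description above, not ShortRead's C implementation; the helper name trim_tailw_width is hypothetical, and it assumes that qualities compare by ASCII code and that "trimmed at" means the first flagged nucleotide and everything after it are dropped.

```r
## Hypothetical sketch of the trimTailw rule for one read (not ShortRead code).
## Assumptions: qualities compare by ASCII code; the first nucleotide whose
## window reaches k failures is removed along with all later nucleotides.
## Returns the number of nucleotides kept.
trim_tailw_width <- function(quality, k, a, halfwidth) {
    fail <- utf8ToInt(quality) <= utf8ToInt(a)
    n <- length(fail)
    if (n == 0L) return(0L)
    for (i in seq_len(n)) {
        ## window of 2 * halfwidth + 1 cycles centred on cycle i; positions
        ## beyond the read are clamped to the first or last cycle, so those
        ## nucleotides' qualities stand in for the missing part of the window
        idx <- pmin(pmax((i - halfwidth):(i + halfwidth), 1L), n)
        if (sum(fail[idx]) >= k)
            return(i - 1L)
    }
    n
}

q <- "IIIII###II"
w <- trim_tailw_width(q, k = 2, a = "5", halfwidth = 1)
substr(q, 1, w)    # "IIIII": the window at cycle 6 already holds 2 failures

## clamping: at cycle 1 the window is cycles 1, 1, 2, so a single low first
## quality is counted twice and the whole read is trimmed
trim_tailw_width("#IIIIIIII", k = 2, a = "5", halfwidth = 1)   # 0
```

Clamping indices with pmin/pmax keeps the window a fixed size at both ends of the read, which is one straightforward way to realise "the quality of the first or last nucleotide is used" for out-of-range cycles.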

trimTails starts at the left-most nucleotide and accumulates cycles for which the quality score is at or below a. The read is trimmed at the first location where this number >= k. With successive=TRUE, failing qualities must occur in strict succession.
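A minimal base-R sketch of the counting rule for one quality string follows, using the semantics given for the successive argument: without successive, the k'th failing letter and everything after it go; with successive, the first run of k failing letters and everything after it go. The helper trim_tails_width is hypothetical (not ShortRead code) and assumes qualities compare by ASCII code.

```r
## Hypothetical sketch of the trimTails rule for one read (not ShortRead code).
## Assumes qualities compare by ASCII code. Returns the number of letters kept.
trim_tails_width <- function(quality, k, a, successive = FALSE) {
    fail <- utf8ToInt(quality) <= utf8ToInt(a)
    count <- 0L
    for (i in seq_along(fail)) {
        if (fail[i]) {
            count <- count + 1L
        } else if (successive) {
            count <- 0L          # a passing letter breaks the run
        }
        if (count >= k)
            ## successive = FALSE: drop from the k'th failure (position i);
            ## successive = TRUE: drop from the start of the run (i - k + 1)
            return(if (successive) i - k else i - 1L)
    }
    length(fail)                 # fewer than k failures: keep everything
}

q <- "I#II#I##II"                # failures at positions 2, 5, 7 and 8
trim_tails_width(q, k = 2, a = "5")                     # 4
trim_tails_width(q, k = 2, a = "5", successive = TRUE)  # 6
```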

trimEnds examines the left, right, or both ends of object, marking for removal the letters that satisfy relation with respect to a. The trimEnds,ShortReadQ-method trims based on quality.
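The end-trimming logic can be sketched in base R for a single string. This is an illustration, not ShortRead's implementation; trim_ends_string is a hypothetical name, letters compare by ASCII code, and for relation "<=" with several letters in a it assumes a letter is marked when it sorts at or below any of them (equivalently, at or below the largest).

```r
## Hypothetical sketch of the trimEnds rule for one string (not ShortRead code).
## Marked letters are stripped only from the requested end(s); marked letters
## in the interior of the string are kept.
trim_ends_string <- function(x, a, left = TRUE, right = TRUE,
                             relation = c("<=", "==")) {
    relation <- match.arg(relation)
    codes <- utf8ToInt(x)
    a_codes <- utf8ToInt(paste(a, collapse = ""))
    marked <- if (relation == "==") codes %in% a_codes else codes <= max(a_codes)
    start <- 1L
    end <- length(codes)
    if (left)
        while (start <= end && marked[start]) start <- start + 1L
    if (right)
        while (end >= start && marked[end]) end <- end - 1L
    substr(x, start, end)        # "" when every letter was removed
}

trim_ends_string("NNACGTNN", "N", relation = "==")      # "ACGT"
trim_ends_string("GCATGC", c("G", "C"), relation = "==") # "AT"
trim_ends_string("#5II5#", "5")                          # "II"
trim_ends_string("#5II5#", "5", right = FALSE)           # "II5#"
```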

ShortReadQ methods operate on quality scores; use sread() and the ranges argument to trim based on nucleotides (see Examples).

character methods transform one or several fastq files to new fastq files, applying trim operations based on quality scores; use filterFastq with your own filter argument to filter on nucleotides.

Value

An instance of class(object) trimmed to contain only those nucleotides satisfying the trim criterion or, if ranges=TRUE, an IRanges instance defining the ranges that would trim object.

Note

The trim* functions use OpenMP threads (when available) while creating the return value. This can cause problems when the process is already running on multiple threads, e.g., producing an error message such as

    libgomp: Thread creation failed: Resource temporarily unavailable

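One solution is to disable threading by preceding the problematic code with the following snippet. Because on.exit() runs when the enclosing function exits, the snippet is intended for use inside a function.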

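    ## limit OpenMP to one thread, saving the previous thread count
    nthreads <- .Call(ShortRead:::.set_omp_threads, 1L)
    ## restore the previous thread count when the enclosing function exits
    on.exit(.Call(ShortRead:::.set_omp_threads, nthreads))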

Author(s)

Martin Morgan <mtmorgan@fhcrc.org>

Examples

library(ShortRead)

showMethods(trimTails)

sp <- SolexaPath(system.file('extdata', package='ShortRead'))
rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt")

## remove leading / trailing quality scores <= 'I'
trimEnds(rfq, "I")
## remove leading / trailing 'N's
rng <- trimEnds(sread(rfq), "N", relation="==", ranges=TRUE)
narrow(rfq, start(rng), end(rng))
## remove leading / trailing 'G's or 'C's
trimEnds(rfq, c("G", "C"), relation="==")
