TrimDNA: Trims DNA Sequences to the High Quality Region Between...
In DECIPHER: Tools for curating, analyzing, and manipulating biological sequences

Description Usage Arguments Details Value Author(s) See Also Examples

Aids in trimming DNA sequences to the high quality region between a set of patterns that are potentially present on the left and right sides.

TrimDNA(myDNAStringSet,
        leftPatterns,
        rightPatterns,
        type = "ranges",
        quality = NULL,
        maxDistance = 0.1,
        minOverlap = 5,
        allowInternal = TRUE,
        alpha = 0.1,
        threshold = 0.01,
        maxAverageError = threshold,
        maxAmbiguities = 0.1,
        minWidth = 36,
        verbose = TRUE)

`myDNAStringSet`	A `DNAStringSet` object containing the sequences to be trimmed.
`leftPatterns`	A `DNAStringSet` or character vector of patterns to remove from the left side of `myDNAStringSet`, or `""` to prevent trimming patterns on the left.
`rightPatterns`	A `DNAStringSet` or character vector of patterns to remove from the right side of `myDNAStringSet`, or `""` to prevent trimming patterns on the right.
`type`	Character string indicating the type of results desired. This should be (an abbreviation of) either `"ranges"`, `"sequences"` or `"both"`.
`quality`	Either `NULL` (the default) to skip quality trimming, or a `PhredQuality`, `SolexaQuality`, or `IlluminaQuality` object containing the quality scores corresponding to `myDNAStringSet`.
`maxDistance`	Numeric between zero and one giving the maximum distance of a match from the `leftPatterns` and `rightPatterns` to initiate trimming. For example, `0.1` (the default) would allow up to 10% mismatches between a pattern and sequence.
`minOverlap`	Integer specifying the minimum number of nucleotides the `leftPatterns` and `rightPatterns` must overlap a sequence to initiate trimming.
`allowInternal`	Logical initiating whether to search for the `leftPatterns` and `rightPatterns` within `myDNAStringSet`, or (`FALSE` for) only overlapping the ends.
`alpha`	Numeric between zero and one giving the smoothing parameter for an exponential moving average that is applied to the quality scores before trimming. Higher values result in less smoothing than lower values.
`threshold`	Numeric between zero and one specifying the threshold above which to trim the poor quality regions of the sequence. Higher values allow more sequence to be preserved at the expense of a greater error rate.
`maxAverageError`	Numeric between zero and `threshold` indicating the maximum average error rate of the trimmed region of the sequence. Trimmed sequences with average error rates above `maxAverageError` will be rejected. Note that the expected number of errors in a sequence is equal to the average error rate multiplied by the length of the sequence.
`maxAmbiguities`	Numeric between zero and one giving the maximum fraction of ambiguous (e.g., `"N"`) positions that are tolerated within the trimmed region of the sequence. Trimmed sequences with a greater fraction of ambiguities than `maxAmbiguities` will be rejected.
`minWidth`	Integer giving the minimum number of nucleotides a pattern must overlap the sequence to initiate trimming.
`verbose`	Logical indicating whether to display progress.

After a sequencing run, it is often necessary to trim the resulting sequences to the high quality region located between a set of patterns. TrimDNA works as follows: first left and right patterns are identified within the sequences if allowInternal is TRUE (the default). If the patterns are not found internally, then a search is conducted at the flanking ends for patterns that partially overlap the sequence. The region between the leftPatterns and rightPatterns is then returned, unless quality information is provided. Note that the patterns must be in the same orientation as the sequence, which may require using the reverseComplement of a PCR primer.

If quality contains quality scores, these are converted to error probabilities and an exponential moving average is applied to smooth the signal. The longest region between the leftPatterns and rightPatterns where the average error probability is below threshold is then returned, so long as it has an average error rate of at most maxAverageError. Note that it is possible to only filter by maxAverageError by setting threshold to 1, or vise-versa by setting maxAverageError to threshold.

TrimDNA can return two types of results: IRanges that can be used for trimming myDNAStringSet, or a trimmed DNAStringSet containing only those sequences over minWidth nucleotides after trimming. Note that ambiguity codes (IUPAC_CODE_MAP) are supported in the leftPatterns and rightPatterns, but not in myDNAStringSet to prevent trivial matches (e.g., runs of N's).

If type is "ranges" (the default) the output is an IRanges object with the start, end, and width of every sequence. This information can be accessed with the corresponding accessor function (see examples below). Note that the start will be 1 and the end will be 0 for sequences that were not at least minWidth nucleotides after trimming.

If type is "sequences" then the trimmed sequences are returned that are at least minWidth nucleotides in length.

If type is "both" the output is a list of two components, the first containing the ranges and the second containing the sequences.

Erik Wright eswright@pitt.edu

CorrectFrameshifts

# simple example of trimming a single sequence
dna <- DNAStringSet("AAAAAAAAAATTACTTCCCCCCCCCC")
qscores <- PhredQuality("0000000000AAAAAAAAAAAAAAAA")

x <- TrimDNA(dna,
            leftPatterns="AAAAAA",
            rightPatterns="CCCCCC",
            quality=qscores,
            minWidth=1,
            allowInternal=TRUE,
            type="both")

x[[1]]
start(x[[1]])
end(x[[1]])
width(x[[1]])

x[[2]]

# example of trimming a FASTQ file by quality scores
fpath <- system.file("extdata",
	"s_1_sequence.txt",
	package="Biostrings")
reads <- readDNAStringSet(fpath, "fastq", with.qualities=TRUE)
TrimDNA(reads,
	leftPatterns="",
	rightPatterns="",
	type="sequences",
	quality=PhredQuality(mcols(reads)$qualities))

Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

    strsplit

Loading required package: RSQLite
Finding left pattern: 100% internal, 0% flanking

Finding right pattern: 100% internal, 0% flanking

Trimming by quality score: 100% left, 0% right

Time difference of 0.06 secs

IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        15        16         2
[1] 15
[1] 16
[1] 2
DNAStringSet object of length 1:
    width seq
[1]     2 TT
Trimming by quality score: 0% left, 0% right

Time difference of 0.05 secs

DNAStringSet object of length 256:
      width seq                                             names               
  [1]    36 GGACTTTGTAGGATACCCTCGCTTTCCTTCTCCTGT            HWI-EAS88_1_1_1_1...
  [2]    36 GATTTCTTACCTATTAGTGGTTGAACAGCATCGGAC            HWI-EAS88_1_1_1_8...
  [3]    36 GCGGTGGTCTATAGTGTTATTAATATCAATTTGGGT            HWI-EAS88_1_1_1_9...
  [4]    36 GTTACCATGATGTTATTTCTTCATTTGGAGGTAAAA            HWI-EAS88_1_1_1_8...
  [5]    36 GTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTT            HWI-EAS88_1_1_1_9...
  ...   ... ...
[252]    36 GTTTAGATATGAGTCACATTTTGTTCATGGTAGAGT            HWI-EAS88_1_1_1_6...
[253]    36 GTTTTACAGACACCTAAAGCTACATCGTCAACGTTA            HWI-EAS88_1_1_1_1...
[254]    36 GATGAACTAAGTCAACCTCAGCACTAACCTTGCGAG            HWI-EAS88_1_1_1_3...
[255]    36 GTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTA            HWI-EAS88_1_1_1_8...
[256]    36 GCAATCTGCCGACCACTCGCGATTCAATCATGACTT            HWI-EAS88_1_1_1_8...

DECIPHER documentation built on Nov. 8, 2020, 8:30 p.m.

DECIPHER index

Package overview Classify Sequences Design Group-Specific FISH Probes Design Group-Specific Primers Design Microarray Probes Design Primers That Yield Group-Specific Signatures Finding Chimeric Sequences Getting Started DECIPHERing The Art of Multiple Sequence Alignment in R The Magic of Gene Finding

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

DECIPHER
Tools for curating, analyzing, and manipulating biological sequences

TrimDNA: Trims DNA Sequences to the High Quality Region Between...
In DECIPHER: Tools for curating, analyzing, and manipulating biological sequences

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Example output

Related to TrimDNA in DECIPHER...

R Package Documentation

Browse R Packages

We want your feedback!

DECIPHER Tools for curating, analyzing, and manipulating biological sequences

TrimDNA: Trims DNA Sequences to the High Quality Region Between... In DECIPHER: Tools for curating, analyzing, and manipulating biological sequences

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Example output

Related to TrimDNA in DECIPHER...

R Package Documentation

Browse R Packages

We want your feedback!

DECIPHER
Tools for curating, analyzing, and manipulating biological sequences

TrimDNA: Trims DNA Sequences to the High Quality Region Between...
In DECIPHER: Tools for curating, analyzing, and manipulating biological sequences