Description Usage Arguments Details Value Author(s) See Also Examples
Aids in trimming DNA sequences to the high quality region between a set of patterns that are potentially present on the left and right sides.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
myDNAStringSet |
A |
leftPatterns |
A |
rightPatterns |
A |
type |
Character string indicating the type of results desired. This should be (an abbreviation of) either |
quality |
Either |
maxDistance |
Numeric between zero and one giving the maximum distance of a match from the |
minOverlap |
Integer specifying the minimum number of nucleotides the |
allowInternal |
Logical initiating whether to search for the |
alpha |
Numeric between zero and one giving the smoothing parameter for an exponential moving average that is applied to the quality scores before trimming. Higher values result in less smoothing than lower values. |
threshold |
Numeric between zero and one specifying the threshold above which to trim the poor quality regions of the sequence. Higher values allow more sequence to be preserved at the expense of a greater error rate. |
maxAverageError |
Numeric between zero and |
maxAmbiguities |
Numeric between zero and one giving the maximum fraction of ambiguous (e.g., |
minWidth |
Integer giving the minimum number of nucleotides a pattern must overlap the sequence to initiate trimming. |
verbose |
Logical indicating whether to display progress. |
After a sequencing run, it is often necessary to trim the resulting sequences to the high quality region located between a set of patterns. TrimDNA
works as follows: first left and right patterns are identified within the sequences if allowInternal
is TRUE
(the default). If the patterns are not found internally, then a search is conducted at the flanking ends for patterns that partially overlap the sequence. The region between the leftPatterns
and rightPatterns
is then returned, unless quality information is provided. Note that the patterns must be in the same orientation as the sequence, which may require using the reverseComplement
of a PCR primer.
If quality
contains quality scores, these are converted to error probabilities and an exponential moving average is applied to smooth the signal. The longest region between the leftPatterns
and rightPatterns
where the average error probability is below threshold
is then returned, so long as it has an average error rate of at most maxAverageError
. Note that it is possible to only filter by maxAverageError
by setting threshold
to 1
, or vise-versa by setting maxAverageError
to threshold
.
TrimDNA
can return two type
s of results: IRanges
that can be used for trimming myDNAStringSet
, or a trimmed DNAStringSet
containing only those sequences over minWidth
nucleotides after trimming. Note that ambiguity codes (IUPAC_CODE_MAP
) are supported in the leftPatterns
and rightPatterns
, but not in myDNAStringSet
to prevent trivial matches (e.g., runs of N's).
If type
is "ranges"
(the default) the output is an IRanges
object with the start, end, and width of every sequence. This information can be accessed with the corresponding accessor function (see examples below). Note that the start will be 1
and the end will be 0
for sequences that were not at least minWidth
nucleotides after trimming.
If type
is "sequences"
then the trimmed sequences are returned that are at least minWidth
nucleotides in length.
If type
is "both"
the output is a list of two components, the first containing the ranges and the second containing the sequences.
Erik Wright eswright@pitt.edu
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | # simple example of trimming a single sequence
dna <- DNAStringSet("AAAAAAAAAATTACTTCCCCCCCCCC")
qscores <- PhredQuality("0000000000AAAAAAAAAAAAAAAA")
x <- TrimDNA(dna,
leftPatterns="AAAAAA",
rightPatterns="CCCCCC",
quality=qscores,
minWidth=1,
allowInternal=TRUE,
type="both")
x[[1]]
start(x[[1]])
end(x[[1]])
width(x[[1]])
x[[2]]
# example of trimming a FASTQ file by quality scores
fpath <- system.file("extdata",
"s_1_sequence.txt",
package="Biostrings")
reads <- readDNAStringSet(fpath, "fastq", with.qualities=TRUE)
TrimDNA(reads,
leftPatterns="",
rightPatterns="",
type="sequences",
quality=PhredQuality(mcols(reads)$qualities))
|
Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
Loading required package: IRanges
Loading required package: XVector
Attaching package: ‘Biostrings’
The following object is masked from ‘package:base’:
strsplit
Loading required package: RSQLite
Finding left pattern: 100% internal, 0% flanking
Finding right pattern: 100% internal, 0% flanking
Trimming by quality score: 100% left, 0% right
Time difference of 0.06 secs
IRanges object with 1 range and 0 metadata columns:
start end width
<integer> <integer> <integer>
[1] 15 16 2
[1] 15
[1] 16
[1] 2
DNAStringSet object of length 1:
width seq
[1] 2 TT
Trimming by quality score: 0% left, 0% right
Time difference of 0.05 secs
DNAStringSet object of length 256:
width seq names
[1] 36 GGACTTTGTAGGATACCCTCGCTTTCCTTCTCCTGT HWI-EAS88_1_1_1_1...
[2] 36 GATTTCTTACCTATTAGTGGTTGAACAGCATCGGAC HWI-EAS88_1_1_1_8...
[3] 36 GCGGTGGTCTATAGTGTTATTAATATCAATTTGGGT HWI-EAS88_1_1_1_9...
[4] 36 GTTACCATGATGTTATTTCTTCATTTGGAGGTAAAA HWI-EAS88_1_1_1_8...
[5] 36 GTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTT HWI-EAS88_1_1_1_9...
... ... ...
[252] 36 GTTTAGATATGAGTCACATTTTGTTCATGGTAGAGT HWI-EAS88_1_1_1_6...
[253] 36 GTTTTACAGACACCTAAAGCTACATCGTCAACGTTA HWI-EAS88_1_1_1_1...
[254] 36 GATGAACTAAGTCAACCTCAGCACTAACCTTGCGAG HWI-EAS88_1_1_1_3...
[255] 36 GTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTA HWI-EAS88_1_1_1_8...
[256] 36 GCAATCTGCCGACCACTCGCGATTCAATCATGACTT HWI-EAS88_1_1_1_8...
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.