Summarize low-complexity sequences

Description

dustyScore identifies low-complexity sequences, in a manner inspired by the dust implementation in BLAST.

Usage

dustyScore(x, batchSize=NA, ...)

Arguments

x

A DNAStringSet object, or object derived from ShortRead, containing a collection of reads to be summarized.

batchSize

NA or an integer(1) vector indicating the maximum number of reads to be processed at any one time.

...

Additional arguments, not currently used.

Details

The following methods are defined:

dustyScore

signature(x = "DNAStringSet"): operating on an object derived from class DNAStringSet.

dustyScore

signature(x = "ShortRead"): operating on the sread of an object derived from class ShortRead.

The dust-like calculations used here are as implemented at https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-February/000170.html. Scores range from 0 (all triplets unique) to the square of the width of the longest sequence (poly-A, -C, -G, or -T).
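The idea of scoring by repeated triplets can be illustrated with a minimal base R sketch. The helper name tripletScore is hypothetical and not part of ShortRead, and the exact formula (summing the squared excess count of each observed triplet) is an assumption made for illustration; it may differ in detail from what dustyScore computes.

```r
# Hypothetical helper, not part of ShortRead: a dust-like score for a
# single sequence given as a character string.
# Assumption: each distinct triplet contributes (count - 1)^2.
tripletScore <- function(s) {
    n <- nchar(s)
    if (n < 3L) return(0)                 # no triplets to count
    starts <- seq_len(n - 2L)             # start of every overlapping triplet
    triplets <- substring(s, starts, starts + 2L)
    counts <- as.integer(table(triplets)) # occurrences of each distinct triplet
    sum((counts - 1L)^2)
}

tripletScore("ACGTTGCA")       # all six triplets are unique: 0
tripletScore(strrep("A", 10))  # "AAA" occurs 8 times: (8 - 1)^2 = 49
```

Under this assumed formula a homopolymer of width w scores (w - 3)^2, which grows with the square of the width, in line with the upper bound described above.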

The batchSize argument can be used to reduce the memory requirements of the algorithm by processing the x argument in batches of the specified size. Smaller batch sizes use less memory, but are computationally less efficient.
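The batching trade-off can be sketched in base R as follows. The function scoreInBatches and its scoreFun argument are hypothetical names introduced here for illustration; this is not the ShortRead code, only one plausible way to process a vector in chunks of at most batchSize elements.

```r
# Hypothetical sketch, not the ShortRead implementation: apply a
# vectorized scoring function to x in chunks of at most batchSize.
scoreInBatches <- function(x, scoreFun, batchSize = NA) {
    # NA, or an input that fits in one batch: process everything at once
    if (is.na(batchSize) || length(x) <= batchSize)
        return(scoreFun(x))
    # Assign consecutive elements to batches 1, 1, ..., 2, 2, ...
    batch <- ceiling(seq_along(x) / batchSize)
    # Score each batch separately, then concatenate in the original order
    unlist(lapply(split(x, batch), scoreFun), use.names = FALSE)
}

reads <- c("ACGT", "AAAAAA", "ACGTACGT", "TT", "GGGCC")
# Batched and unbatched results agree; nchar stands in for a real score
identical(scoreInBatches(reads, nchar, batchSize = 2), nchar(reads))
```

Only one batch of intermediate results is held at a time, which is where the memory saving comes from; the per-batch overhead is the cost. With ShortRead itself, the corresponding call would be of the form dustyScore(x, batchSize=1000).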

Value

A vector of numeric scores, with length equal to the length of x.

Author(s)

Herve Pages (code); Martin Morgan

References

Morgulis, Getz, Schaffer and Agarwala, 2006. WindowMasker: window-based masker for sequenced genomes, Bioinformatics 22: 134-141.

See Also

The WindowMasker supplement defining dust ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/windowmasker/windowmasker_suppl.pdf

Examples

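library(ShortRead)
# Locate the example Solexa run shipped with ShortRead and read lane 1
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt")
# Smallest and largest dust-like scores across all reads
range(dustyScore(rfq))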
