Summarize low-complexity sequences
dustyScore identifies low-complexity sequences, in a manner inspired by the dust implementation in
Additional arguments, not currently used.
The following methods are defined:
signature(x = "DNAStringSet"): operating on an object derived from class
signature(x = "ShortRead"): operating on the sread of an object derived from class
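The two methods differ only in what they extract before scoring. The ShortRead method works on the object's sread component, while the DNAStringSet method scores the sequences directly. The Python sketch below mimics that dispatch with `functools.singledispatch`. `Reads`, `dusty` and `_score` are hypothetical names, a plain list of strings stands in for a DNAStringSet, and the per-sequence score assumes each overlapping triplet seen c times adds c(c-1)/2. It illustrates the dispatch pattern and is not ShortRead's code.

```python
from collections import Counter
from dataclasses import dataclass
from functools import singledispatch
from typing import List


@dataclass
class Reads:
    """Hypothetical stand-in for a ShortRead object; only sread and ids are modelled."""
    sread: List[str]
    ids: List[str]


def _score(seq: str) -> float:
    # Assumed dust-like score: sum of c*(c-1)/2 over overlapping triplet counts.
    s = seq.upper()
    counts = Counter(s[i:i + 3] for i in range(len(s) - 2))
    return sum(c * (c - 1) / 2 for c in counts.values())


@singledispatch
def dusty(x):
    """Generic entry point; concrete behaviour depends on the type of x."""
    raise TypeError(f"no dusty method for {type(x).__name__}")


@dusty.register
def _(x: list):
    # Analogue of the DNAStringSet method: one score per sequence.
    return [_score(s) for s in x]


@dusty.register
def _(x: Reads):
    # Analogue of the ShortRead method: delegate to the sequences in sread.
    return dusty(x.sread)
```

For example, `dusty(Reads(sread=["AAAAA", "ACGTA"], ids=["r1", "r2"]))` returns `[3.0, 0.0]`, the same as calling `dusty` on the list of sequences directly.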
The dust-like calculations used here are as implemented at https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-February/000170.html. Scores range from 0 (all triplets unique) to the square of the width of the longest sequence (poly-A, -C, -G, or -T).
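To make the score concrete, here is a minimal Python sketch of a dust-like triplet score. It assumes that each overlapping triplet occurring c times in a sequence contributes c(c-1)/2, which gives 0 when all triplets are unique. This formula is an assumption for illustration, not a copy of the linked implementation, and `dusty_score` and `dusty_scores` are hypothetical names. Ambiguous bases such as N are not treated specially here.

```python
from collections import Counter
from typing import List


def dusty_score(seq: str) -> float:
    """Dust-like score for one sequence (illustrative sketch only).

    Assumption: every overlapping triplet occurring c times adds c*(c-1)/2,
    so a sequence whose triplets are all distinct scores 0.
    """
    s = seq.upper()
    # Count overlapping triplets: positions 0..len-3.
    counts = Counter(s[i:i + 3] for i in range(len(s) - 2))
    return sum(c * (c - 1) / 2 for c in counts.values())


def dusty_scores(seqs: List[str]) -> List[float]:
    """One score per input sequence, in input order."""
    return [dusty_score(s) for s in seqs]


if __name__ == "__main__":
    reads = ["ACGTTGCA", "AAAAAAAAAA", "ACACACACAC"]
    print(dusty_scores(reads))  # [0.0, 28.0, 12.0]
```

Under this assumed formula a homopolymer of width w scores (w-2)(w-3)/2, so the maximum grows with the square of the width; the exact scaling used by dustyScore may differ.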
The batchSize argument can be used to reduce the memory requirements of the algorithm by processing the x argument in batches of the specified size. Smaller batch sizes use less memory, but are computationally less efficient.
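The memory/speed trade-off comes from materialising a table of triplet counts (one row of 64 counts per sequence) only for the current batch. The Python sketch below illustrates that idea under stated assumptions: `triplet_matrix`, `dusty_score_batched` and `batch_size` are hypothetical names, triplets containing non-ACGT letters are skipped, and the score is the same assumed c(c-1)/2 sum per triplet. It is not ShortRead's implementation.

```python
from itertools import product
from typing import List, Optional

# All 64 DNA triplets, each given a fixed column index.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
COLUMN = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_matrix(seqs: List[str]) -> List[List[int]]:
    """One row of 64 overlapping-triplet counts per sequence (non-ACGT triplets skipped)."""
    rows = []
    for seq in seqs:
        row = [0] * len(TRIPLETS)
        s = seq.upper()
        for i in range(len(s) - 2):
            col = COLUMN.get(s[i:i + 3])
            if col is not None:
                row[col] += 1
        rows.append(row)
    return rows


def dusty_score_batched(seqs: List[str], batch_size: Optional[int] = None) -> List[float]:
    """Score seqs in chunks of batch_size; None means a single batch."""
    n = len(seqs)
    if batch_size is None:
        batch_size = max(n, 1)
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    scores: List[float] = []
    for start in range(0, n, batch_size):
        # Only batch_size x 64 counts exist at once; the matrix is dropped after each batch.
        counts = triplet_matrix(seqs[start:start + batch_size])
        scores.extend(sum(c * (c - 1) / 2 for c in row) for row in counts)
    return scores
```

Whatever batch size is chosen, the scores are identical; only the peak size of the count table changes, at the cost of more loop iterations for small batches.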
A vector of numeric scores, with length equal to the length of
Herve Pages (code); Martin Morgan
Morgulis, Getz, Schaffer and Agarwala, 2006. WindowMasker: window-based masker for sequenced genomes, Bioinformatics 22: 134-141.
The WindowMasker supplement defining