Sequence Complexity Using The DUST Algorithm

Description

This function computes a DUST complexity score for each sequence and plots the distribution of scores as a histogram.

Usage

  complexity.dust(object, xlab="Complexity score (0=high, 100=low)", ylab="Number of sequences",
    xlim=c(0, 100), col="firebrick1", breaks=100, ...)

Arguments

object

An object of class DNAStringSet, ShortRead or SFFContainer.

xlab

The X axis label.

ylab

The Y axis label.

xlim

The limits of the X axis.

col

The plotting color.

breaks

The number of breaks in the histogram (see ‘hist’).

...

Arguments to be passed to methods, such as graphical parameters (see ‘par’).

Details

The complexity score is based on how often each trinucleotide occurs in a sequence and is scaled to the range 0 to 100, where higher scores indicate lower complexity. A homopolymer repeat (e.g. TTTTTTTTTT) scores 100, a dinucleotide repeat (e.g. TATATATATA) scores around 49, and a trinucleotide repeat (e.g. TAGTAGTAG) scores around 32. Sequences scoring above 7 can be considered low-complexity.
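
To make the scoring idea concrete, here is a minimal base-R sketch of a DUST-style score computed over a whole sequence as a single window. It is an illustration only, not the code used by complexity.dust: the helper name dust_sketch is hypothetical, and the normalisation (dividing by the score a homopolymer of the same length would get) is an assumption chosen so that a homopolymer scores 100. An actual implementation may use sliding windows and a different scaling, so its scores can differ, especially for short sequences.

```r
# Hypothetical helper: simplified single-window DUST-style score.
# Not the implementation behind complexity.dust().
dust_sketch <- function(seq) {
  seq <- toupper(seq)
  n <- nchar(seq)
  if (n < 4) return(NA_real_)            # need at least two trinucleotides

  # All overlapping trinucleotides of the sequence
  starts <- seq_len(n - 2)
  triplets <- substring(seq, starts, starts + 2)
  counts <- as.vector(table(triplets))   # occurrences of each distinct triplet
  l <- length(triplets)                  # number of trinucleotides

  # Raw score: pairs of identical triplets, normalised by (l - 1)
  raw <- sum(counts * (counts - 1) / 2) / (l - 1)

  # Assumed scaling: a homopolymer of the same length has raw score l / 2
  100 * raw / (l / 2)
}

dust_sketch("TTTTTTTTTT")       # homopolymer: 100
dust_sketch(strrep("TA", 32))   # long dinucleotide repeat: about 49
dust_sketch(strrep("TAG", 21))  # long trinucleotide repeat: about 32
dust_sketch("ACGTTGCA")         # every trinucleotide distinct: 0
```

Under this sketch, short repeats score somewhat lower than long ones (ten bases of TA repeats give about 43 rather than 49), because the approximate values quoted above correspond to longer stretches.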

Value

A numeric vector containing the complexity score for each sequence.
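
As a sketch of how the returned vector might be used, the snippet below flags low-complexity sequences with the cutoff of 7 mentioned in Details. The scores and their names are made-up stand-ins for the result of complexity.dust(object), and attaching names to the vector is an assumption made only for readability.

```r
# Made-up scores standing in for: scores <- complexity.dust(object)
scores <- c(read1 = 2.1, read2 = 100, read3 = 49.2, read4 = 6.5)

# Flag sequences whose score exceeds the low-complexity cutoff of 7
low_complexity <- scores > 7

names(scores)[low_complexity]   # sequences to inspect or discard
keep <- which(!low_complexity)  # indices of sequences to retain
```

The indices in keep can then be used to subset the original object, for example object[keep] for a DNAStringSet.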

Author(s)

Christian Ruckert

References

Schmieder R. and Edwards R. (2011) Quality control and preprocessing of metagenomic datasets. Bioinformatics, 27(6):863-864.
