# Sequence Complexity Using The DUST Algorithm

### Description

This function evaluates the sequence complexity using the DUST algorithm.

### Usage

```
complexity.dust(object, xlab="Complexity score (0=high, 100=low)", ylab="Number of sequences",
                xlim=c(0, 100), col="firebrick1", breaks=100, ...)
```

### Arguments

| Argument | Description |
|----------|-------------|
| `object` | An object of class DNAStringSet, ShortRead or SFFContainer. |
| `xlab` | The X axis label. |
| `ylab` | The Y axis label. |
| `xlim` | The limits of the X axis. |
| `col` | The plotting color. |
| `breaks` | The number of breaks in the histogram (see `hist`). |
| `...` | Arguments to be passed to methods, such as graphical parameters (see `par`). |

### Details

The complexity score is based on how often each trinucleotide occurs in a sequence and is scaled to the range 0 to 100, where higher scores indicate lower complexity. A homopolymer (e.g. TTTTTTTTTT) has a score of 100, a dinucleotide repeat (e.g. TATATATATA) has a score around 49, and a trinucleotide repeat (e.g. TAGTAGTAG) has a score around 32. Sequences with a score above 7 can be considered low-complexity.
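To make the trinucleotide counting concrete, the following base R function is a minimal sketch of a DUST-style score. It is not the code used by `complexity.dust`: the name `dust_sketch` is hypothetical, the whole sequence is treated as a single window, and the scaling (dividing by the score of a homopolymer of the same length) is an assumption chosen so that the result falls between 0 and 100.

```r
# Simplified, single-window DUST-style score.
# Hypothetical helper for illustration; not the implementation behind complexity.dust().
dust_sketch <- function(s) {
  s <- toupper(s)
  n <- nchar(s)
  if (n < 4) return(NA_real_)              # need at least two trinucleotides

  k <- n - 2                               # number of overlapping trinucleotides
  starts <- seq_len(k)
  triplets <- substring(s, starts, starts + 2)

  counts <- as.vector(table(triplets))     # occurrences of each distinct trinucleotide
  raw <- sum(counts * (counts - 1) / 2)    # pairs of identical trinucleotides

  # Assumed scaling: a homopolymer has k * (k - 1) / 2 identical pairs, so it scores 100.
  100 * raw / (k * (k - 1) / 2)
}

dust_sketch("TTTTTTTTTT")       # homopolymer
dust_sketch(strrep("TA", 32))   # dinucleotide repeat, 64 bases
dust_sketch(strrep("TAG", 21))  # trinucleotide repeat, 63 bases
```

Under these assumptions a homopolymer scores 100, and repeats of about 64 bases score roughly 49 (dinucleotide) and 32 (trinucleotide). Shorter repeats score lower in this sketch, because edge effects matter more when there are only a few trinucleotides.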

### Value

A numeric vector containing the complexity score for each sequence.
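Because the result is a plain numeric vector, it can be compared against the low-complexity cutoff of 7 mentioned in the Details. The sketch below uses made-up scores and read names purely for illustration; in practice `scores` would be the value returned by `complexity.dust`, and whether that vector carries names is not stated here.

```r
# Illustrative values only; replace with the vector returned by complexity.dust().
scores <- c(read1 = 2.1, read2 = 49.0, read3 = 6.8, read4 = 100)

threshold <- 7                            # cutoff taken from the Details section
low_complexity <- scores > threshold      # TRUE for low-complexity sequences

keep <- names(scores)[!low_complexity]    # sequences that pass the filter
n_flagged <- sum(low_complexity)          # how many would be discarded
```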

### Author(s)

Christian Ruckert

### References

Schmieder R. and Edwards R. (2011) Quality control and preprocessing of metagenomic datasets.
*Bioinformatics*, 27(6):863-864.