View source: R/qualityControl.R

This function evaluates sequence complexity using the Shannon-Wiener entropy.

```
complexity.entropy(object, xlab="Complexity score (0=low, 100=high)", ylab="Number of sequences",
  xlim=c(0, 100), col="firebrick1", breaks=100, ...)
```

| Argument | Description |
| --- | --- |
| `object` | An object of class DNAStringSet, ShortRead or SFFContainer. |
| `xlab` | The X axis label. |
| `ylab` | The Y axis label. |
| `xlim` | The limits of the X axis. |
| `col` | The plotting color. |
| `breaks` | The number of breaks in the histogram (see `hist`). |
| `...` | Arguments to be passed to methods, such as graphical parameters (see `par`). |

The entropy approach evaluates the entropy of trinucleotides in a sequence. The entropy values are scaled from 0 to 100, and lower values imply lower complexity. A homopolymer repeat (e.g. TTTTTTTTTT) has an entropy value of 0, a dinucleotide repeat (e.g. TATATATATA) has a value around 16, and a trinucleotide repeat (e.g. TAGTAGTAG) has a value around 26. Sequences scoring below 70 can be considered low-complexity.
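To make the scoring concrete, here is a minimal sketch in base R of a trinucleotide entropy scaled to 0-100. It is not the package's implementation: the helper name `trinucleotide_entropy` is hypothetical, and both the normalization by the log of the number of trinucleotides and the 64-nt length of the test sequences are assumptions chosen for illustration. Under those assumptions the three repeat types score 0, roughly 17 and roughly 27, close to the values quoted above.

```r
# Hypothetical helper, not part of the package: Shannon entropy of the
# overlapping trinucleotides in one sequence, scaled to 0-100.
trinucleotide_entropy <- function(s) {
  n <- nchar(s) - 2                  # number of overlapping trinucleotides
  if (n < 2) return(0)               # too short to carry any entropy
  starts <- seq_len(n)
  kmers <- substring(s, starts, starts + 2)
  p <- as.numeric(table(kmers)) / n  # relative frequency of each trinucleotide
  # Assumption: divide by log(n), the entropy reached when all n
  # trinucleotides are distinct, so that the score is bounded by 100.
  100 * -sum(p * log(p)) / log(n)
}

# Assumed 64-nt test sequences, one per repeat type.
homopolymer   <- strrep("T", 64)
dinucleotide  <- strrep("TA", 32)
trinucleotide <- substr(strrep("TAG", 22), 1, 64)

scores <- sapply(c(homopolymer, dinucleotide, trinucleotide),
                 trinucleotide_entropy, USE.NAMES = FALSE)

# Flag sequences below the low-complexity threshold of 70.
low_complexity <- scores < 70
```

With this normalization the score depends on sequence length as well as composition, so much shorter repeats would score differently from the 64-nt examples used here.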

A numeric vector containing the complexity score for each sequence.

Christian Ruckert

Schmieder R. and Edwards R. (2011) Quality control and preprocessing of metagenomic datasets. *Bioinformatics*, 27(6):863-864.
