complexity.entropy: Sequence Complexity Using The Shannon-Wiener Algorithm

View source: R/qualityControl.R

Description

This function evaluates the sequence complexity using the Shannon-Wiener Algorithm.

Usage

complexity.entropy(object, xlab="Complexity score (0=low, 100=high)", ylab="Number of sequences",
  xlim=c(0, 100), col="firebrick1", breaks=100, ...)

Arguments

object

An object of class DNAStringSet, ShortRead or SFFContainer.

xlab

The X axis label.

ylab

The Y axis label.

xlim

The limits of the X axis.

col

The plotting color.

breaks

The number of breaks in the histogram (see ‘hist’).

...

Arguments to be passed to methods, such as graphical parameters (see ‘par’).

Details

The entropy approach evaluates the entropy of the trinucleotides in a sequence. The entropy values are scaled from 0 to 100, and lower values imply lower complexity. A homopolymer repeat (e.g. TTTTTTTTTT) has an entropy value of 0, a dinucleotide repeat (e.g. TATATATATA) has an entropy value of around 16, and a trinucleotide repeat (e.g. TAGTAGTAG) has an entropy value of around 26. Sequences scoring below 70 can be considered low-complexity.
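
The calculation described above can be sketched in base R. This is only an illustration and not the package's implementation: the helper name trinucleotide_entropy is hypothetical, the whole sequence is treated as a single window, and the logarithm base (the number of trinucleotides in the sequence, capped at 64) is an assumption taken from the PRINSEQ method cited under References.

```r
# Hypothetical helper, not part of R453Plus1Toolbox.
# Returns the trinucleotide entropy of one sequence, scaled to 0..100.
trinucleotide_entropy <- function(x) {
  n <- nchar(x)
  if (n < 3) return(NA_real_)            # no trinucleotide fits

  # All overlapping trinucleotides of the sequence
  starts  <- seq_len(n - 2)
  trimers <- substring(x, starts, starts + 2)

  l <- length(trimers)
  k <- min(l, 64)                        # assumed log base: at most 4^3 trimers
  if (k < 2) return(0)                   # a single trimer carries no entropy

  p <- as.numeric(table(trimers)) / l    # relative frequency of each trimer
  -100 * sum(p * log(p, base = k))
}

# Three 64-nt test sequences matching the repeat types named above
homopolymer   <- strrep("T", 64)
dinucleotide  <- strrep("TA", 32)
trinucleotide <- substr(strrep("TAG", 22), 1, 64)

scores <- sapply(c(homopolymer, dinucleotide, trinucleotide),
                 trinucleotide_entropy, USE.NAMES = FALSE)
low_complexity <- scores < 70            # threshold suggested in the Details
```

With 64-nt inputs the three scores come out at roughly 0, 16.8 and 26.6, in line with the values quoted above. Shorter sequences give different numbers under this sketch, because the assumed logarithm base depends on the sequence length.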

Value

A numeric vector containing the complexity score for each sequence.

Author(s)

Christian Ruckert

References

Schmieder R. and Edwards R. (2011) Quality control and preprocessing of metagenomic datasets. Bioinformatics, 27(6):863-864.


R453Plus1Toolbox documentation built on Nov. 1, 2018, 2:27 a.m.