Plot KL Divergence Components for a Subset of kmers to Inspect for Contamination
Description
kmerKLPlot calls calcKL, which calculates the Kullback-Leibler divergence between the kmer distribution at each position and the kmer distribution across all positions. kmerKLPlot then plots each kmer's contribution to the total KL divergence as stacked bars, for a subset of the kmers. Since there are 4^k possible kmers for a given kmer length k, plotting every one often dilutes the interpretation; however, one can increase n.kmers to a number greater than the possible number of kmers to force kmerKLPlot to plot the entire KL divergence and all terms (one per kmer) in the sum.
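For intuition, here is a minimal base-R sketch of the decomposition described above. It assumes the input is a hypothetical matrix of kmer counts with one row per read position and one column per kmer; the helper name klTerms is illustrative only, is not part of qrqc, and calcKL may organize the computation differently.

```r
## Hypothetical helper (not part of qrqc): split the positional KL
## divergence into one term per kmer.
## counts: numeric matrix, rows = read positions, columns = kmers.
klTerms <- function(counts) {
  p <- counts / rowSums(counts)          # kmer distribution at each position
  q <- colSums(counts) / sum(counts)     # kmer distribution across all positions
  terms <- p * log2(sweep(p, 2, q, "/")) # p * log2(p / q), in bits
  terms[p == 0] <- 0                     # convention: 0 * log(0) = 0
  terms
}

## Toy data: 2 positions, 4 kmers.
counts <- matrix(c(10, 0, 5, 5,
                    2, 8, 5, 5),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("pos1", "pos2"), c("AA", "AC", "AG", "AT")))
terms <- klTerms(counts)
terms           # one stacked-bar segment per kmer and position
rowSums(terms)  # total KL divergence (bits) at each position
```

Each row of terms corresponds to one position's stack of bars; summing a row recovers that position's full divergence.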
If x is a list, the KL kmer plots are faceted by sample; this allows comparison to a FASTA file of random reads. Again, please note that this is not the total KL divergence, but rather the KL divergence calculated on a subset of the sample space (the top n.kmers kmers selected).
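To see why the plotted sum is generally not the full divergence, the sketch below keeps only the top n kmers at a single position. Ranking kmers by the absolute size of their term is an assumption made here for illustration; the criterion kmerKLPlot actually uses to choose its n.kmers kmers may differ. The vectors p and q are toy values.

```r
## Toy kmer frequencies at one position (p) and across all positions (q).
p <- c(AA = 0.10, AC = 0.40, AG = 0.30, AT = 0.20)
q <- c(AA = 0.30, AC = 0.20, AG = 0.25, AT = 0.25)

terms <- p * log2(p / q)   # one KL term per kmer, in bits

## Assumed selection rule (illustration only): largest |term| first.
n <- 2
top <- names(sort(abs(terms), decreasing = TRUE))[seq_len(n)]

sum(terms)        # full KL divergence at this position
sum(terms[top])   # the part that a top-n plot would show
```

With all kmers kept (n equal to the number of kmers), the two sums coincide, which is the effect of setting n.kmers above 4^k.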
Usage
kmerKLPlot(x, n.kmers=20)

Arguments
x
an S4 object of a class that inherits from SequenceSummary, or a list of such objects.
n.kmers
an integer value indicating the number of top kmers to include.
Methods
signature(x = "SequenceSummary")
kmerKLPlot will plot the KL divergence for a subset of kmers for a single object that inherits from SequenceSummary.

signature(x = "list")
kmerKLPlot will plot the KL divergence for a subset of kmers for each object in the list that inherits from SequenceSummary, and display them in a series of panels.
Note
The KL divergence calculation in calcKL uses base-2 logarithms, so the units are bits.
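Written out, the quantity described in the Description for a read position i is the following, where p_{i,m} is the frequency of kmer m at position i, q_m is its frequency across all positions, and the sum runs over the set \mathcal{K} of 4^k possible kmers. The notation is ours rather than the package's.

$$
D_{\mathrm{KL}}(P_i \,\|\, Q) = \sum_{m \in \mathcal{K}} p_{i,m} \log_2 \frac{p_{i,m}}{q_m} \quad \text{(bits)}
$$

Each summand is one kmer's bar segment. A summand is negative whenever p_{i,m} < q_m, although the full sum over all kmers is never negative.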
Also, note that ggplot2 warns that "Stacking is not well defined when ymin != 0". This occurs when some kmers are less frequent in the positional distribution than in the distribution across all positions, so that the corresponding term of the KL sum is negative (producing a bar below zero). This does not appear to affect the plot much. In the examples below, warnings are suppressed, but given that this is a valid concern from ggplot2, warnings are not suppressed in the function itself.
Author(s)
Vince Buffalo <vsbuffalo@ucdavis.edu>
See Also
getKmer, calcKL, kmerEntropyPlot
Examples
library(qrqc)
## Load a somewhat contaminated FASTQ file
s.fastq <- readSeqFile(system.file('extdata', 'test.fastq',
  package='qrqc'), hash.prop=1)
## Load a really contaminated FASTQ file
s.contam.fastq <- readSeqFile(system.file('extdata',
  'test-contam.fastq', package='qrqc'), hash.prop=1)
## Load a random (equal base frequency) FASTA file
s.random.fasta <- readSeqFile(system.file('extdata',
  'random.fasta', package='qrqc'), type="fasta", hash.prop=1)
## Make KL divergence plot; shows a slight 5'-end bias. Note the
## units (bits)
suppressWarnings(kmerKLPlot(s.fastq))
## Plot multiple KL divergence plots, one panel per sample
suppressWarnings(kmerKLPlot(list("highly contaminated"=s.contam.fastq,
  "less contaminated"=s.fastq, "random"=s.random.fasta)))
