Calculates a pairwise distance matrix from DNA k-mer counts using a modified Canberra distance. Before the Canberra distances are calculated, read counts are normalized to correct for systematic effects on the distance: the counts in each DNA k-mer count vector are scaled up so that the normalized read counts of all samples are nearly equal.
cbDistMatrix(object, nReadNorm = max(nReads(object)))
object
nReadNorm
The distance between two normalized DNA k-mer count vectors X and Y is calculated by
$$ d_f(X, Y) = \frac{1}{4^k} \sum_{i=1}^{4^k} \mathrm{cbd}(x_i, y_i) $$
where cbd is given by
$$ \mathrm{cbd}(x, y) = \frac{|x - y|}{x + y}. $$
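The formula can be illustrated with a minimal R sketch. This is not the seqTools implementation: the helper names (`cbd`, `kmer_canberra`, `kmer_dist_matrix`), the toy counts, the treatment of 0/0 terms as 0, and the scaling by `n_read_norm / n_reads` are all assumptions made for illustration.

```r
# Illustrative sketch only; not the seqTools implementation.

# Per-position term |x - y| / (x + y).
# Assumption: positions where both counts are 0 contribute 0.
cbd <- function(x, y) {
  s <- x + y
  ifelse(s > 0, abs(x - y) / s, 0)
}

# Sum of cbd over all positions, divided by the vector length (4^k).
kmer_canberra <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(cbd(x, y)) / length(x)
}

# Hypothetical helper: pairwise distances for a count matrix with
# one column per sample and one row per k-mer.
kmer_dist_matrix <- function(counts, n_reads, n_read_norm = max(n_reads)) {
  # Assumed normalization: scale each sample up to n_read_norm reads.
  scaled <- sweep(counts, 2, n_read_norm / n_reads, `*`)
  n <- ncol(scaled)
  dm <- matrix(0, n, n, dimnames = list(colnames(counts), colnames(counts)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      dm[i, j] <- kmer_canberra(scaled[, i], scaled[, j])
    }
  }
  dm
}

# Toy data with k = 1, so each count vector has 4^1 = 4 entries.
counts <- cbind(s1 = c(10, 0, 5, 5),
                s2 = c(20, 0, 10, 10),
                s3 = c(0, 10, 5, 5))
n_reads <- c(s1 = 20, s2 = 40, s3 = 20)
dm <- kmer_dist_matrix(counts, n_reads)
```

In this toy data, s1 and s2 have the same k-mer composition at different sequencing depths, so their distance is 0 after scaling; without the normalization step the depth difference alone would produce a nonzero distance.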
Square matrix. The number of rows equals the number of files (= nFiles(object)).
The static size of the returned k-mer array is 4^k.
Wolfgang Kaisers
Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research 2010, Vol. 38, No. 6, 1767-1771.
hclust
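# Locate the example FASTQ files shipped with seqTools
basedir <- system.file("extdata", package = "seqTools")
basenames <- c("g4_l101_n100.fq.gz", "g5_l101_n100.fq.gz")
filenames <- file.path(basedir, basenames)
# Collect k-mer counts (k = 6) and label the two samples
fq <- fastqq(filenames, 6, c("g4", "g5"))
# Pairwise modified Canberra distance matrix
dm <- cbDistMatrix(fq)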
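Since hclust is listed under See Also, a distance matrix of this kind can plausibly be passed to hierarchical clustering. The sketch below uses a hypothetical 3 x 3 matrix with made-up values and a made-up third label ("g6") in place of real cbDistMatrix output, so it runs without seqTools; the choice of average linkage is also an assumption.

```r
# Hypothetical distance matrix standing in for cbDistMatrix() output.
# Values and the label "g6" are invented for illustration.
labels <- c("g4", "g5", "g6")
dm_toy <- matrix(c(0.0, 0.1, 0.5,
                   0.1, 0.0, 0.6,
                   0.5, 0.6, 0.0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(labels, labels))

# hclust() expects a "dist" object, so convert the square matrix first.
hc <- hclust(as.dist(dm_toy), method = "average")

# Cut the dendrogram into two groups; plot(hc) would draw the tree.
groups <- cutree(hc, k = 2)
```

With these values, g4 and g5 fall into the same group and g6 forms its own group.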