Description:
Method containing the formula used in compDist and compDistTSList. Considering complexity to be a kind of entropy measure, our formula is similar to the normalized Variation of Information described by Meila (2003) and Wu, Xiong and Chen (2009): the joint entropy is set in relation to the single entropies. This yields a value in the interval [0,1], compared to (0.5,1] for the formula of Keogh et al. (2007). Our measure is symmetric.
Usage:

    calcCompDist(xLength, yLength, xyLength, yxLength)
Arguments:

xLength: Length of the first string/time series after compression.

yLength: Length of the second string/time series after compression.

xyLength: Length of the concatenation of the first and second string/time series after compression.

yxLength: Length of the concatenation of the second and first string/time series after compression.
Value:

The dissimilarity as a numeric value in the interval [0,1].
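The exact formula of calcCompDist is not reproduced in this page, but the description above (joint entropy set in relation to the single entropies, symmetric, values in [0,1]) can be sketched as a compression-based analogue of the normalized Variation of Information. The following Python reimplementation is a hypothetical sketch, not the package's confirmed code: averaging the two concatenation lengths to symmetrize and clamping the result are assumptions.

```python
import zlib

def calc_comp_dist(x_len, y_len, xy_len, yx_len):
    """VI-style normalized compression dissimilarity (hypothetical sketch).

    The compressed length of the concatenation stands in for the joint
    entropy; averaging both concatenation orders makes the measure
    symmetric. This normalization is an assumption, not the package's code.
    """
    joint = (xy_len + yx_len) / 2.0
    d = 2.0 - (x_len + y_len) / joint   # ~0 for identical inputs, ~1 for unrelated ones
    return min(max(d, 0.0), 1.0)        # clamp against compressor-overhead noise

def clen(data):
    """Compressed length in bytes, standing in for an entropy estimate."""
    return len(zlib.compress(data))

x = b"abcabcabc" * 20   # highly self-similar sequence (as bytes)
y = b"qrsqrsqrs" * 20   # different, but equally self-similar, sequence

d_same = calc_comp_dist(clen(x), clen(x), clen(x + x), clen(x + x))
d_diff = calc_comp_dist(clen(x), clen(y), clen(x + y), clen(y + x))
# expect d_same close to 0 and d_same < d_diff
```

Note that compDist and compDistTSList perform the compression themselves; this helper only combines the four resulting lengths into a dissimilarity.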
References:

Keogh, E., Lonardi, S., Ratanamahatana, C. A., Wei, L., Lee, S.-H. & Handley, J. (2007). Compression-based data mining of sequential data. Data Mining and Knowledge Discovery, 14(1), 99–129.

Li, M., Badger, J. H., Chen, X., Kwong, S., Kearney, P. & Zhang, H. (2001). An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics, 17(2), 149–154.

Meila, M. (2003). Comparing clusterings by the variation of information. In B. Schölkopf & M. K. Warmuth (Eds.), Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24–27, 2003, Proceedings (pp. 173–187). Springer Berlin Heidelberg.

Wu, J., Xiong, H. & Chen, J. (2009). Adapting the right measures for k-means clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 877–886). ACM.