compDist: Compression-/Complexity-based Dissimilarity

Description

Dissimilarity based on the lengths of the compressed individual time series and of their compressed concatenation, as described by Li et al. (2001). The time series are first converted to a SAX representation and then compressed (zipped), both following Keogh et al. (2007). As an improvement, the dissimilarity is rescaled to the interval [0,1] (instead of (0.5,1]) and made symmetric. Multivariate time series are handled by concatenating their attributes.
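The following Python code is a minimal sketch of the compression step. It assumes the inputs are already SAX strings and uses `zlib` as a stand-in for the zip compressor. It computes the compression-based dissimilarity (CDM) of Keogh et al. (2007), C(xy) / (C(x) + C(y)), which lies roughly in (0.5,1]. Two parts are assumptions about what the "improvement" could look like, not necessarily what compDist does internally:

- the symmetrization, which averages both concatenation orders;
- the rescaling 2 * CDM - 1.

All function names are hypothetical.

```python
import zlib


def compressed_len(s: str) -> int:
    """Byte length of the zlib-compressed string (stand-in for 'zipped')."""
    return len(zlib.compress(s.encode("ascii"), 9))


def cdm(x: str, y: str) -> float:
    """Keogh et al. (2007) CDM: C(xy) / (C(x) + C(y)), roughly in (0.5, 1]."""
    return compressed_len(x + y) / (compressed_len(x) + compressed_len(y))


def scaled_symmetric_cdm(x: str, y: str) -> float:
    """Hypothetical symmetric variant of CDM, rescaled to [0, 1].

    Assumption: symmetry via averaging both concatenation orders and
    rescaling via 2 * CDM - 1; compDist's exact formula may differ.
    """
    # C(xy) and C(yx) generally differ, so average the two orders.
    sym = (cdm(x, y) + cdm(y, x)) / 2
    # Map 0.5 -> 0 and 1 -> 1; clip because compressor overhead can
    # push the raw ratio slightly outside (0.5, 1].
    return min(1.0, max(0.0, 2 * sym - 1))
```

Because both concatenation orders enter the average, the result is symmetric by construction. With a real compressor, identical inputs still give a small positive value, since the header and checksum overhead never vanishes.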

Usage

compDist(x, y, symbolCount = 8, symbolLimits = NULL)

Arguments

x

First numeric vector or matrix (univariate or multivariate time series).

y

Second numeric vector or matrix (univariate or multivariate time series).

symbolCount

Number of SAX symbols. The interval boundaries are derived from the standard normal distribution. Alternatively, you can supply the boundaries directly via symbolLimits.
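The following Python code sketches how boundaries could be derived from symbolCount. It assumes the usual SAX choice of quantiles that split the standard normal distribution into equiprobable bins. Whether compDist uses exactly these quantiles is not stated here, and the function name is hypothetical.

```python
from statistics import NormalDist


def sax_breakpoints(symbol_count: int) -> list[float]:
    """Limits splitting N(0, 1) into `symbol_count` equiprobable intervals.

    Returns symbol_count + 1 values, starting with -inf and ending with +inf,
    i.e. the same shape the symbolLimits argument expects.
    """
    if symbol_count < 2:
        raise ValueError("need at least two symbols")
    nd = NormalDist()  # standard normal: mu=0, sigma=1
    inner = [nd.inv_cdf(i / symbol_count) for i in range(1, symbol_count)]
    return [float("-inf")] + inner + [float("inf")]


# sax_breakpoints(4) is approximately [-inf, -0.674, 0.0, 0.674, inf]
```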

symbolLimits

Interval boundaries used to convert the time series to a SAX representation. Should be a monotonically increasing vector starting with -Inf and ending with +Inf. If you supply a value here, symbolCount is ignored.
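The following Python code is a minimal sketch of how a series could be mapped to SAX symbols given such boundaries. It makes two assumptions:

- The series is z-normalized first, as in Keogh et al.'s SAX. Whether compDist normalizes internally is not stated on this page.
- Each point is discretized individually, with no PAA averaging, since the signature has no segment-count parameter.

The function name is hypothetical.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def to_sax(series: list[float], symbol_limits: list[float],
           normalize: bool = True) -> str:
    """Map each (optionally z-normalized) value to a letter.

    Value v gets symbol i when symbol_limits[i] <= v < symbol_limits[i + 1].
    """
    inf = float("inf")
    if symbol_limits[0] != -inf or symbol_limits[-1] != inf:
        raise ValueError("limits must start with -inf and end with +inf")
    if any(a >= b for a, b in zip(symbol_limits, symbol_limits[1:])):
        raise ValueError("limits must be strictly increasing")
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    if len(symbol_limits) - 1 > len(alphabet):
        raise ValueError("at most 26 symbols in this sketch")
    values = list(series)
    if normalize:
        mu, sd = mean(values), pstdev(values)
        # A constant series has sd == 0; map it to the centre value 0.0.
        values = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    # bisect_right returns the index of the first limit greater than v,
    # so subtracting 1 yields the interval that contains v.
    return "".join(alphabet[bisect_right(symbol_limits, v) - 1] for v in values)
```

Using `bisect_right` means a value lying exactly on a boundary is assigned to the upper interval.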

Value

The dissimilarity as a numeric value in the range [0,1].

References

Keogh, E., Lonardi, S., Ratanamahatana, C. A., Wei, L., Lee, S.-H. & Handley, J. (2007). Compression-based data mining of sequential data. Data Mining and Knowledge Discovery, 14(1), 99–129.

Li, M., Badger, J. H., Chen, X., Kwong, S., Kearney, P. & Zhang, H. (2001). An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics, 17(2), 149–154.

See Also

Other compression-based distances: compDistTSList


Jakob-Bach/FastTSDistances documentation built on May 13, 2019, 1:15 p.m.