Distance based on coefficient-normalized cross-correlation as proposed by Paparrizos and Gravano (2015) for the k-Shape clustering algorithm.
SBD(x, y, znorm = FALSE, error.check = TRUE, return.shifted = TRUE)
sbd(x, y, znorm = FALSE, error.check = TRUE, return.shifted = TRUE)
x, y: Univariate time series.
znorm: Logical. Should each series be z-normalized before calculating the distance?
error.check: Logical indicating whether the function should try to detect inconsistencies and give more informative error messages. Also used internally to avoid repeating checks.
return.shifted: Logical. Should the shifted version of y be returned?
This distance works best if the series are z-normalized. If they are not, they should at least have appropriate amplitudes, since the values of the signals do affect the outcome.
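A minimal base-R sketch of why z-normalization matters, assuming SBD is computed as 1 minus the maximum coefficient-normalized cross-correlation over all lags. The helper names zscore() and sbd_sketch() are hypothetical and are not dtwclust functions; the loop over lags is direct rather than FFT-based. Multiplying a series by a positive constant cancels out in the normalization, but adding a constant offset does not, and z-normalizing both series removes the offset again.

```r
# Hypothetical helpers for illustration only (not part of dtwclust)
zscore <- function(x) (x - mean(x)) / sd(x)

# Shape-based distance sketch: 1 - max over lags of the cross-correlation
# divided by the product of the Euclidean norms (direct O(n * m) loop)
sbd_sketch <- function(x, y) {
  n <- length(x); m <- length(y)
  lags <- seq(-(m - 1), n - 1)
  cc <- vapply(lags, function(k) {
    j <- seq_len(n) - k              # index of y aligned with each x[i]
    ok <- j >= 1 & j <= m            # keep only overlapping positions
    sum(x[ok] * y[j[ok]])
  }, numeric(1))
  1 - max(cc) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

x <- sin(seq(0, 2 * pi, length.out = 50))
sbd_sketch(x, 3 * x)                     # ~0: positive scaling cancels out
sbd_sketch(x, x + 5)                     # clearly > 0: the offset changes the result
sbd_sketch(zscore(x), zscore(x + 5))     # ~0 again after z-normalization
```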
If x and y do not have the same length, it is best to provide the longer sequence in y, because it will be shifted to match x. After matching, the series may have to be truncated, or extended and padded with zeros, as needed.
The output values lie between 0 and 2, with 0 indicating perfect similarity.
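The following sketch shows one way to compute the distance with the FFT in base R, assuming the definition SBD(x, y) = 1 - max over all lags of the coefficient-normalized cross-correlation. The function name sbd_fft() is hypothetical, and zero-padding to a power of two is a common FFT choice rather than a confirmed detail of dtwclust. By the Cauchy-Schwarz inequality the normalized correlation lies in [-1, 1], which is why the result lies between 0 and 2.

```r
# Hypothetical sketch, not dtwclust's implementation
sbd_fft <- function(x, y) {
  n <- length(x); m <- length(y)
  # Padding to at least n + m - 1 makes circular correlation equal linear correlation
  L <- 2^ceiling(log2(n + m - 1))
  xp <- c(x, numeric(L - n))
  yp <- c(y, numeric(L - m))
  # Cross-correlation theorem; R's inverse fft() is unnormalized, so divide by L
  cc_circ <- Re(fft(fft(xp) * Conj(fft(yp)), inverse = TRUE)) / L
  # Lag k (y shifted right by k) is stored at position k %% L + 1
  lags <- seq(-(m - 1), n - 1)
  cc <- cc_circ[lags %% L + 1]
  ncc <- cc / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
  1 - max(ncc)
}

sbd_fft(c(1, 2, 3), c(1, 2, 3))      # ~0: identical series
sbd_fft(c(1, 0), c(0, 1))            # ~0: same shape, shifted by one step
sbd_fft(c(1, 2, 3), c(-1, -2, -3))   # 17/14, about 1.214
```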
For return.shifted = FALSE, the numeric distance value; otherwise a list with:
dist: The shape-based distance between x and y.
yshift: A shifted version of y so that it optimally matches x (based on the coefficient-normalized cross-correlation).
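To illustrate what yshift contains, here is a sketch (direct loop, no FFT) that picks the lag with the highest normalized cross-correlation and then truncates or zero-pads the shifted y to the length of x. The name sbd_shift() and the extra lag element are hypothetical additions for illustration; the sketch assumes yshift has the length of x, and dtwclust's own output may differ in details such as tie-breaking.

```r
# Hypothetical sketch, not dtwclust's implementation
sbd_shift <- function(x, y) {
  n <- length(x); m <- length(y)
  lags <- seq(-(m - 1), n - 1)
  # cc(k) = sum_i x[i] * y[i - k], i.e. y shifted right by k positions
  cc <- vapply(lags, function(k) {
    j <- seq_len(n) - k
    ok <- j >= 1 & j <= m
    sum(x[ok] * y[j[ok]])
  }, numeric(1))
  ncc <- cc / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
  best <- which.max(ncc)
  k <- lags[best]
  # Align y to x: values of y falling outside x are dropped (truncation),
  # positions of x not covered by y are filled with zeros (padding)
  j <- seq_len(n) - k
  ok <- j >= 1 & j <= m
  yshift <- numeric(n)
  yshift[ok] <- y[j[ok]]
  list(dist = 1 - ncc[best], yshift = yshift, lag = k)
}

res <- sbd_shift(c(1, 2, 3, 4, 0, 0), c(0, 0, 1, 2, 3, 4))
res$lag     # -2: y is moved two steps to the left
res$yshift  # 1 2 3 4 0 0
```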
The version registered with proxy's dist is custom (loop = FALSE in pr_DB). The custom function handles multi-threaded parallelization itself (with RcppParallel). It uses all available threads by default (see RcppParallel::defaultNumThreads()), but the user can change this through RcppParallel's thread options.
An exception to the above is when it is called within a parallel loop made by dtwclust. If the parallel workers do not have the number of threads explicitly specified, this function will default to 1 thread per worker. See the package's parallelization vignette for more information.
It also includes symmetric optimizations to calculate only half a distance matrix when appropriate; to use them, provide only one list of series in x. If you want to avoid this optimization, call dist with the same list of series in both x and y.
In some situations, e.g. for relatively small distance matrices, the overhead introduced by the logic that computes only half the distance matrix can be bigger than just calculating the whole matrix.
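The symmetric optimization can be sketched generically: for a symmetric distance, only the pairs with i < j are evaluated and then mirrored, so n series need n(n - 1)/2 calls instead of n^2. The helper half_dist_matrix() is hypothetical, not the proxy or dtwclust implementation, and it assumes the self-distance is 0; a toy distance (absolute difference of means) stands in for SBD so the call count is easy to check.

```r
# Hypothetical helper: evaluate d() only for i < j and mirror the result
half_dist_matrix <- function(series, d) {
  n <- length(series)
  out <- matrix(0, n, n, dimnames = list(names(series), names(series)))
  for (i in seq_len(max(n - 1L, 0L))) {
    for (j in (i + 1L):n) {
      out[i, j] <- out[j, i] <- d(series[[i]], series[[j]])
    }
  }
  out  # diagonal left at 0 by assumption
}

# Toy symmetric distance that counts how often it is called
calls <- 0
d_count <- function(a, b) { calls <<- calls + 1; abs(mean(a) - mean(b)) }

s <- list(a = c(1, 2, 3), b = c(2, 4), c = c(0, 0, 9, 1))
dm <- half_dist_matrix(s, d_count)
calls  # 3 evaluations instead of 9 for the full 3 x 3 matrix
```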
If you wish to calculate the distance between several time series, it is better to use the version registered with the proxy package, since it includes some small optimizations. See the examples.
This distance is calculated with the help of the Fast Fourier Transform, so it can be sensitive to numerical precision. Thus, this function (and the functions that depend on it) might return different values in 32-bit installations compared to 64-bit ones.
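A small base-R illustration of the rounding involved: an FFT followed by its inverse returns the input only up to floating-point error. This sketch does not reproduce a 32-bit versus 64-bit comparison; it only shows that FFT-based arithmetic is not exact, so tiny discrepancies in the computed distances are to be expected.

```r
# Forward FFT followed by the (unnormalized) inverse FFT, rescaled by n
x <- c(0.1, 0.7, -1.3, 2.2, 0.05, -0.4, 1.9)
x_back <- Re(fft(fft(x), inverse = TRUE)) / length(x)
# Largest absolute round-trip error: tiny, but not guaranteed to be exactly 0
max(abs(x_back - x))
```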
Paparrizos J and Gravano L (2015). “k-Shape: Efficient and Accurate Clustering of Time Series.” In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, series SIGMOD '15, pp. 1855-1870. ISBN 978-1-4503-2758-9, doi: 10.1145/2723372.2737793.
# load package and data
library(dtwclust)
data(uciCT)

# distance between series of different lengths
sbd <- SBD(CharTraj[[1]], CharTraj[[100]], znorm = TRUE)$dist

# cross-distance matrix for series subset (notice the two-list input)
sbD <- proxy::dist(CharTraj[1:10], CharTraj[1:10], method = "SBD", znorm = TRUE)