# SBD: Shape-based distance

In dtwclust: Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance


## Shape-based distance

### Description

Distance based on coefficient-normalized cross-correlation as proposed by Paparrizos and Gravano (2015) for the k-Shape clustering algorithm.

### Usage

```
SBD(x, y, znorm = FALSE, error.check = TRUE, return.shifted = TRUE)

sbd(x, y, znorm = FALSE, error.check = TRUE, return.shifted = TRUE)
```

### Arguments

• `x, y`: Univariate time series.

• `znorm`: Logical. Should each series be z-normalized before calculating the distance?

• `error.check`: Logical indicating whether the function should try to detect inconsistencies and give more informative error messages. Also used internally to avoid repeating checks.

• `return.shifted`: Logical. Should the shifted version of `y` be returned? See details.

### Details

This distance works best if the series are z-normalized. If not, at least they should have appropriate amplitudes, since the values of the signals do affect the outcome.

If `x` and `y` do not have the same length, it is best to provide the longer sequence as `y`, because `y` is the series that will be shifted to match `x`. After matching, the shifted series may have to be truncated, or extended by padding with zeros.

The output values lie between 0 and 2, with 0 indicating perfect similarity.
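To make the definition concrete, the following base R sketch computes the coefficient-normalized cross-correlation by direct summation over every shift and takes one minus its maximum. It assumes equal-length series and skips the FFT, so it illustrates the idea rather than reproducing the package's implementation. The names `ncc_sketch()`, `sbd_sketch()` and `znorm_sketch()` are hypothetical and not part of dtwclust.

```r
# Hypothetical helper: z-normalize a numeric vector
znorm_sketch <- function(v) (v - mean(v)) / sd(v)

# Hypothetical helper: coefficient-normalized cross-correlation for all shifts.
# Uses direct sums (quadratic cost) instead of the FFT, for readability only.
ncc_sketch <- function(x, y) {
  n <- length(x)
  stopifnot(length(y) == n)
  denom <- sqrt(sum(x^2) * sum(y^2))
  shifts <- -(n - 1L):(n - 1L)
  cc <- vapply(shifts, function(s) {
    if (s >= 0L) {
      sum(x[(s + 1L):n] * y[1L:(n - s)])
    } else {
      sum(x[1L:(n + s)] * y[(1L - s):n])
    }
  }, numeric(1))
  setNames(cc / denom, shifts)
}

# Hypothetical helper: distance is one minus the best normalized correlation
sbd_sketch <- function(x, y) {
  ncc <- ncc_sketch(x, y)
  list(dist = 1 - max(ncc),
       shift = as.integer(names(ncc)[which.max(ncc)]))
}

# A pulse and the same pulse delayed by three samples match perfectly
x <- c(0, 0, 1, 2, 1, 0, 0, 0, 0, 0)
y <- c(0, 0, 0, 0, 0, 1, 2, 1, 0, 0)
pulse <- sbd_sketch(x, y)
pulse$dist   # 0
pulse$shift  # -3 (under this sketch's sign convention)

# Z-normalized sine waves with different phases
t <- seq(0, 4 * pi, length.out = 80)
waves <- sbd_sketch(znorm_sketch(sin(t)), znorm_sketch(sin(t - 1)))
```

Because each normalized correlation value lies between -1 and 1, the resulting distance stays within the 0 to 2 range stated above.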

### Value

If `return.shifted = FALSE`, the numeric distance value. Otherwise, a list with:

• `dist`: The shape-based distance between `x` and `y`.

• `yshift`: A shifted version of `y` so that it optimally matches `x` (based on `NCCc()`).
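The two return forms can be compared with a short call sequence. This is a minimal sketch that assumes dtwclust is installed and uses made-up sine data; only the arguments listed under Usage are relied on.

```r
library(dtwclust)

# Synthetic series: y is a phase-shifted copy of x (illustrative data only)
t <- seq(0, 4 * pi, length.out = 100)
x <- sin(t)
y <- sin(t - 0.8)

# Default (return.shifted = TRUE): list with the distance and the shifted y
res <- SBD(x, y, znorm = TRUE)
res$dist
head(res$yshift)

# return.shifted = FALSE: only the numeric distance
d <- SBD(x, y, znorm = TRUE, return.shifted = FALSE)
```

Requesting only the distance avoids building the shifted series when it is not needed.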

### Proxy version

The version registered with `proxy::dist()` is custom (`loop = FALSE` in `pr_DB`). The custom function handles multi-threaded parallelization directly (with `RcppParallel`). It uses all available threads by default (see `RcppParallel::defaultNumThreads()`), but the user can change this with `RcppParallel::setThreadOptions()`.

An exception to the above is when it is called within a `foreach` parallel loop made by dtwclust. If the parallel workers do not have the number of threads explicitly specified, this function will default to 1 thread per worker. See the parallelization vignette for more information (`browseVignettes("dtwclust")`).

It also includes symmetric optimizations to calculate only half a distance matrix when appropriate, which requires that only one list of series be provided in `x`. To avoid this optimization, call `proxy::dist()` with the same list of series in both `x` and `y`.

In some situations, e.g. for relatively small distance matrices, the overhead introduced by the logic that computes only half the distance matrix can be bigger than just calculating the whole matrix.
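The options described in this section might be exercised as follows. This is a sketch under assumptions: it presumes dtwclust, proxy and RcppParallel are installed, the random-walk series are invented for illustration, and the thread count of 2 is an arbitrary choice.

```r
library(dtwclust)  # loading the package registers "SBD" with proxy

# Limit the threads used by the RcppParallel-based computation
RcppParallel::setThreadOptions(numThreads = 2L)

# Small illustrative list of univariate series
set.seed(42)
series <- lapply(1:5, function(i) cumsum(rnorm(50)))

# One list in x: the symmetric optimization may apply
d_sym <- proxy::dist(series, method = "SBD", znorm = TRUE)

# Same list in x and y: the full cross-distance matrix is computed
d_full <- proxy::dist(series, series, method = "SBD", znorm = TRUE)
```

For a list this small, the second form may well be the faster one, in line with the overhead caveat above; timing both on the actual data is the only reliable way to decide.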

### Note

If you wish to calculate the distance between several time series, it would be better to use the version registered with the `proxy` package, since it includes some small optimizations. See the examples.

This distance is calculated with the help of the Fast Fourier Transform, so it can be sensitive to numerical precision. As a result, this function (and the functions that depend on it) might return different values in 32-bit installations compared to 64-bit ones.

### References

Paparrizos J and Gravano L (2015). “k-Shape: Efficient and Accurate Clustering of Time Series.” In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, series SIGMOD '15, pp. 1855-1870. ISBN 978-1-4503-2758-9, doi: 10.1145/2723372.2737793.

### See Also

`NCCc()`, `shape_extraction()`

```