GAK: Fast global alignment kernels

GAK R Documentation

Fast global alignment kernels

Description

Distance based on (triangular) global alignment kernels.

Usage

GAK(
  x,
  y,
  ...,
  sigma = NULL,
  window.size = NULL,
  normalize = TRUE,
  error.check = TRUE
)

gak(
  x,
  y,
  ...,
  sigma = NULL,
  window.size = NULL,
  normalize = TRUE,
  error.check = TRUE
)

Arguments

x, y

Time series. A multivariate series should have time spanning the rows and variables spanning the columns.

...

Currently ignored.

sigma

Parameter for the Gaussian kernel's width. See details for the interpretation of NULL.

window.size

Parameterization of the constraining band (T in Cuturi (2011)). See details.

normalize

Normalize the result by considering diagonal terms.

error.check

Logical indicating whether the function should try to detect inconsistencies and give more informative error messages. Also used internally to avoid repeating checks.

Details

This function uses the Triangular Global Alignment Kernel (TGAK) described in Cuturi (2011). It supports multivariate series and series of different lengths, as long as the ratio of the series' lengths is no greater than 2 (and no less than 0.5).
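The length constraint can be checked before calling the function. The following is a minimal base R sketch; length_ratio_ok is a hypothetical helper name that is not part of dtwclust, and it assumes the bounds of 0.5 and 2 are inclusive.

```r
# Hypothetical helper (not part of dtwclust): TRUE when the lengths of two
# series are within a factor of 2 of each other.
length_ratio_ok <- function(x, y) {
  # NROW() counts time points for vectors and for matrices (time in rows)
  ratio <- NROW(x) / NROW(y)
  ratio >= 0.5 && ratio <= 2
}

set.seed(1)
x <- matrix(rnorm(200L), ncol = 2L)  # bivariate series, 100 time points
y <- matrix(rnorm(120L), ncol = 2L)  # bivariate series, 60 time points

length_ratio_ok(x, y)                         # ratio 100 / 60
length_ratio_ok(x, y[1:40, , drop = FALSE])   # ratio 100 / 40
```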

The window.size parameter is similar to the one used in DTW, so NULL signifies no constraint, and its value should be greater than 1 if used with series of different length.

The Gaussian kernel is parameterized by sigma. Providing NULL means that the value will be estimated by using the strategy mentioned in Cuturi (2011) with a constant of 1. This estimation is subject to randomness, so consider estimating the value once and re-using it (the estimate is returned as an attribute of the result). See the examples.
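A minimal sketch of the estimate-once, re-use pattern with a direct call to gak() follows. The two random-walk series are made up for illustration, and the sketch assumes the attribute is named "sigma", as in the proxy::dist() example in the Examples section.

```r
library(dtwclust)

set.seed(832)
x <- cumsum(rnorm(100L))  # made-up univariate series
y <- cumsum(rnorm(90L))

# First call: sigma = NULL, so the kernel width is estimated (random component)
d1 <- gak(x, y, window.size = 18L)
sigma_hat <- attr(d1, "sigma")

# Later calls: pass the stored estimate so that no new estimation takes place
d2 <- gak(x, y, sigma = sigma_hat, window.size = 18L)
```

Storing sigma_hat alongside the results keeps later comparisons on the same kernel width.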

For more information, refer to the package vignette and the referenced article.

Value

The logarithm of the GAK if normalize = FALSE, otherwise 1 minus the normalized GAK. The value of sigma is assigned as an attribute of the result.
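The relation between the two return values can be sketched in base R. This is an assumption rather than the package's confirmed formula: it uses the usual kernel normalization k(x, y) / sqrt(k(x, x) * k(y, y)), computed from log-kernel values. The helper name normalized_gak_distance and the numbers passed to it are hypothetical.

```r
# Sketch only: assumes the usual kernel normalization
#   k(x, y) / sqrt(k(x, x) * k(y, y)),
# evaluated on log-kernel values to avoid numerical overflow.
normalized_gak_distance <- function(log_kxy, log_kxx, log_kyy) {
  1 - exp(log_kxy - 0.5 * (log_kxx + log_kyy))
}

# Made-up log-kernel values, for illustration only
normalized_gak_distance(log_kxy = -12, log_kxx = -10, log_kyy = -11)

# Under this normalization, a series compared with itself has distance 0
normalized_gak_distance(log_kxy = -10, log_kxx = -10, log_kyy = -10)
```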

Proxy version

The version registered with proxy::dist() is custom (loop = FALSE in proxy::pr_DB). The custom function handles multi-threaded parallelization directly with RcppParallel. It uses all available threads by default (see RcppParallel::defaultNumThreads()), but this can be changed by the user with RcppParallel::setThreadOptions().

An exception to the above is when it is called within a foreach parallel loop made by dtwclust. If the parallel workers do not have the number of threads explicitly specified, this function will default to 1 thread per worker. See the parallelization vignette for more information: browseVignettes("dtwclust").
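As a brief sketch of the thread controls mentioned above, the number of threads can be inspected and limited through RcppParallel before computing distances. The choice of 2 threads is arbitrary.

```r
library(RcppParallel)

# Number of threads RcppParallel would use by default on this machine
n_default <- defaultNumThreads()
n_default

# Limit subsequent RcppParallel-based computations to 2 threads;
# a later proxy::dist(..., method = "gak") call would respect this limit
setThreadOptions(numThreads = 2L)

# Restore the default behaviour afterwards
setThreadOptions(numThreads = "auto")
```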

It also includes symmetric optimizations to calculate only half a distance matrix when appropriate, that is, when only one list of series is provided in x. Starting with version 6.0.0, this optimization means that the function returns an array with the lower triangular values of the distance matrix, similar to what stats::dist() does; see DistmatLowerTriangular for a helper to access elements as if it were a normal matrix. If you want to avoid this optimization, call proxy::dist() with the same list of series in both x and y.
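To illustrate what storing only the lower triangle means, the sketch below uses stats::dist() from base R. The index helper lower_tri_index is hypothetical and reflects the column-wise layout of stats::dist() objects; the storage order used by dtwclust may differ, so prefer the DistmatLowerTriangular helper for real results.

```r
# Position of element (i, j), with i > j, inside a stats::dist-style vector
# for n objects (column-wise lower triangle, diagonal excluded).
lower_tri_index <- function(i, j, n) {
  stopifnot(i > j, i <= n, j >= 1L)
  n * (j - 1L) - j * (j - 1L) / 2 + i - j
}

pts <- matrix(c(0, 1, 3, 6), ncol = 1L)  # four points on a line
d <- stats::dist(pts)                    # stores 6 values instead of 16
full <- as.matrix(d)                     # expanded symmetric matrix

# Both expressions refer to the distance between points 3 and 2
d[lower_tri_index(3L, 2L, n = 4L)]
full[3L, 2L]
```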

Note

The estimation of sigma does not depend on window.size.

If normalize is set to FALSE, the returned value is not a distance, rather a similarity. The proxy::dist() version is thus always normalized. Use proxy::simil() with method set to "uGAK" if you want the unnormalized similarities.

A constrained unnormalized calculation (i.e. with window.size > 0 and normalize = FALSE) will return negative infinity if abs(NROW(x) - NROW(y)) > window.size. Since the function won't perform calculations in that case, it might be faster, but if this behavior is not desired, consider reinterpolating the time series (see reinterpolate()) or increasing the window size.
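The condition above can be tested up front. The sketch below is base R only: exceeds_window is a hypothetical helper, and stats::approx() is used as a stand-in for linear reinterpolation to a common length, not as a description of how reinterpolate() is implemented.

```r
# Hypothetical pre-check mirroring the condition stated above
exceeds_window <- function(x, y, window.size) {
  abs(NROW(x) - NROW(y)) > window.size
}

set.seed(42)
x <- cumsum(rnorm(100L))  # made-up univariate series
y <- cumsum(rnorm(70L))

exceeds_window(x, y, window.size = 18L)  # length difference of 30
exceeds_window(x, y, window.size = 40L)  # option 1: a larger window

# Option 2: resample y to the length of x (linear interpolation stand-in)
y_resampled <- stats::approx(y, n = length(x))$y
exceeds_window(x, y_resampled, window.size = 18L)
```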

References

Cuturi, M. (2011). Fast global alignment kernels. In Proceedings of the 28th international conference on machine learning (ICML-11) (pp. 929-936).

Examples


## Not run: 
data(uciCT)

set.seed(832)
GAKd <- proxy::dist(zscore(CharTraj), method = "gak",
                    pairwise = TRUE, window.size = 18L)

# Obtained estimate of sigma
sigma <- attr(GAKd, "sigma")

# Use value for clustering
tsclust(CharTraj, k = 20L,
        distance = "gak", centroid = "shape",
        trace = TRUE,
        args = tsclust_args(dist = list(sigma = sigma,
                                        window.size = 18L)))

## End(Not run)

# Unnormalized similarities
proxy::simil(CharTraj[1L:5L], method = "ugak")


dtwclust documentation built on Sept. 11, 2024, 9:07 p.m.