TADPole: TADPole clustering

View source: R/CLUSTERING-tadpole.R

TADPoleR Documentation

TADPole clustering

Description

Time-series Anytime Density Peaks Clustering as proposed by Begum et al. (2015).

Usage

TADPole(
  data,
  k = 2L,
  dc,
  window.size,
  error.check = TRUE,
  lb = "lbk",
  trace = FALSE
)

tadpole(
  data,
  k = 2L,
  dc,
  window.size,
  error.check = TRUE,
  lb = "lbk",
  trace = FALSE
)

Arguments

data

A matrix or data frame where each row is a time series, or a list where each element is a time series. Multivariate series are not supported.

k

The number of desired clusters. Can be a vector with several values.

dc

The cutoff distance(s). Can be a vector with several values.

window.size

Window size constraint for DTW (Sakoe-Chiba). See details.

error.check

Logical indicating whether the function should try to detect inconsistencies and give more informative errors messages. Also used internally to avoid repeating checks.

lb

Which lower bound to use, "lbk" for lb_keogh() or "lbi" for lb_improved().

trace

Logical flag. If TRUE, more output regarding the progress is printed to screen.

Details

This function can be called either directly or through tsclust().

TADPole clustering adopts a relatively new clustering framework and adapts it to time series clustering with DTW. See the cited article for the details of the algorithm.

Because of the way the algorithm works, it can be considered a kind of Partitioning Around Medoids (PAM). This means that the cluster centroids are always elements of the data. However, this algorithm is deterministic, depending on the value of dc.

The algorithm first uses the DTW's upper and lower bounds (Euclidean and LB_Keogh respectively) to find series with many close neighbors (in DTW space). Anything below the cutoff distance (dc) is considered a neighbor. Aided with this information, the algorithm then tries to prune as many DTW calculations as possible in order to accelerate the clustering procedure. The series that lie in dense areas (i.e. that have lots of neighbors) are taken as cluster centroids.

The algorithm relies on the DTW bounds, which are only defined for univariate time series of equal length.

Parallelization is supported in the following way:

  • For multiple dc values, multi-processing with foreach::foreach().

  • The internal distance calculations use multi-threading with RcppParallel::RcppParallel.

The windowing constraint uses a centered window. The calculations expect a value in window.size that represents the distance between the point considered and one of the edges of the window. Therefore, if, for example, window.size = 10, the warping for an observation x_i considers the points between x_{i-10} and x_{i+10}, resulting in 10(2) + 1 = 21 observations falling within the window.

Value

A list with:

  • cl: Cluster indices.

  • centroids: Indices of the centroids.

  • distCalcPercentage: Percentage of distance calculations that were actually performed.

For multiple k/dc values, a list of lists is returned, each internal list having the aforementioned elements.

Parallel Computing

Please note that running tasks in parallel does not guarantee faster computations. The overhead introduced is sometimes too large, and it's better to run tasks sequentially.

This function uses the RcppParallel package for parallelization. It uses all available threads by default (see RcppParallel::defaultNumThreads()), but this can be changed by the user with RcppParallel::setThreadOptions().

An exception to the above is when it is called within a foreach parallel loop made by dtwclust. If the parallel workers do not have the number of threads explicitly specified, this function will default to 1 thread per worker. See the parallelization vignette for more information - browseVignettes("dtwclust")

References

Begum N, Ulanova L, Wang J and Keogh E (2015). “Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy.” In Conference on Knowledge Discovery and Data Mining, series KDD '15. ISBN 978-1-4503-3664-2/15/08, \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1145/2783258.2783286")}.


dtwclust documentation built on Sept. 11, 2024, 9:07 p.m.