View source: R/CLUSTERING-tadpole.R
TADPole | R Documentation |
Time-series Anytime Density Peaks Clustering as proposed by Begum et al. (2015).
TADPole( data, k = 2L, dc, window.size, error.check = TRUE, lb = "lbk", trace = FALSE ) tadpole( data, k = 2L, dc, window.size, error.check = TRUE, lb = "lbk", trace = FALSE )
data |
A matrix or data frame where each row is a time series, or a list where each element is a time series. Multivariate series are not supported. |
k |
The number of desired clusters. Can be a vector with several values. |
dc |
The cutoff distance(s). Can be a vector with several values. |
window.size |
Window size constraint for DTW (Sakoe-Chiba). See details. |
error.check |
Logical indicating whether the function should try to detect inconsistencies and give more informative errors messages. Also used internally to avoid repeating checks. |
lb |
Which lower bound to use, "lbk" for |
trace |
Logical flag. If |
This function can be called either directly or through tsclust()
.
TADPole clustering adopts a relatively new clustering framework and adapts it to time series clustering with DTW. See the cited article for the details of the algorithm.
Because of the way the algorithm works, it can be considered a kind of Partitioning Around
Medoids (PAM). This means that the cluster centroids are always elements of the data. However,
this algorithm is deterministic, depending on the value of dc
.
The algorithm first uses the DTW's upper and lower bounds (Euclidean and LB_Keogh respectively)
to find series with many close neighbors (in DTW space). Anything below the cutoff distance
(dc
) is considered a neighbor. Aided with this information, the algorithm then tries to prune
as many DTW calculations as possible in order to accelerate the clustering procedure. The series
that lie in dense areas (i.e. that have lots of neighbors) are taken as cluster centroids.
The algorithm relies on the DTW bounds, which are only defined for univariate time series of equal length.
Parallelization is supported in the following way:
For multiple dc
values, multi-processing with foreach::foreach()
.
The internal distance calculations use multi-threading with RcppParallel::RcppParallel.
The windowing constraint uses a centered window. The calculations expect a value in
window.size
that represents the distance between the point considered and one of the edges
of the window. Therefore, if, for example, window.size = 10
, the warping for an
observation x_i considers the points between x_{i-10} and x_{i+10}, resulting
in 10(2) + 1 = 21
observations falling within the window.
A list with:
cl
: Cluster indices.
centroids
: Indices of the centroids.
distCalcPercentage
: Percentage of distance calculations that were actually performed.
For multiple k
/dc
values, a list of lists is returned, each internal list having the
aforementioned elements.
Please note that running tasks in parallel does not guarantee faster computations. The overhead introduced is sometimes too large, and it's better to run tasks sequentially.
This function uses the RcppParallel
package
for parallelization. It uses all available threads by default (see
RcppParallel::defaultNumThreads()
), but this can
be changed by the user with
RcppParallel::setThreadOptions()
.
An exception to the above is when this function is called within a
foreach
parallel loop made by dtwclust. If the parallel
workers do not have the number of threads explicitly specified, this function will default to 1
thread per worker. See the parallelization vignette for more information
(browseVignettes("dtwclust")
).
Begum N, Ulanova L, Wang J and Keogh E (2015). “Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy.” In Conference on Knowledge Discovery and Data Mining, series KDD '15. ISBN 978-1-4503-3664-2/15/08, doi: 10.1145/2783258.2783286.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.