DBA: DTW Barycenter Averaging


View source: R/CENTROIDS-dba.R

Description

A global averaging method for time series under DTW (Petitjean, Ketterlin and Gancarski 2011).

Usage

DBA(X, centroid = NULL, ..., window.size = NULL, norm = "L1",
  max.iter = 20L, delta = 0.001, error.check = TRUE, trace = FALSE,
  mv.ver = "by-variable")

dba(X, centroid = NULL, ..., window.size = NULL, norm = "L1",
  max.iter = 20L, delta = 0.001, error.check = TRUE, trace = FALSE,
  mv.ver = "by-variable")

Arguments

X

A matrix or data frame where each row is a time series, or a list where each element is a time series. Multivariate series should be provided as a list of matrices where time spans the rows and the variables span the columns of each matrix.

centroid

Optionally, a time series to use as reference. If NULL, a randomly chosen series of X is used. For multivariate series, this should be a matrix with the same variables (columns) as the matrices in X.

...

Further arguments for dtw_basic(). However, the following are already pre-specified: window.size, norm (passed along), and backtrack.

window.size

Window constraint for the DTW calculations. NULL means no constraint. A slanted band is used.

norm

Norm for the local cost matrix of DTW. Either "L1" for Manhattan distance or "L2" for Euclidean distance.

max.iter

Maximum number of iterations allowed.

delta

At iteration i, if all(abs(centroid_{i} - centroid_{i-1}) < delta), convergence is assumed.

error.check

Logical indicating whether the function should try to detect inconsistencies and give more informative error messages. Also used internally to avoid repeating checks.

trace

If TRUE, the current iteration is printed to output.

mv.ver

Multivariate version to use. See the Multivariate series section below.
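As a rough illustration of the norm argument, the base-R sketch below computes a local cost matrix between two small multivariate series laid out as described for X (time along the rows, variables along the columns). local_cost() is a hypothetical helper, not a dtwclust function. It uses the plain Manhattan and Euclidean distances between rows, and dtw_basic() may implement these costs differently internally.

```r
# Two hypothetical multivariate series: time along rows, variables along columns
x <- cbind(a = c(0, 1, 2), b = c(1, 1, 0))
y <- cbind(a = c(0, 2), b = c(0, 1))

# Hypothetical helper: local cost between every row of x and every row of y
local_cost <- function(x, y, norm = c("L1", "L2")) {
  norm <- match.arg(norm)
  pair_cost <- function(i, j) {
    d <- x[i, ] - y[j, ]
    # "L1": sum of absolute differences; "L2": square root of summed squares
    if (norm == "L1") sum(abs(d)) else sqrt(sum(d^2))
  }
  outer(seq_len(nrow(x)), seq_len(nrow(y)), Vectorize(pair_cost))
}

local_cost(x, y, "L1")  # Manhattan distance between time points
local_cost(x, y, "L2")  # Euclidean distance between time points
```

For univariate series the two choices give identical local costs in this sketch; they only differ once there is more than one variable.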

Details

This function tries to find the optimum average series among a group of time series in DTW space. Refer to the cited article for the details of the algorithm.

If a reference series is provided in centroid, the algorithm should always converge to the same result as long as the elements of X keep the same values, even if their order changes.
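The algorithm can be illustrated with a minimal base-R sketch for univariate series: align every series to the current centroid with DTW, replace each centroid point by the mean of all points aligned to it, and repeat until the delta rule holds or max.iter is reached. The helpers dtw_path() and dba_sketch() are hypothetical and not part of dtwclust. The sketch assumes an L1 local cost, the symmetric1 step pattern (each cell extends its diagonal, upper or left neighbour with unit weight) and no window constraint; DBA() itself relies on dtw_basic() and may differ in these details. The usage lines check the order-invariance claim above by reversing X while keeping the same reference.

```r
# Hypothetical helper: DTW with backtracking (L1 cost, symmetric1 steps),
# returning the warping path as positions in x (index1) and in y (index2).
dtw_path <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # acc[i + 1, j + 1] holds the accumulated cost of aligning x[1:i] with y[1:j]
  acc <- matrix(Inf, n + 1L, m + 1L)
  acc[1L, 1L] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      acc[i + 1L, j + 1L] <- abs(x[i] - y[j]) +
        min(acc[i, j], acc[i, j + 1L], acc[i + 1L, j])
    }
  }
  # Walk back from (n, m) to (1, 1), always moving to the cheapest predecessor
  i <- n
  j <- m
  index1 <- i
  index2 <- j
  while (i > 1L || j > 1L) {
    step <- which.min(c(acc[i, j], acc[i, j + 1L], acc[i + 1L, j]))
    if (step == 1L) {
      i <- i - 1L
      j <- j - 1L
    } else if (step == 2L) {
      i <- i - 1L
    } else {
      j <- j - 1L
    }
    index1 <- c(i, index1)
    index2 <- c(j, index2)
  }
  list(index1 = index1, index2 = index2)
}

# Hypothetical helper: iterate DBA updates until the delta rule holds
dba_sketch <- function(X, centroid, max.iter = 20L, delta = 0.001) {
  for (iter in seq_len(max.iter)) {
    sums <- numeric(length(centroid))
    counts <- numeric(length(centroid))
    for (x in X) {
      path <- dtw_path(x, centroid)
      # Every point of x contributes to each centroid point it is aligned with
      for (k in seq_along(path$index1)) {
        j <- path$index2[k]
        sums[j] <- sums[j] + x[path$index1[k]]
        counts[j] <- counts[j] + 1
      }
    }
    # Each centroid point becomes the mean of all points aligned to it
    new_centroid <- sums / counts
    converged <- all(abs(new_centroid - centroid) < delta)
    centroid <- new_centroid
    if (converged) break
  }
  centroid
}

# Usage with made-up data: five shifted sine waves, the first one as reference
tt <- seq(0, 2 * pi, length.out = 30)
X <- lapply(1:5, function(k) sin(tt + k / 5) + k / 10)
avg1 <- dba_sketch(X, X[[1]])
avg2 <- dba_sketch(rev(X), X[[1]])
all.equal(avg1, avg2)  # compare the averages obtained from both orders
```

Because every DTW path covers all indices of the centroid, each centroid point receives at least one aligned value, so the division in the update never divides by zero.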

The windowing constraint uses a centered window. The value of window.size represents the distance between the point considered and either edge of the window. For example, if window.size = 10, the warping for an observation x_i considers the points between x_{i-10} and x_{i+10}, so 2(10) + 1 = 21 observations fall within the window.
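To make the window arithmetic concrete, the following base-R sketch lists which positions fall inside the window for a given observation. window_indices() is a hypothetical helper, not a dtwclust function, and the clipping at the series bounds and the band matrix are assumptions about how such a window is usually applied, not a description of dtw_basic() internals.

```r
# Hypothetical helper: indices j that x_i may be matched with under a centered
# window of half-width window.size, clipped to the valid range 1..m
window_indices <- function(i, window.size, m) {
  seq.int(max(1L, i - window.size), min(m, i + window.size))
}

idx <- window_indices(i = 50L, window.size = 10L, m = 100L)
length(idx)  # 21, i.e. 2 * 10 + 1
range(idx)   # 40 60

# Near the edges the window is truncated by the series bounds
length(window_indices(i = 3L, window.size = 10L, m = 100L))  # 13

# The same rule expressed as a logical band over a 100 x 100 cost matrix
band <- abs(outer(seq_len(100L), seq_len(100L), "-")) <= 10L
sum(band[50L, ])  # 21
```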

Value

The average time series.

Parallel Computing

Please note that running tasks in parallel does not guarantee faster computations. The overhead introduced is sometimes too large, in which case it is better to run tasks sequentially.

This function uses the RcppParallel package for parallelization. It uses all available threads by default (see RcppParallel::defaultNumThreads()), but this can be changed by the user with RcppParallel::setThreadOptions().

An exception to the above is when this function is called within a foreach parallel loop made by dtwclust. If the parallel workers do not have the number of threads explicitly specified, this function will default to 1 thread per worker. See the parallelization vignette for more information (browseVignettes("dtwclust")).

This function appears to be very sensitive to numerical inaccuracies if multi-threading is used in a 32-bit installation. On such systems, consider limiting calculations to 1 thread.
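Following the advice above, a script could choose the thread count before calling DBA(). This is a sketch under two assumptions of ours: that .Machine$sizeof.pointer (4 on 32-bit builds of R, 8 on 64-bit builds) is an adequate test for a 32-bit installation, and that RcppParallel might not be installed, hence the requireNamespace() guard.

```r
# Assumption: a 32-bit build of R is detected through its 4-byte pointers
is_32bit <- .Machine$sizeof.pointer == 4L

# Use a single thread on 32-bit builds, otherwise let RcppParallel decide
num_threads <- if (is_32bit) 1L else "auto"

if (requireNamespace("RcppParallel", quietly = TRUE)) {
  message("Threads available by default: ", RcppParallel::defaultNumThreads())
  RcppParallel::setThreadOptions(numThreads = num_threads)
  # ... subsequent calls to DBA() would use the configured number of threads ...
}
```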

Multivariate series

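There are currently 2 versions of DBA implemented for multivariate series:

- mv.ver = "by-variable": each variable (column) of the series in X and of centroid is extracted, the univariate version of the algorithm is applied to each variable separately, and the results are bound by column. The DTW alignment can therefore differ between variables.
- mv.ver = "by-series": all variables are considered at the same time, so a single DTW alignment is computed for each multivariate series as a whole and applied to all of its variables.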

Note

The indices of the DTW alignment are obtained by calling dtw_basic() with backtrack = TRUE.

References

Petitjean F, Ketterlin A and Gancarski P (2011). “A global averaging method for dynamic time warping, with applications to clustering.” Pattern Recognition, 44(3), pp. 678-693. ISSN 0031-3203, http://dx.doi.org/10.1016/j.patcog.2010.09.013, http://www.sciencedirect.com/science/article/pii/S003132031000453X.

Examples

# Load the package and its sample data
library(dtwclust)
data(uciCT)

# Obtain an average for the first 5 time series
dtw_avg <- DBA(CharTraj[1:5], CharTraj[[1]], trace = TRUE)

# Plot
matplot(do.call(cbind, CharTraj[1:5]), type = "l")
points(dtw_avg)

# Change the provided order
dtw_avg2 <- DBA(CharTraj[5:1], CharTraj[[1]], trace = TRUE)

# Same result?
all.equal(dtw_avg, dtw_avg2)

## Not run: 
# ====================================================================================
# Multivariate versions
# ====================================================================================

# sample centroid reference
cent <- CharTrajMV[[3L]]
# sample series
x <- CharTrajMV[[1L]]
# sample set of series
X <- CharTrajMV[1L:5L]

# the by-series version does something like this for each series and the centroid
alignment <- dtw_basic(x, cent, backtrack = TRUE)
# alignment$index1 and alignment$index2 indicate how to map x to cent (row-wise)

# the by-variable version treats each variable separately
alignment1 <- dtw_basic(x[,1L], cent[,1L], backtrack = TRUE)
alignment2 <- dtw_basic(x[,2L], cent[,2L], backtrack = TRUE)
alignment3 <- dtw_basic(x[,3L], cent[,3L], backtrack = TRUE)

# effectively doing:
X1 <- lapply(X, function(x) { x[,1L] })
X2 <- lapply(X, function(x) { x[,2L] })
X3 <- lapply(X, function(x) { x[,3L] })

dba1 <- dba(X1, cent[,1L])
dba2 <- dba(X2, cent[,2L])
dba3 <- dba(X3, cent[,3L])

new_cent <- cbind(dba1, dba2, dba3)

# sanity check
newer_cent <- dba(X, cent, mv.ver = "by-variable")
all.equal(newer_cent, new_cent, check.attributes = FALSE) # ignore names


## End(Not run)

Example output

Loading required package: proxy

Attaching package: 'proxy'

The following objects are masked from 'package:stats':

    as.dist, dist

The following object is masked from 'package:base':

    as.matrix

Loading required package: clue
Loading required package: dtw
Loaded dtw v1.18-1. See ?dtw for help, citation("dtw") for use in publication.

Loading required package: ggplot2

dtwclust:
Setting random number generator to L'Ecuyer-CMRG (see RNGkind()).
To read the included vignettes type: browseVignettes("dtwclust").
Please see news(package = "dtwclust") for important information.
	DBA Iteration: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
		 11, 12, 13, 14 - Converged!
	DBA Iteration: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
		 11, 12, 13, 14 - Converged!
[1] TRUE
[1] TRUE

dtwclust documentation built on July 21, 2018, 5:01 p.m.