View source: R/CENTROIDS-dba.R

DBA | R Documentation |

A global averaging method for time series under DTW (Petitjean, Ketterlin and Gancarski 2011).

DBA(X, centroid = NULL, ..., window.size = NULL, norm = "L1",
    max.iter = 20L, delta = 0.001, error.check = TRUE, trace = FALSE,
    mv.ver = "by-variable")

dba(X, centroid = NULL, ..., window.size = NULL, norm = "L1",
    max.iter = 20L, delta = 0.001, error.check = TRUE, trace = FALSE,
    mv.ver = "by-variable")

`X` |
A matrix or data frame where each row is a time series, or a list where each element is a time series. Multivariate series should be provided as a list of matrices where time spans the rows and the variables span the columns of each matrix. |

`centroid` |
Optionally, a time series to use as reference. Defaults to a random series of |

`...` |
Further arguments for |

`window.size` |
Window constraint for the DTW calculations. |

`norm` |
Norm for the local cost matrix of DTW. Either "L1" for Manhattan distance or "L2" for Euclidean distance. |

`max.iter` |
Maximum number of iterations allowed. |

`delta` |
At iteration |

`error.check` |
Logical indicating whether the function should try to detect inconsistencies and give more informative error messages. Also used internally to avoid repeating checks. |

`trace` |
If |

`mv.ver` |
Multivariate version to use. See below. |

This function tries to find the optimum average series among a group of time series in DTW space. Refer to the cited article for specific details on the algorithm.
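To give a rough idea of what averaging in DTW space means, the following is a simplified base-R sketch of one averaging step in the spirit of the cited article: each series is aligned to the current centroid with DTW, and every centroid point is replaced by the mean of the points aligned to it. The helpers `dtw_path()` and `dba_step()` are hypothetical names made up for this illustration; they use an absolute-difference local cost, no window, and a fixed number of iterations, and they are not the implementation used by this package.

```r
# Hypothetical helper: DTW alignment of two univariate series
# (absolute-difference local cost, no window constraint).
dtw_path <- function(x, y) {
  n <- length(x)
  m <- length(y)
  cost <- abs(outer(x, y, "-"))
  # Accumulated cost matrix, padded with an extra first row and column
  acc <- matrix(Inf, n + 1L, m + 1L)
  acc[1L, 1L] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      acc[i + 1L, j + 1L] <- cost[i, j] +
        min(acc[i, j], acc[i, j + 1L], acc[i + 1L, j])
    }
  }
  # Backtrack from the last cell to the first one
  i <- n
  j <- m
  idx_x <- i
  idx_y <- j
  while (i > 1L || j > 1L) {
    # Candidate predecessors: diagonal, previous row, previous column
    move <- which.min(c(acc[i, j], acc[i, j + 1L], acc[i + 1L, j]))
    if (move == 1L) {
      i <- i - 1L
      j <- j - 1L
    } else if (move == 2L) {
      i <- i - 1L
    } else {
      j <- j - 1L
    }
    idx_x <- c(i, idx_x)
    idx_y <- c(j, idx_y)
  }
  list(index_x = idx_x, index_y = idx_y)
}

# Hypothetical helper: one averaging step over a list of series
dba_step <- function(series, centroid) {
  sums <- numeric(length(centroid))
  counts <- integer(length(centroid))
  for (s in series) {
    p <- dtw_path(s, centroid)
    for (k in seq_along(p$index_y)) {
      j <- p$index_y[k]
      sums[j] <- sums[j] + s[p$index_x[k]]
      counts[j] <- counts[j] + 1L
    }
  }
  # Every centroid index appears in each path, so counts are positive
  sums / counts
}

# Toy data: two sine waves of different lengths
series <- list(
  sin(seq(0, 2 * pi, length.out = 30)),
  sin(seq(0, 2 * pi, length.out = 40) + 0.3)
)
centroid <- series[[1L]]
for (iter in 1:5) {
  centroid <- dba_step(series, centroid)
}
```

The resulting `centroid` keeps the length of the initial reference series, which is why the choice of reference matters for the shape of the average.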

If a given series reference is provided in `centroid`, the algorithm should always converge to the same result provided the elements of `X` keep the same values, although their order may change.

The windowing constraint uses a centered window. The calculations expect a value in `window.size` that represents the distance between the point considered and one of the edges of the window. Therefore, if, for example, `window.size = 10`, the warping for an observation *x_i* considers the points between *x_{i-10}* and *x_{i+10}*, resulting in `10(2) + 1 = 21` observations falling within the window.
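The window arithmetic can be checked with a small base-R sketch. The helper `window_indices()` is a hypothetical name used only for this illustration, and clipping the window at the ends of the series is an assumption made here, not something stated in this page.

```r
# Hypothetical helper: indices that fall inside a centered window around i.
# Assumption: the window is clipped to the valid range 1..n at the edges.
window_indices <- function(i, window.size, n) {
  lower <- max(1L, i - window.size)
  upper <- min(n, i + window.size)
  seq.int(lower, upper)
}

# An observation in the middle of a series of length 100
idx <- window_indices(i = 50L, window.size = 10L, n = 100L)
range(idx)   # 40 60
length(idx)  # 21, i.e. 2 * window.size + 1
```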

The average time series.

Please note that running tasks in parallel does **not** guarantee faster computations. The
overhead introduced is sometimes too large, and it's better to run tasks sequentially.

This function uses the `RcppParallel` package for parallelization. It uses all available threads by default (see `RcppParallel::defaultNumThreads()`), but this can be changed by the user with `RcppParallel::setThreadOptions()`.

An exception to the above is when this function is called within a `foreach` parallel loop **made by dtwclust**. If the parallel workers do not have the number of threads explicitly specified, this function will default to 1 thread per worker. See the parallelization vignette for more information (`browseVignettes("dtwclust")`).

This function appears to be very sensitive to numerical inaccuracies if multi-threading is used
in a **32-bit** installation. On such systems, consider limiting calculations to 1 thread.
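As a brief illustration of the thread settings mentioned above, the number of threads can be inspected and limited before calling this function. This is a minimal sketch that assumes `RcppParallel` is installed; restoring the default with `"auto"` afterwards is optional.

```r
library(RcppParallel)

# How many threads would be used by default on this machine
defaultNumThreads()

# Limit subsequent RcppParallel computations to a single thread
setThreadOptions(numThreads = 1L)

# ... call DBA() here ...

# Go back to the default behaviour
setThreadOptions(numThreads = "auto")
```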

There are currently 2 versions of DBA implemented for multivariate series (see examples):

If `mv.ver = "by-variable"`, then each variable of each series in `X` and `centroid` is extracted, and the univariate version of the algorithm is applied to each set of variables, binding the results by column. Therefore, the DTW backtracking is different for each variable.

If `mv.ver = "by-series"`, then all variables are considered at the same time, so the DTW backtracking is computed based on each multivariate series as a whole. This version was implemented in version 4.0.0 of dtwclust, and it is faster, but not necessarily more correct.
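The two versions can be compared side by side on a small synthetic data set. The sketch below assumes `dtwclust` is installed; the toy series (noisy sine and cosine pairs of different lengths) are made up for this illustration.

```r
library(dtwclust)

set.seed(42)

# Toy data: 4 multivariate series with 2 variables each.
# Time spans the rows and variables span the columns.
X <- lapply(c(20, 25, 22, 24), function(n) {
  t <- seq(0, 2 * pi, length.out = n)
  cbind(sin(t), cos(t)) + rnorm(2 * n, sd = 0.05)
})

# Use the first series as the reference centroid
cent <- X[[1L]]

avg_by_variable <- dba(X, cent, mv.ver = "by-variable")
avg_by_series <- dba(X, cent, mv.ver = "by-series")

# Both averages have the dimensions of the reference centroid
dim(avg_by_variable)
dim(avg_by_series)
```

Since the backtracking differs between the two versions, the resulting averages are generally not identical.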

The indices of the DTW alignment are obtained by calling `dtw_basic()` with `backtrack = TRUE`.
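For reference, this is roughly what the alignment indices look like for two short univariate series. The sketch assumes `dtwclust` is installed; the toy vectors are made up for this illustration.

```r
library(dtwclust)

# Two toy series of different lengths
x <- c(0, 1, 2, 3, 2, 1, 0)
y <- c(0, 0, 1, 2, 3, 2, 1, 0)

alignment <- dtw_basic(x, y, backtrack = TRUE)

# DTW distance between x and y
alignment$distance

# Each row pairs an index of x with the index of y it is aligned to
cbind(x_index = alignment$index1, y_index = alignment$index2)
```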

Petitjean F, Ketterlin A and Gancarski P (2011). “A global averaging method for dynamic time
warping, with applications to clustering.” *Pattern Recognition*, **44**(3), pp. 678-693. ISSN
0031-3203, doi: 10.1016/j.patcog.2010.09.013,
https://www.sciencedirect.com/science/article/pii/S003132031000453X.

# Sample data
data(uciCT)

# Obtain an average for the first 5 time series
dtw_avg <- DBA(CharTraj[1:5], CharTraj[[1]], trace = TRUE)

# Plot
matplot(do.call(cbind, CharTraj[1:5]), type = "l")
points(dtw_avg)

# Change the provided order
dtw_avg2 <- DBA(CharTraj[5:1], CharTraj[[1]], trace = TRUE)

# Same result?
all.equal(dtw_avg, dtw_avg2)

## Not run: 
# ====================================================================================
# Multivariate versions
# ====================================================================================

# sample centroid reference
cent <- CharTrajMV[[3L]]
# sample series
x <- CharTrajMV[[1L]]
# sample set of series
X <- CharTrajMV[1L:5L]

# the by-series version does something like this for each series and the centroid
alignment <- dtw_basic(x, cent, backtrack = TRUE)
# alignment$index1 and alignment$index2 indicate how to map x to cent (row-wise)

# the by-variable version treats each variable separately
alignment1 <- dtw_basic(x[,1L], cent[,1L], backtrack = TRUE)
alignment2 <- dtw_basic(x[,2L], cent[,2L], backtrack = TRUE)
alignment3 <- dtw_basic(x[,3L], cent[,3L], backtrack = TRUE)

# effectively doing:
X1 <- lapply(X, function(x) { x[,1L] })
X2 <- lapply(X, function(x) { x[,2L] })
X3 <- lapply(X, function(x) { x[,3L] })

dba1 <- dba(X1, cent[,1L])
dba2 <- dba(X2, cent[,2L])
dba3 <- dba(X3, cent[,3L])

new_cent <- cbind(dba1, dba2, dba3)

# sanity check
newer_cent <- dba(X, cent, mv.ver = "by-variable")
all.equal(newer_cent, new_cent, check.attributes = FALSE) # ignore names

## End(Not run)
