tsclust-controls: Control parameters for clusterings with 'tsclust()'

tsclust-controls    R Documentation

Description

Functions that build the control parameter lists passed to tsclust(), allowing fine-grained control over each clustering type.

Usage

partitional_control(
  pam.precompute = TRUE,
  iter.max = 100L,
  nrep = 1L,
  symmetric = FALSE,
  packages = character(0L),
  distmat = NULL,
  pam.sparse = FALSE,
  version = 2L
)

hierarchical_control(
  method = "average",
  symmetric = FALSE,
  packages = character(0L),
  distmat = NULL
)

fuzzy_control(
  fuzziness = 2,
  iter.max = 100L,
  delta = 0.001,
  packages = character(0L),
  symmetric = FALSE,
  version = 2L,
  distmat = NULL
)

tadpole_control(dc, window.size, lb = "lbk")

tsclust_args(preproc = list(), dist = list(), cent = list())

Arguments

pam.precompute

Logical flag. Precompute the whole distance matrix once and reuse it on each iteration if using PAM centroids. Otherwise calculate distances at every iteration. See details.

iter.max

Integer. Maximum number of allowed iterations for partitional/fuzzy clustering.

nrep

Integer. How many times to repeat clustering with different starting points (i.e., different random seeds).

symmetric

Logical flag. Is the distance function symmetric? In other words, is dist(x,y) == dist(y,x)? If TRUE, only half the distance matrix needs to be computed. Automatically detected and overridden for the distances included in dtwclust.

packages

Character vector with the names of any packages required for custom proxy functions. Relevant for parallel computation, although it is probably unnecessary in practice, since the distance entries are re-registered in each parallel worker if needed.

distmat

If available, the cross-distance matrix can be provided here. Only relevant for partitional with PAM centroids, fuzzy with FCMdd centroids, or hierarchical clustering.

pam.sparse

Logical flag. Attempt to use a sparse matrix for PAM centroids. See details.

version

Integer. Which version of partitional/fuzzy clustering to use (1 or 2). See details.

method

Character vector with one or more linkage methods to use in hierarchical procedures (see stats::hclust()), the character "all" to use all of the available ones, or a function that performs hierarchical clustering based on distance matrices (e.g. cluster::diana()). See details.

fuzziness

Numeric. Exponent used for fuzzy clustering. Commonly termed m in the literature.

delta

Numeric. Convergence criterion for fuzzy clustering.

dc

The cutoff distance for the TADPole algorithm.

window.size

The DTW window size constraint used specifically by the TADPole algorithm.

lb

The lower bound to use with TADPole. Either "lbk" (Keogh's lower bound, see lb_keogh()) or "lbi" (the improved lower bound, see lb_improved()).

preproc

A list of arguments for a preprocessing function to be used in tsclust().

dist

A list of arguments for a distance function to be used in tsclust().

cent

A list of arguments for a centroid function to be used in tsclust().

Details

The functions essentially return their function arguments in a classed list, although some checks are performed.
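As an illustration of the classed lists mentioned above, the following sketch builds a control object and an argument list and passes both to tsclust(). It assumes dtwclust is installed and uses the CharTraj example data shipped with the package; the values iter.max = 50L, window.size = 18L and seed = 123L are arbitrary choices for the example, not recommendations.

```r
library(dtwclust)

# Control object for partitional clustering; arguments not given here
# keep the defaults listed under Usage.
ctrl <- partitional_control(iter.max = 50L)

# The result behaves like a list whose elements mirror the arguments.
class(ctrl)
ctrl$iter.max
ctrl$pam.precompute

# Extra arguments for the distance function are wrapped the same way.
# window.size = 18L is an illustrative value only.
extra <- tsclust_args(dist = list(window.size = 18L))
extra$dist

# Both objects are then handed to tsclust().
data("uciCT")  # example data shipped with dtwclust (provides CharTraj)
pc <- tsclust(CharTraj[1:20], type = "partitional", k = 4L,
              distance = "dtw_basic", centroid = "pam",
              control = ctrl, args = extra, seed = 123L)
```

Building the lists through these functions rather than by hand means the checks mentioned above are applied before clustering starts.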

Regarding the version parameter: the first version of partitional/fuzzy clustering implemented in the package always performed an extra iteration, which is unnecessary. Use version = 1L to mimic this previous behavior.
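A minimal sketch of requesting the older behavior, assuming dtwclust is installed. The object names are only illustrative.

```r
library(dtwclust)

# Reproduce the first implementation, which performs the extra iteration.
ctrl_pt_v1 <- partitional_control(version = 1L)
ctrl_fz_v1 <- fuzzy_control(version = 1L)

# The default (version = 2L) skips the unnecessary iteration.
ctrl_pt_v2 <- partitional_control()
```

This is mostly useful when trying to reproduce results obtained with an older release of the package.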

Partitional

When pam.precompute = FALSE, using pam.sparse = TRUE defines a sparse matrix (refer to Matrix::sparseMatrix()) and updates it every iteration (except for "dtw_lb" distance). For most cases, precomputing the whole distance matrix is still probably faster. See the timing experiments in browseVignettes("dtwclust").

Parallel computations for PAM centroids have the following considerations:

  • If pam.precompute is TRUE, both distance matrix calculations and repetitions are done in parallel, regardless of pam.sparse.

  • If pam.precompute is FALSE and pam.sparse is TRUE, repetitions are done sequentially, so that the distance calculations can be done in parallel and the sparse matrix updated iteratively.

  • If both pam.precompute and pam.sparse are FALSE, repetitions are done in parallel, and each repetition performs distance calculations sequentially, but the distance matrix cannot be updated iteratively.
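The three PAM configurations described above could be set up as follows. This is a sketch assuming dtwclust is installed; the object names are hypothetical, and a parallel backend would still have to be registered separately for any of the parallel behavior to apply.

```r
library(dtwclust)

# 1. Default: the whole distance matrix is precomputed once and reused.
#    Distance calculations and repetitions can both run in parallel.
ctrl_precomputed <- partitional_control(pam.precompute = TRUE)

# 2. No precomputation, sparse matrix updated at every iteration.
#    Repetitions run sequentially; distance calculations can be parallel.
ctrl_sparse <- partitional_control(pam.precompute = FALSE,
                                   pam.sparse = TRUE)

# 3. No precomputation and no sparse matrix.
#    Repetitions can run in parallel; distances within each are sequential.
ctrl_plain <- partitional_control(pam.precompute = FALSE,
                                  pam.sparse = FALSE)
```

Which option is fastest depends on the data and the distance used, so timing them on a subset of the series is a reasonable first step.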

Hierarchical

There are some limitations when using a custom hierarchical function in method: it will receive the lower triangular part of the distance matrix as its first argument (see stats::as.dist()), and its result must support the stats::as.hclust() generic. This functionality was added with the cluster package in mind, since its functions follow this convention, but other functions can be used if they are adapted to work similarly.
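The convention for custom functions can be checked on toy data before involving tsclust(). In this sketch, my_hclust is a hypothetical wrapper written for illustration (it simply calls stats::hclust() with Ward linkage), and the final step assumes dtwclust is installed.

```r
# Hypothetical custom function: the first argument is a 'dist' object,
# and the return value must work with stats::as.hclust().
my_hclust <- function(d, ...) {
  stats::hclust(d, method = "ward.D2")
}

# Check the convention on a small one-dimensional example.
toy <- matrix(c(0, 0.1, 5, 5.2, 10), ncol = 1)
d <- stats::dist(toy)
tree <- stats::as.hclust(my_hclust(d))
groups <- stats::cutree(tree, k = 3L)

# If the check passes, the function can be supplied as 'method'.
if (requireNamespace("dtwclust", quietly = TRUE)) {
  ctrl <- dtwclust::hierarchical_control(method = my_hclust)
}
```

Functions from the cluster package, such as cluster::diana(), already follow this convention and should not need a wrapper.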

TADPole

When using TADPole, the dist argument list includes the window.size and specifies norm = "L2".
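A short sketch of a TADPole run, assuming dtwclust is installed. The values dc = 1.5, window.size = 20L and k = 4L are illustrative only, and the series are reinterpolated to a common length first because TADPole is applied here to equal-length series.

```r
library(dtwclust)

# dc and window.size have no defaults and must be given explicitly.
ctrl <- tadpole_control(dc = 1.5, window.size = 20L, lb = "lbk")

# Bring the example series to a common length and subset for speed.
data("uciCT")
series <- reinterpolate(CharTraj, new.length = max(lengths(CharTraj)))
series <- series[1:20]

tp <- tsclust(series, type = "tadpole", k = 4L, control = ctrl)

# The distance arguments stored in the result should reflect the
# window.size and norm described above.
tp@args$dist
```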


dtwclust documentation built on Sept. 11, 2024, 9:07 p.m.