tsclusters-methods: Methods for 'TSClusters'

tsclusters-methods    R Documentation

Methods for TSClusters

Description

Methods associated with TSClusters and derived objects.

Usage

## S4 method for signature 'TSClusters'
initialize(.Object, ..., override.family = TRUE)

## S4 method for signature 'TSClusters'
show(object)

## S3 method for class 'TSClusters'
update(object, ..., evaluate = TRUE)

## S4 method for signature 'TSClusters'
update(object, ..., evaluate = TRUE)

## S3 method for class 'TSClusters'
predict(object, newdata = NULL, ...)

## S4 method for signature 'TSClusters'
predict(object, newdata = NULL, ...)

## S3 method for class 'TSClusters'
plot(
  x,
  y,
  ...,
  clus = seq_len(x@k),
  labs.arg = NULL,
  series = NULL,
  time = NULL,
  plot = TRUE,
  type = NULL,
  labels = NULL
)

## S4 method for signature 'TSClusters,missing'
plot(
  x,
  y,
  ...,
  clus = seq_len(x@k),
  labs.arg = NULL,
  series = NULL,
  time = NULL,
  plot = TRUE,
  type = NULL,
  labels = NULL
)

Arguments

.Object

A TSClusters prototype. You shouldn't use this directly; see the Initialize section and the examples.

...

For initialize, any valid slots. For plot, passed to ggplot2::geom_line() for the plotting of the cluster centroids, or to stats::plot.hclust(); see Plotting section and the examples. For update, any supported argument. Otherwise ignored.

override.family

Logical. Should the method attempt to substitute the default family with one that conforms to the provided elements? See the Initialize section.

object, x

An object that inherits from TSClusters as returned by tsclust().

evaluate

Logical. Defaults to TRUE and evaluates the updated call, which will result in a new TSClusters object. Otherwise, it returns the unevaluated call.

newdata

New data to be assigned to a cluster. It can take any of the supported formats of tsclust(). Note that for multivariate series, this means that it must be a list of matrices, even if the list has only one matrix.

y

Ignored.

clus

A numeric vector indicating which clusters to plot.

labs.arg

A list with arguments to change the title and/or axis labels. See the examples and ggplot2::labs() for more information.

series

Optionally, the data in the same format as it was provided to tsclust().

time

Optional values for the time axis. If series have different lengths, provide the time values of the longest series.

plot

Logical flag. You can set this to FALSE if you want to save the ggplot object without printing anything to the screen.

type

What to plot. NULL means default. See the Plotting section.

labels

Whether to include labels in the plot (not for dendrogram plots). See the Plotting section and note that label placement is subject to randomness.

Details

The update method takes the original function call, replaces any provided arguments, and optionally evaluates the call again. Use evaluate = FALSE if you want to get the unevaluated call. If no arguments are provided, the object is updated to a new version if necessary; this exists for backward compatibility, since the internal functions of the package may change between versions.
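As an illustration of the behavior described above, here is a minimal sketch. It assumes the dtwclust package is installed and that the code runs at top level; the subset of CharTraj, the distance, the centroid and the seed are arbitrary choices made only for this example.

```r
library(dtwclust)
data(uciCT)

# Illustrative clustering on a small subset to keep the run short
pc <- tsclust(CharTraj[1L:20L], type = "partitional", k = 4L,
              distance = "dtw_basic", centroid = "pam", seed = 42L)

# Only build the modified call; the clustering is not re-run
new_call <- update(pc, k = 5L, evaluate = FALSE)

# Replace k in the original call and evaluate it again,
# which yields a new TSClusters object
pc5 <- update(pc, k = 5L)
```

Inspecting new_call before evaluating it can be a cheap way to check which arguments will actually be used, since re-running the clustering may be expensive for large data sets.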

Value

The plot method returns a gg object (or NULL for dendrogram plots) invisibly.

Initialize

The initialize method is used when calling methods::new(). The family slot can be substituted with an appropriate one if certain elements are provided by the user. The initialize methods of derived classes also inherit the family and can use it to calculate other slots. In order to get a fully functional object, at least the following slots should be provided:

  • type: "partitional", "hierarchical", "fuzzy" or "tadpole".

  • datalist: The data in one of the supported formats.

  • centroids: The time series centroids in one of the supported formats.

  • cluster: The cluster indices for each series in the datalist.

  • control*: A tsclust-controls object with the desired parameters.

  • distance*: A string indicating the distance that should be used.

  • centroid*: A string indicating the centroid to use (only necessary for partitional clustering).

*Necessary when overriding the default family for the calculation of other slots, CVIs or prediction. These are not always needed, e.g. for plotting.

Prediction

The predict generic can take the usual newdata argument. If it is NULL, the method simply returns the obtained cluster indices. Otherwise, a nearest-neighbor classification based on the centroids obtained from clustering is performed:

  1. newdata is preprocessed with object@family@preproc using the parameters in object@args$preproc.

  2. A cross-distance matrix between the processed series and object@centroids is computed with object@family@dist using the parameters in object@args$dist.

  3. For non-fuzzy clustering, the series are assigned to their nearest centroid's cluster. For fuzzy clustering, the fuzzy membership matrix for the series is calculated. In both cases, the function in object@family@cluster is used.
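The three steps above can be sketched in base R for the non-fuzzy case. This is not the package's implementation: the identity function stands in for object@family@preproc, the Euclidean distance stands in for object@family@dist, and every name below (centroids, newdata, preproc, euclid, distmat, assigned) is hypothetical and chosen only for the illustration.

```r
# Hypothetical centroids and new series of equal length
centroids <- list(c1 = c(0, 0, 0, 0), c2 = c(1, 2, 3, 4))
newdata <- list(a = c(0.1, -0.1, 0, 0.2), b = c(1.2, 1.9, 3.1, 3.8))

# Step 1: preprocess the new series (identity as a stand-in)
preproc <- function(series) series
processed <- preproc(newdata)

# Step 2: cross-distance matrix with one row per new series
# and one column per centroid (Euclidean as a stand-in)
euclid <- function(x, y) sqrt(sum((x - y)^2))
distmat <- sapply(centroids, function(cen) sapply(processed, euclid, y = cen))

# Step 3: hard assignment to the cluster of the nearest centroid
assigned <- apply(distmat, 1L, which.min)
```

With an actual TSClusters object, the same result would normally be obtained by calling predict(object, newdata = ...) directly, which uses the functions stored in the object's family instead of these stand-ins.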

Plotting

The plot method uses the ggplot2 plotting system (see ggplot2::ggplot()).

The default depends on whether a hierarchical method was used or not. If it was, the dendrogram is plotted by default; you can pass any extra parameters to stats::plot.hclust() via the ellipsis (...).

Otherwise, the function plots the time series of each cluster along with the obtained centroid. The default values for cluster centroids are: linetype = "dashed", size = 1.5, colour = "black", alpha = 0.5. You can change this by means of the ellipsis (...).

You can choose what to plot with the type parameter. Possible options are:

  • "dendrogram": Only available for hierarchical clustering.

  • "series": Plot the time series divided into clusters without including centroids.

  • "centroids": Plot the obtained centroids only.

  • "sc": Plot both series and centroids.

In order to enable labels on the (non-dendrogram) plot, you have to select an option that plots the series and at least provide an empty list in the labels argument. This list can contain arguments for ggrepel::geom_label_repel() and will be passed along. The following are set by the plot method if they are not provided:

  • "mapping": set to aes_string(x = "t", y = "value", label = "label")

  • "data": a data frame with as many rows as series in the datalist and 4 columns:

    • t: x coordinate of the label for each series.

    • value: y coordinate of the label for each series.

    • cl: index of the cluster to which the series belongs (i.e. x@cluster).

    • label: the label for the given series (i.e. names(x@datalist)).

You can provide your own data frame if you want, but it must have those columns and, even if you override mapping, the cl column must have that name. The method will attempt to spread the labels across the plot, but note that this is subject to randomness, so be careful if you need reproducibility of any commands used after plotting (see examples).

If created, the function returns the gg object invisibly, in case you want to modify it to your liking. You might want to look at ggplot2::ggplot_build() if that's the case.
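A minimal sketch of this workflow follows, assuming dtwclust, ggplot2 and ggrepel are installed. The clustering parameters, the plot title and the seed are arbitrary choices for the illustration.

```r
library(dtwclust)
library(ggplot2)
data(uciCT)

# Illustrative clustering on a small subset to keep the run short
pc <- tsclust(CharTraj[1L:20L], type = "partitional", k = 4L,
              distance = "dtw_basic", centroid = "pam", seed = 42L)

# Build the plot of series and centroids without printing it;
# the gg object is returned invisibly, so assign it
gg <- plot(pc, type = "sc", plot = FALSE)

# Modify it like any other ggplot object
gg <- gg + labs(title = "Series and centroids")

# An empty list enables labels with the default settings;
# label placement is random, so fix the seed first
set.seed(15L)
gg_lab <- plot(pc, type = "series", labels = list(), plot = FALSE)
```

Calling print(gg) or print(gg_lab) afterwards draws the corresponding plot.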

If you want to free the axis scales across facets, you can do the following (scales = "free" frees both axes; use "free_x" to free only the X axis):

plot(x, plot = FALSE) + ggplot2::facet_wrap(~cl, scales = "free")

For more complicated changes, you're better off looking at the source code at https://github.com/asardaes/dtwclust/blob/master/R/S4-TSClusters-methods.R and creating your own plotting function.

Examples


library(dtwclust)
data(uciCT)

# Assuming this was generated by some clustering procedure
centroids <- CharTraj[seq(1L, 100L, 5L)]
cluster <- unclass(CharTrajLabels)

pc_obj <- new("PartitionalTSClusters",
              type = "partitional", datalist = CharTraj,
              centroids = centroids, cluster = cluster,
              distance = "sbd", centroid = "dba",
              control = partitional_control(),
              args = tsclust_args(cent = list(window.size = 8L, norm = "L2")))

fc_obj <- new("FuzzyTSClusters",
              type = "fuzzy", datalist = CharTraj,
              centroids = centroids, cluster = cluster,
              distance = "sbd", centroid = "fcm",
              control = fuzzy_control())

show(fc_obj)


## Not run: 
plot(pc_obj, type = "c", linetype = "solid",
     labs.arg = list(title = "Clusters' centroids"))

set.seed(15L)
plot(pc_obj, labels = list(nudge_x = -5, nudge_y = 0.2),
     clus = c(1L,4L))

## End(Not run)


dtwclust documentation built on March 7, 2023, 7:49 p.m.