compare_caps: Generates results of multiple clustering strategies

View source: R/compare-caps.R

compare_caps    R Documentation

Generates results of multiple clustering strategies

Description

This function searches for clusters in the input data set using several strategies and returns an object of class mcaps, which stores multiple objects of class caps. It is a helper function that facilitates comparing clustering methods and choosing the most suitable one.

Usage

compare_caps(
  x,
  y,
  n_clusters = 1:5,
  metric = c("l2", "pearson"),
  clustering_method = c("kmeans", "hclust-complete", "hclust-average", "hclust-single",
    "dbscan"),
  warping_class = c("affine", "dilation", "none", "shift", "srsf"),
  centroid_type = c("mean", "medoid", "median", "lowess", "poly"),
  cluster_on_phase = FALSE
)

Arguments

x

A numeric vector of length M or a numeric matrix of shape N \times M or an object of class funData::funData. If a numeric vector or matrix, it specifies the grid(s) of size M on which each of the N curves has been observed. If an object of class funData::funData, it contains the whole functional data set and the y argument is not used.

y

Either a numeric matrix of shape N \times M or a numeric array of shape N \times L \times M or an object of class fda::fd. If a numeric matrix or array, it specifies the N-sample of L-dimensional curves observed on grids of size M. If an object of class fda::fd, it contains all the necessary information about the functional data set to be able to evaluate it on user-defined grids.
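
The shapes expected for x and y can be easier to see on a small synthetic data set. The following is a minimal base-R sketch, assuming illustrative sizes N = 30, M = 100 and L = 2; the object names (x_common, x_per_curve, y_uni, y_multi) and the sine and cosine curves are hypothetical and are not part of the package.

```r
# Illustrative sizes (assumptions): N curves, grid of size M, L dimensions.
N <- 30
M <- 100
L <- 2

# x as a numeric vector of length M: one grid shared by all curves.
x_common <- seq(0, 1, length.out = M)

# x as an N x M matrix: one grid per curve (here the same grid repeated).
x_per_curve <- matrix(x_common, nrow = N, ncol = M, byrow = TRUE)

# y as an N x M matrix: N univariate curves, each a randomly shifted sine.
set.seed(1)
shifts <- runif(N, min = -0.1, max = 0.1)
y_uni <- t(sapply(shifts, function(s) sin(2 * pi * (x_common - s))))

# y as an N x L x M array: N curves with L = 2 components each.
y_multi <- array(NA_real_, dim = c(N, L, M))
y_multi[, 1, ] <- y_uni
y_multi[, 2, ] <- t(sapply(shifts, function(s) cos(2 * pi * (x_common - s))))

dim(y_uni)
dim(y_multi)
```

Either x_common or x_per_curve could then be passed as x, together with y_uni or y_multi as y.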

n_clusters

An integer vector specifying the numbers of clusters to try; one partition is created for each value. Defaults to 1:5.

metric

A string specifying the metric used to compare curves. Choices are "l2" or "pearson". Defaults to "l2". Used only when warping_class != "srsf". For the boundary-preserving warping class ("srsf"), the L2 distance between the SRSFs of the original curves is used instead.
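
To give some intuition for how the two metric choices can differ, here is a minimal base-R sketch for two curves observed on a common grid. It assumes a trapezoidal approximation of the L2 distance and uses one minus the sample correlation as a stand-in for a Pearson-type dissimilarity; the exact formulas and normalisation used by the package are not stated here and may differ. The helper trapz and the curves f and g are hypothetical.

```r
# Two curves on a common grid; g is f shifted vertically by 0.5.
grid <- seq(0, 1, length.out = 101)
f <- sin(2 * pi * grid)
g <- f + 0.5

# Hypothetical helper: trapezoidal rule for integrating y over the grid x.
trapz <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# L2 distance: square root of the integrated squared difference.
l2_dist <- sqrt(trapz(grid, (f - g)^2))

# Pearson-type dissimilarity (assumed form): one minus the correlation.
pearson_dissim <- 1 - cor(f, g)

l2_dist
pearson_dissim
```

In this sketch the L2 distance picks up the vertical offset between the two curves, whereas the correlation-based dissimilarity is essentially zero because the curves have the same shape.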

clustering_method

A character vector specifying one or more clustering methods to be fit. Choices are "kmeans", "hclust-complete", "hclust-average", "hclust-single" or "dbscan". Defaults to all of them.

warping_class

A character vector specifying one or more classes of warping functions to use for curve alignment. Choices are "affine", "dilation", "none", "shift" or "srsf". Defaults to all of them.

centroid_type

A character vector specifying one or more ways to compute centroids. Choices are "mean", "medoid", "median", "lowess" or "poly". Defaults to all of them.

cluster_on_phase

A boolean specifying whether clustering should be based on phase variation (TRUE) or amplitude variation (FALSE). Defaults to FALSE.

Value

An object of class mcaps, which is a tibble::tibble storing one object of class caps for each combination of the choices supplied in the input arguments.
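
Because one caps object is stored per combination, the number of fits grows multiplicatively with the choices supplied. The base-R sketch below only counts combinations with expand.grid(); it is not the package's implementation, and whether the package skips or adjusts some combinations is not stated here. The names all_defaults and restricted are hypothetical.

```r
# Full grid implied by the default arguments of compare_caps().
all_defaults <- expand.grid(
  n_clusters = 1:5,
  clustering_method = c("kmeans", "hclust-complete", "hclust-average",
                        "hclust-single", "dbscan"),
  warping_class = c("affine", "dilation", "none", "shift", "srsf"),
  centroid_type = c("mean", "medoid", "median", "lowess", "poly"),
  stringsAsFactors = FALSE
)
nrow(all_defaults)

# Restricted grid: k-means only, mean centroid, four warping classes.
restricted <- expand.grid(
  n_clusters = 1:5,
  clustering_method = "kmeans",
  warping_class = c("none", "shift", "dilation", "affine"),
  centroid_type = "mean",
  stringsAsFactors = FALSE
)
nrow(restricted)
```

Restricting clustering_method, warping_class and centroid_type to the strategies of actual interest keeps the comparison small.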

Examples

#----------------------------------
# Compare k-means results with k = 1, 2, 3, 4, 5 using mean centroid and
# various warping classes.
## Not run: 
sim30_mcaps <- compare_caps(
  x = simulated30_sub$x,
  y = simulated30_sub$y,
  warping_class = c("none", "shift", "dilation", "affine"),
  clustering_method = "kmeans",
  centroid_type = "mean"
)

## End(Not run)

#----------------------------------
# Then visualize the results
# Either with ggplot2 via ggplot2::autoplot(sim30_mcaps)
# or using graphics::plot()
# You can visualize the WSS values:
plot(sim30_mcaps, validation_criterion = "wss", what = "mean")
plot(sim30_mcaps, validation_criterion = "wss", what = "distribution")
# Or the average silhouette values:
plot(sim30_mcaps, validation_criterion = "silhouette", what = "mean")
plot(sim30_mcaps, validation_criterion = "silhouette", what = "distribution")

fdacluster documentation built on July 9, 2023, 6:45 p.m.