EHyClus: Clustering using Epigraph and Hypograph indices

View source: R/EHyClus.R

EHyClus R Documentation



Description

EHyClus creates a multivariate dataset containing the epigraph and hypograph indices and/or their modified versions, computed on the curves and their derivatives. It then performs hierarchical clustering, k-means, kernel k-means and spectral clustering on that dataset.
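The following base-R sketch makes the four indices concrete. The curves are stored as an n × p matrix, with one curve per row and all curves sharing an equally spaced grid. It follows the usual definitions:

- The epigraph index (EI) of a curve is one minus the proportion of sample curves lying entirely above it.
- The hypograph index (HI) is the proportion of sample curves lying entirely below it.
- The modified versions (MEI, MHI) replace "entirely" with the average proportion of time points.

The helper names are hypothetical and are not the ehymet API. Details such as how ties are handled may differ from the package.

```r
# Hypothetical helpers, not the ehymet API. `curves` is an n x p matrix:
# one curve per row, all evaluated on the same equally spaced grid.
pairwise_prop <- function(curves, cmp) {
  n <- nrow(curves)
  prop <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # share of time points where cmp(curve i, curve j) holds
      prop[i, j] <- mean(cmp(curves[i, ], curves[j, ]))
    }
  }
  prop
}

# Epigraph index: 1 - proportion of curves entirely above (or on) curve j
epi_index <- function(curves) 1 - colMeans(pairwise_prop(curves, `>=`) == 1)
# Hypograph index: proportion of curves entirely below (or on) curve j
hypo_index <- function(curves) colMeans(pairwise_prop(curves, `<=`) == 1)
# Modified versions: average share of time points above / below curve j
mod_epi_index <- function(curves) 1 - colMeans(pairwise_prop(curves, `>=`))
mod_hypo_index <- function(curves) colMeans(pairwise_prop(curves, `<=`))

# Two crossing curves: neither lies entirely above the other
x <- rbind(c(0, 2), c(2, 0))
epi_index(x)      # 0.5 0.5
mod_epi_index(x)  # 0.25 0.25
```

For crossing curves the plain indices cannot tell the two curves apart, while the modified indices still reflect how much of the time each curve spends above or below the others.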


Usage

  k = 30,
  n_clusters = 2,
  bs = "cr",
  clustering_methods = c("hierarch", "kmeans", "kkmeans", "spc"),
  l_method_hierarch = c("single", "complete", "average", "centroid", "ward.D2"),
  l_dist_hierarch = c("euclidean", "manhattan"),
  l_dist_kmeans = c("euclidean", "mahalanobis"),
  l_kernel = c("rbfdot", "polydot"),
  true_labels = NULL,
  only_best = FALSE,
  verbose = FALSE,
  n_cores = 1
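The k = 30 and bs = "cr" defaults above describe a penalized regression spline smooth of the kind provided by mgcv, whose smooth.terms help page the bs argument points to. The sketch below applies those settings to one noisy toy curve and then approximates its first derivative by finite differences.

It rests on two assumptions. First, that EHyClus smooths curves in a broadly similar way. Second, that a finite-difference derivative is a reasonable stand-in; it is an illustrative choice and not necessarily how the package computes derivatives.

```r
library(mgcv)  # provides gam(), s() and the bases listed in ?smooth.terms

set.seed(1)
t_grid <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * t_grid) + rnorm(100, sd = 0.05)  # toy noisy curve

# Penalized cubic regression spline ("cr") with k = 30 basis functions
fit <- gam(y ~ s(t_grid, k = 30, bs = "cr"))
smooth_y <- as.numeric(fitted(fit))

# First derivative approximated by forward finite differences
d_smooth <- diff(smooth_y) / diff(t_grid)
```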



Arguments

Dataset containing the curves to which the clustering algorithms are applied. The functional dataset can be one-dimensional ($n \times p$), where n is the number of curves and p the number of time points, or multidimensional ($n \times p \times q$), where q is the number of dimensions in the data.


vars_combinations

If a list, each element should be an atomic vector of strings with the names of the variables to combine. Combinations containing invalid variable names are discarded. If the list is unnamed, its elements are named vars1, ..., varsk, where k is the number of elements in vars_combinations. If not provided, generic default combinations of variables are used; these differ between one-dimensional and multidimensional problems.


k

Number of basis functions for the B-splines. If equal to 0, the number of basis functions is selected automatically.


n_clusters

Number of clusters to generate.


bs

A two-letter character string indicating the (penalized) smoothing basis to use. See smooth.terms.


clustering_methods

Character vector specifying at least one of the following clustering methods to compute: "hierarch" (hierarchical clustering), "kmeans" (k-means), "kkmeans" (kernel k-means) or "spc" (spectral clustering).


l_method_hierarch

Character vector of linkage methods for hierarchical clustering.


l_dist_hierarch

Character vector of distances for hierarchical clustering.


l_dist_kmeans

Character vector of distances for k-means clustering.


l_kernel

Character vector of kernels for kernel k-means (kkmeans) or spectral clustering (spc).


Numeric vector with two elements: the lower and upper limits of the evaluation grid. If not provided, it is selected automatically.


true_labels

Numeric vector of true labels used for validation. If provided, evaluation metrics are included in the final result.


only_best

Logical value. If TRUE and true_labels is provided, the function returns only the result of the best clustering method according to the Rand Index. Defaults to FALSE.


verbose

If TRUE, the function prints logs about the execution of some clustering methods. Defaults to FALSE.


n_cores

Number of cores for parallel computation. Defaults to 1, which means no parallel execution. Must be a positive integer.
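The following base-R sketch gives a rough picture of how the method and distance lists combine. It clusters a toy indices dataset in two ways:

- Hierarchical clustering with every linkage/distance pair from the defaults of l_method_hierarch and l_dist_hierarch.
- k-means with both distances in l_dist_kmeans.

It assumes that EHyClus produces one partition per listed option. The Value section's "each method" wording suggests this but does not spell it out. The Mahalanobis variant is implemented here by whitening the data with the overall sample covariance, which may not match the package's implementation. Kernel k-means and spectral clustering (l_kernel) are omitted.

```r
set.seed(42)
# Hypothetical indices dataset: 20 curves (rows) x 2 index variables
X <- rbind(matrix(rnorm(20, mean = 0.2, sd = 0.05), ncol = 2),
           matrix(rnorm(20, mean = 0.8, sd = 0.05), ncol = 2))
n_clusters <- 2

# Hierarchical clustering: every linkage method with every distance
configs <- expand.grid(
  method = c("single", "complete", "average", "centroid", "ward.D2"),
  dist = c("euclidean", "manhattan"),
  stringsAsFactors = FALSE
)
hier <- lapply(seq_len(nrow(configs)), function(r) {
  hc <- hclust(dist(X, method = configs$dist[r]), method = configs$method[r])
  cutree(hc, k = n_clusters)
})
names(hier) <- paste("hierarch", configs$method, configs$dist, sep = "_")

# k-means with Euclidean distance
km_euc <- kmeans(X, centers = n_clusters, nstart = 10)$cluster

# k-means with Mahalanobis distance via whitening: with cov(X) = t(R) %*% R,
# Euclidean distance on X %*% solve(R) equals Mahalanobis distance on X
R <- chol(cov(X))
X_white <- X %*% solve(R)
km_mah <- kmeans(X_white, centers = n_clusters, nstart = 10)$cluster

length(hier)  # 10 hierarchical partitions
```

Whitening keeps k-means itself unchanged: only the coordinates change, so any Euclidean k-means routine can be reused for the Mahalanobis case.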


Value

A list containing the clustering partition for each method and indices combination and, if true_labels is provided, a data frame containing the time elapsed to obtain a clustering partition of the indices dataset with each methodology. The number of generated clusters and the combinations of variables used are stored as attributes of this object.
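When only_best is TRUE, the method with the highest Rand Index against true_labels is kept. The Rand Index is the share of item pairs on which two partitions agree: either both put the pair in the same cluster, or both separate it.

Below is a minimal base-R version of that computation, followed by a selection step over toy candidate partitions. The helper name and the candidates are hypothetical, and the package may compute the index differently.

```r
# Hypothetical helper: Rand index between two partitions of the same items
rand_index <- function(labels_a, labels_b) {
  stopifnot(length(labels_a) == length(labels_b))
  pairs <- combn(length(labels_a), 2)  # all item pairs, one per column
  same_a <- labels_a[pairs[1, ]] == labels_a[pairs[2, ]]
  same_b <- labels_b[pairs[1, ]] == labels_b[pairs[2, ]]
  # share of pairs on which both partitions agree (together or apart)
  mean(same_a == same_b)
}

true_labels <- c(1, 1, 2, 2)
rand_index(true_labels, c(2, 2, 1, 1))  # 1: label names do not matter
rand_index(true_labels, c(1, 2, 1, 2))  # 1/3

# Picking the best of several (toy) candidate partitions
candidates <- list(a = c(2, 2, 1, 1), b = c(1, 2, 1, 2))
ri <- vapply(candidates, rand_index, numeric(1), labels_b = true_labels)
names(which.max(ri))  # "a"
```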


Examples

library(ehymet)

# univariate data without labels
curves <- sim_model_ex1(n = 10)
vars_combinations <- list(c("dtaEI", "dtaMEI"), c("dtaHI", "dtaMHI"))
EHyClus(curves, vars_combinations = vars_combinations)

# multivariate data with labels
curves <- sim_model_ex2(n = 5)
true_labels <- c(rep(1, 5), rep(2, 5))
vars_combinations <- list(c("dtaMEI", "ddtaMEI"), c("dtaMEI", "d2dtaMEI"))
res <- EHyClus(curves, vars_combinations = vars_combinations, true_labels = true_labels)
res$cluster # clustering results

# multivariate data and generic (default) vars_combinations
curves <- sim_model_ex2(n = 5)
EHyClus(curves)
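Your own data can be used instead of the simulated sets. It must have the shape described above: an n × p matrix for one-dimensional curves, or an n × p × q array for multidimensional ones.

The sketch below builds both shapes from synthetic sine and cosine curves in base R. The data are purely illustrative, and the commented-out call only reuses variable names from the examples above.

```r
set.seed(123)
n <- 10; p <- 50; q <- 2
t_grid <- seq(0, 1, length.out = p)

# One-dimensional data: n x p matrix, one noisy curve per row
curves_1d <- t(sapply(seq_len(n), function(i) {
  sin(2 * pi * t_grid) + rnorm(p, sd = 0.1)
}))
dim(curves_1d)  # 10 50

# Multidimensional data: n x p x q array, one slice per dimension
curves_md <- array(NA_real_, dim = c(n, p, q))
curves_md[, , 1] <- curves_1d
curves_md[, , 2] <- t(sapply(seq_len(n), function(i) cos(2 * pi * t_grid)))
dim(curves_md)  # 10 50 2

# e.g. EHyClus(curves_1d, vars_combinations = list(c("dtaEI", "dtaMEI")))
```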

ehymet documentation built on June 22, 2024, 10:50 a.m.