mapper.kmeans: Mapper function with multiple cluster methods (deprecated)

View source: R/mapper.R

mapper.kmeans    R Documentation

Description

This function is adapted from the mapper function of TDAmapper, with a choice of clustering methods (mainly k-means).

Usage

mapper.kmeans(
  dat,
  filter_values,
  num_intervals,
  percent_overlap,
  dist_method = "euclidean",
  cluster_method = "kmeans",
  NbClust_cluster_method = "kmeans",
  num_bins_when_clustering = 10,
  cluster_index = "all",
  n_class = 0,
  eps = 0.5,
  minPts = 2,
  permute_interval_level = FALSE,
  ...
)

Arguments

dat

Matrix or dataset where rows are data points and columns are predictive variables.

filter_values

An n x m data frame of real numbers returned by the filter functions.

dist_method

The distance measure used to compute the dissimilarity matrix. By default, dist_method = "euclidean". This must be one of: "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski" or "NULL". Details can be found in NbClust.

cluster_method

Clustering method. This should be one of: "hierarchical", "kmeans", "dbscan", "hdbscan".

NbClust_cluster_method

The cluster analysis method to be used. This should be one of: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid", "kmeans". Details can be found in NbClust.

num_bins_when_clustering

For hierarchical clustering. A positive integer that controls whether points in the same level set end up in the same cluster.

cluster_index

The index to be calculated. This should be one of: "kl", "ch", "hartigan", "ccc", "scott", "marriot", "trcovw", "tracew", "friedman", "rubin", "cindex", "db", "silhouette", "duda", "pseudot2", "beale", "ratkowsky", "ball", "ptbiserial", "gap", "frey", "mcclain", "gamma", "gplus", "tau", "dunn", "hubert", "sdindex", "dindex", "sdbw", "all" (all indices except Gap, Gamma, Gplus and Tau), "alllong" (all indices, with Gap, Gamma, Gplus and Tau included). Details can be found in NbClust.

n_class

Number of clusters for k-means. By default, n_class = 0. If n_class > 0, this function will instead call kmeans and pass n_class to the centers argument of kmeans.

eps

For DBSCAN, the size of the epsilon neighborhood.

minPts

For DBSCAN and HDBSCAN, the minimum number of points in the eps region for core points. The default is 2 points.

permute_interval_level

Logical. TRUE if samples within each interval are to be permuted.

...

Further arguments passed to NbClust, kmeans, hclust, dbscan or hdbscan.
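The n_class argument above is described as being passed to the centers argument of kmeans. A minimal sketch of that underlying call, using base R only, might look as follows. The toy matrix x and the value of n_class are assumptions made for illustration; this is not the package's internal code.

```r
# Hypothetical toy data: 20 points in two well-separated groups
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.1), ncol = 2))

# A fixed number of clusters, as a user would supply via n_class > 0
n_class <- 2

# stats::kmeans takes the number of clusters through its centers argument
fit <- kmeans(x, centers = n_class)

# Cluster label for each row of x
fit$cluster
```

When n_class is left at 0, the documentation indicates that the number of clusters is instead chosen through NbClust and the selected cluster_index.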

Details

This function is adapted from the mapper function of TDAmapper by replacing its clustering method with the NbClust function from the R package NbClust.

The advantage of NbClust is that it provides users with 8 different cluster methods, 6 different distance measures and 30 indices for determining the number of clusters. This allows users to select the best clustering scheme from the different results obtained by varying all combinations of number of clusters, distance measures, and clustering methods. Details of the distance measures, clustering methods and cluster indices can be found in NbClust.

Value

An object of class TDAmapper, which is a list of items named adjacency (adjacency matrix for the edges), num_vertices (integer number of vertices), level_of_vertex (vector with level_of_vertex[i] = index of the level set for vertex i), points_in_vertex (list with points_in_vertex[[i]] = vector of indices of points in vertex i), points_in_level (list with points_in_level[[i]] = vector of indices of points in level set i), and vertices_in_level (list with vertices_in_level[[i]] = vector of indices of vertices in level set i).
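To illustrate how the fields listed above fit together, the following sketch builds a small hand-made list with the same field names and inspects it. The object m and its contents are hypothetical and are not output from mapper.kmeans; only the field names come from the description above.

```r
# Hypothetical toy object with the documented fields (not real mapper output)
m <- list(
  adjacency = matrix(c(0, 1, 1, 0), nrow = 2),
  num_vertices = 2L,
  level_of_vertex = c(1, 2),
  points_in_vertex = list(c(1, 2, 3), c(3, 4)),
  points_in_level = list(c(1, 2, 3), c(3, 4)),
  vertices_in_level = list(1, 2)
)
class(m) <- "TDAmapper"

# Number of data points in each vertex
vertex_sizes <- lengths(m$points_in_vertex)

# Edges as (row, col) pairs of vertex indices, upper triangle only
edges <- which(m$adjacency == 1 & upper.tri(m$adjacency), arr.ind = TRUE)

# Points shared by the two connected vertices
shared <- intersect(m$points_in_vertex[[1]], m$points_in_vertex[[2]])
```

In this toy object the two vertices are joined by an edge because they share a data point, which is the usual reason two Mapper vertices are connected.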

References

Malika Charrad, Nadia Ghazzali, Veronique Boiteau, Azam Niknafs (2014). NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software, 61(6), 1-36. URL http://www.jstatsoft.org/v61/i06/.

Examples

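# Generate a sample data set with chicken_generator()
tp_data <- chicken_generator(1)
# Columns 2 to 4 are the data points; column Y supplies the filter values
tp_data_mapper <- mapper.kmeans(dat = tp_data[, 2:4],
                                filter_values = tp_data$Y,
                                num_intervals = 10,
                                percent_overlap = 70)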
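The example sets num_intervals = 10 and percent_overlap = 70. The sketch below shows one way such a pair of values can define overlapping intervals over a one-dimensional filter. It assumes the cover is built the way TDAmapper's one-dimensional mapper builds it, which this page does not confirm for mapper.kmeans; the filter range of 0 to 1 is also an assumption made for illustration.

```r
# Assumed filter range (hypothetical values for illustration)
filter_min <- 0
filter_max <- 1
num_intervals <- 10
percent_overlap <- 70

# Interval length such that num_intervals overlapping intervals span the range
interval_length <- (filter_max - filter_min) /
  (num_intervals - (num_intervals - 1) * percent_overlap / 100)

# Distance between the starts of consecutive intervals
step_size <- interval_length * (1 - percent_overlap / 100)

# Lower and upper bounds of each interval
lower <- filter_min + (seq_len(num_intervals) - 1) * step_size
upper <- lower + interval_length

data.frame(lower = lower, upper = upper)
```

Under these assumptions, a larger percent_overlap gives longer intervals that share more points, which tends to produce more edges between vertices in adjacent levels.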


TianshuFeng/SemiMapper documentation built on Sept. 16, 2022, 10:26 p.m.