mapper.sta: Mapper function with multiple cluster methods
In TianshuFeng/SemiMapper: Semi-supervised Topological Analysis

mapper.sta

R Documentation

Mapper function with multiple cluster methods

Description

This function is adapted from the `mapper` function of TDAmapper and supports different clustering methods (mainly k-means).

Usage

mapper.sta(
  dat,
  filter_values,
  num_intervals,
  percent_overlap,
  dist_method = "euclidean",
  cluster_method = "kmeans",
  NbClust_cluster_method = "kmeans",
  num_bins_when_clustering = 10,
  cluster_index = "all",
  n_class = 0,
  eps = 0.15,
  minPts = 5,
  permute_interval_level = FALSE,
  ...
)

Arguments

`dat`	Matrix or dataset where rows are data points and columns are predictive variables.
`filter_values`	An n x m data frame of real numbers returned by the filter functions.
`dist_method`	The distance measure used to compute the dissimilarity matrix. By default, dist_method = "euclidean". This must be one of: "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski" or "NULL". Details can be found in `NbClust`.
`cluster_method`	Clustering method. This should be one of: "hierarchical", "kmeans", "dbscan", "hdbscan".
`NbClust_cluster_method`	The cluster analysis method to be used. This should be one of: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid", "kmeans". Details can be found in `NbClust`.
`num_bins_when_clustering`	For hierarchical clustering. A positive integer that controls whether points in the same level set end up in the same cluster.
`cluster_index`	The index to be calculated. This should be one of: "kl", "ch", "hartigan", "ccc", "scott", "marriot", "trcovw", "tracew", "friedman", "rubin", "cindex", "db", "silhouette", "duda", "pseudot2", "beale", "ratkowsky", "ball", "ptbiserial", "gap", "frey", "mcclain", "gamma", "gplus", "tau", "dunn", "hubert", "sdindex", "dindex", "sdbw", "all" (all indices except GAP, Gamma, Gplus and Tau), "alllong" (all indices with Gap, Gamma, Gplus and Tau included). Details can be found in `NbClust`.
`n_class`	Number of clusters for k-means. By default, n_class = 0. If n_class > 0, this function calls `kmeans` instead of `NbClust` and passes `n_class` to the `centers` argument of `kmeans`.
`eps`	For DBSCAN, the size of the epsilon neighborhood.
`minPts`	For DBSCAN and HDBSCAN, the minimum number of points in the eps region for core points. Default is 5 points.
`permute_interval_level`	Logical. TRUE if samples within each interval are to be permuted.
`...`	Further arguments for `NbClust` or `kmeans` or `hclust` or `dbscan` or `hdbscan`
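
The arguments `num_intervals` and `percent_overlap` appear in the usage but not in the list above. As a rough illustration of how such parameters usually define an overlapping cover of a one-dimensional filter, the following base R sketch may help. The helper `cover_intervals` and the interval-length formula are assumptions made for illustration (modelled on the usual 1-D Mapper construction); they are not taken from the `mapper.sta` source, which may differ.

```r
# Hypothetical helper: overlapping 1-D cover of the filter range.
# Assumed formula, not extracted from mapper.sta.
cover_intervals <- function(filter_values, num_intervals, percent_overlap) {
  f_min <- min(filter_values)
  f_max <- max(filter_values)
  overlap <- percent_overlap / 100
  # Interval length chosen so that num_intervals overlapping intervals
  # span the range from f_min to f_max.
  len <- (f_max - f_min) / (num_intervals - (num_intervals - 1) * overlap)
  # Distance between the lower ends of consecutive intervals.
  step <- len * (1 - overlap)
  lower <- f_min + (seq_len(num_intervals) - 1) * step
  data.frame(lower = lower, upper = lower + len)
}

set.seed(1)
f <- runif(200)  # made-up filter values
cover <- cover_intervals(f, num_intervals = 10, percent_overlap = 70)

# Indices of the points whose filter value falls in each interval.
points_in_level <- lapply(seq_len(nrow(cover)), function(i) {
  which(f >= cover$lower[i] & f <= cover$upper[i])
})
```

With a large overlap such as 70 percent, most points fall into several intervals at once, and those shared points are what later connect clusters from neighbouring intervals.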

Details

This function is adapted from the `mapper` function of TDAmapper by replacing its clustering step with the function NbClust from the R package NbClust.

The advantage of NbClust is that it provides users with 8 different cluster methods, 6 different distance measures and 30 indices for determining the number of clusters. This allows users to select the best clustering scheme from the different results obtained by varying all combinations of number of clusters, distance measures, and clustering methods. Details of the distance measures, clustering methods and cluster indices can be found in NbClust.
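
For contrast with index-based selection of the number of clusters, the fixed-k path described for `n_class` can be sketched with base R alone. This is a minimal illustration on made-up data for a single level set; the variable names are hypothetical, `nstart` is an extra choice made here, and the sketch does not reproduce the internals of `mapper.sta` or any call to NbClust.

```r
set.seed(42)

# Made-up points of one level set: two well-separated groups in 2-D.
level_points <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)

# Plays the role of n_class > 0: the number of clusters is fixed in advance.
n_class <- 2

# stats::kmeans receives the fixed number of clusters through `centers`.
fit <- kmeans(level_points, centers = n_class, nstart = 5)
labels <- fit$cluster

# Cluster sizes within this level set.
table(labels)
```

Fixing the number of clusters skips the search over candidate cluster counts, which is cheaper but assumes the same number of clusters suits every level set.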

Value

An object of class TDAmapper which is a list of items named adjacency (adjacency matrix for the edges), num_vertices (integer number of vertices), level_of_vertex (vector with level_of_vertex[i] = index of the level set for vertex i), points_in_vertex (list with points_in_vertex[[i]] = vector of indices of points in vertex i), points_in_level (list with points_in_level[[i]] = vector of indices of points in level set i), and vertices_in_level (list with vertices_in_level[[i]] = vector of indices of vertices in level set i).
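
To show how these fields fit together, here is a small hand-built list that uses the documented field names. The object `toy` and all its values are invented for illustration; it is not output of `mapper.sta`.

```r
# Hand-built toy object with the documented field names (values are made up).
toy <- list(
  adjacency = matrix(c(0, 1, 0,
                       1, 0, 1,
                       0, 1, 0), nrow = 3, byrow = TRUE),
  num_vertices = 3L,
  level_of_vertex = c(1L, 2L, 3L),
  points_in_vertex = list(c(1, 2, 3), c(3, 4), c(4, 5, 6)),
  points_in_level = list(c(1, 2, 3), c(3, 4), c(4, 5, 6)),
  vertices_in_level = list(1L, 2L, 3L)
)
class(toy) <- "TDAmapper"

# Number of data points in each vertex.
vertex_sizes <- lengths(toy$points_in_vertex)

# Edge list: vertex pairs joined in the adjacency matrix (upper triangle only).
edges <- which(toy$adjacency == 1 & upper.tri(toy$adjacency), arr.ind = TRUE)

# Points shared by vertices 1 and 2, the reason these two vertices are connected.
shared <- intersect(toy$points_in_vertex[[1]], toy$points_in_vertex[[2]])
```

The same accessors can be applied to a real result, for example to size or color vertices when plotting the graph.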

References

Malika Charrad, Nadia Ghazzali, Veronique Boiteau, Azam Niknafs (2014). NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software, 61(6), 1-36. URL http://www.jstatsoft.org/v61/i06/.

Examples

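library(SemiMapper)
# Generate a toy data set with the package's generator
tp_data <- chicken_generator(1)
# Columns 2 to 4 are the predictors; column Y supplies the filter values
tp_data_mapper <- mapper.sta(dat = tp_data[, 2:4],
                             filter_values = tp_data$Y,
                             num_intervals = 10,
                             percent_overlap = 70)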

TianshuFeng/SemiMapper documentation built on Sept. 16, 2022, 10:26 p.m.