clusterStability: clustering stability function

View source: R/clusterStability.r

clusterStability    R Documentation

clustering stability function

Description

This function computes the stability of a clustering, which helps in selecting the best number of clusters. Feature selection and dimensionality reduction can optionally be applied before the data are clustered.

Usage

clusterStability(
  data = NULL,
  clustermethod = NULL,
  dimenreducmethod = NULL,
  n_components = 3,
  perplexity = 25,
  max_iter = 1000,
  k_neighbor = 3,
  featureselection = NULL,
  outcome = NULL,
  fs.pvalue = 0.05,
  randomTests = 20,
  trainFraction = 0.5,
  pac.thr = 0.1,
  ...
)

Arguments

data

The data set to be clustered.

clustermethod

The clustering method. This can be one of "Mclust", "pamCluster", "kmeansCluster", "hierarchicalCluster", or "FuzzyCluster".

dimenreducmethod

The dimensionality reduction method. This must be one of "UMAP", "tSNE", or "PCA".

n_components

The dimension of the space into which the data are embedded. It can be any integer from 2 to 100.
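As an illustration of what n_components controls, the following minimal sketch projects a numeric data set onto its first n_components principal components with base R's prcomp(). The helper reduce_pca() is hypothetical and only assumes a PCA-style reduction; it is not the code Evacluster runs internally.

```r
# Hypothetical helper (not part of Evacluster): keep the first n_components
# principal components of a numeric data set using base R's prcomp().
reduce_pca <- function(x, n_components = 3) {
  # PCA cannot return more components than there are columns
  stopifnot(n_components >= 2, n_components <= ncol(x))
  pca <- prcomp(x, center = TRUE, scale. = TRUE)
  pca$x[, seq_len(n_components), drop = FALSE]
}

# Example on the four numeric columns of the built-in iris data
embedded <- reduce_pca(iris[, 1:4], n_components = 3)
dim(embedded)  # 150 rows, 3 columns
```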

perplexity

The perplexity parameter of t-SNE, which roughly sets the effective number of neighbors considered for each point. It is used only by the tSNE reduction method.

max_iter

The maximum number of iterations of the tSNE reduction method.

k_neighbor

Used in the tSNE method to embed new data into an existing embedding: each new point is placed using the mean of its nearest neighbors (those with minimum distance), where the number of neighbors is #Neighbors = sqrt(#Samples / k_neighbor).
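The neighbor count and the mean-of-neighbors placement can be sketched as follows. Both helpers, n_neighbors() and embed_new_point(), are hypothetical; rounding the count to the nearest integer, using Euclidean distance in the original feature space, and using a PCA projection as a stand-in for an existing tSNE embedding are all assumptions made for illustration.

```r
# Hypothetical: number of neighbours from #Neighbors = sqrt(#Samples / k_neighbor).
# Rounding to the nearest integer (and keeping at least 1) is an assumption.
n_neighbors <- function(n_samples, k_neighbor = 3) {
  max(1L, as.integer(round(sqrt(n_samples / k_neighbor))))
}

# Hypothetical: place one new point in an existing embedding as the mean of the
# embedded coordinates of its nearest training points (Euclidean distance
# computed in the original feature space).
embed_new_point <- function(x_train, y_train, x_new, k_neighbor = 3) {
  n <- n_neighbors(nrow(x_train), k_neighbor)
  d <- sqrt(colSums((t(x_train) - x_new)^2))  # distance to every training row
  nearest <- order(d)[seq_len(n)]
  colMeans(y_train[nearest, , drop = FALSE])
}

n_neighbors(208, k_neighbor = 3)  # sqrt(69.33) = 8.33, rounds to 8
n_neighbors(208, k_neighbor = 2)  # sqrt(104) = 10.20, rounds to 10

x_train <- as.matrix(iris[1:100, 1:4])
y_train <- prcomp(x_train)$x[, 1:2]  # stand-in for an existing 2-D embedding
new_coords <- embed_new_point(x_train, y_train, unlist(iris[101, 1:4]))
```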

featureselection

This parameter determines whether feature selection is applied before the data are clustered. Set it to "yes" to apply feature selection, or "no" otherwise.

outcome

The name of the outcome feature (column) used for feature selection.

fs.pvalue

The p-value threshold used in the feature selection process. The default value is 0.05.
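A minimal sketch of p-value-based feature selection against a binary outcome follows. The helper select_features() is hypothetical, and the choice of a per-feature two-sample Welch t-test is an assumption; Evacluster may use a different statistical test.

```r
# Hypothetical sketch: keep the features whose univariate test against a
# binary outcome gives p < fs.pvalue. A Welch two-sample t-test is assumed.
select_features <- function(data, outcome, fs.pvalue = 0.05) {
  y <- data[[outcome]]
  stopifnot(length(unique(y)) == 2)  # the t-test needs exactly two groups
  features <- setdiff(names(data), outcome)
  pvals <- vapply(features, function(f) {
    groups <- split(data[[f]], y)
    t.test(groups[[1]], groups[[2]])$p.value
  }, numeric(1))
  features[pvals < fs.pvalue]
}

# Example: versicolor vs. virginica, outcome recoded as 0/1
two <- droplevels(subset(iris, Species != "setosa"))
two$Species <- as.numeric(two$Species) - 1
sel <- select_features(two, outcome = "Species", fs.pvalue = 0.05)
sel
```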

randomTests

The number of random repetitions of the clustering process used to compute the cluster stability.

trainFraction

The fraction of samples used as training data in each iteration. The default value is 0.5.
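To show how randomTests and trainFraction interact, here is a minimal sketch that draws one random training subset per iteration and derives the held-out test samples. The helper draw_train_indices(), the matrix layout (one column per iteration), and the rounding of the training size are assumptions; they do not describe the exact format of the randomSamples element returned by clusterStability().

```r
# Hypothetical sketch: draw randomTests training subsets, each containing
# round(trainFraction * n_samples) sample indices (one column per iteration).
draw_train_indices <- function(n_samples, randomTests = 20, trainFraction = 0.5) {
  n_train <- round(trainFraction * n_samples)
  replicate(randomTests, sort(sample.int(n_samples, n_train)))
}

set.seed(1)
idx <- draw_train_indices(208, randomTests = 100, trainFraction = 0.7)
dim(idx)  # 146 rows (0.7 * 208 = 145.6, rounded), 100 columns

# Samples held out for testing in the first iteration
test_idx <- setdiff(seq_len(208), idx[, 1])
```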

pac.thr

The threshold used to compute the proportion of ambiguous clustering (PAC) score, which is the fraction of sample pairs whose consensus indices fall in the ambiguous interval defined by this threshold. The default value is 0.1.
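A minimal sketch of a PAC computation from repeated subsample clusterings is shown below. The consensus index of a pair is taken as the fraction of iterations, among those in which both samples were drawn, in which they share a cluster. The helpers consensus_matrix() and pac_score() are hypothetical, and the ambiguous interval (pac.thr, 1 - pac.thr) is an assumption borrowed from the usual PAC definition, not confirmed for this package.

```r
# labels: matrix (samples x iterations) of cluster labels, NA when a sample
# was not drawn in that iteration.
consensus_matrix <- function(labels) {
  n <- nrow(labels)
  together <- matrix(0, n, n)  # times each pair shared a cluster
  sampled <- matrix(0, n, n)   # times each pair was drawn together
  for (r in seq_len(ncol(labels))) {
    l <- labels[, r]
    present <- !is.na(l)
    sampled <- sampled + outer(present, present)
    same <- outer(l, l, "==")
    same[is.na(same)] <- FALSE
    together <- together + same
  }
  together / sampled  # NaN for pairs never drawn together
}

# Assumption: ambiguous pairs have consensus strictly inside (pac.thr, 1 - pac.thr)
pac_score <- function(labels, pac.thr = 0.1) {
  cm <- consensus_matrix(labels)
  cij <- cm[upper.tri(cm)]
  cij <- cij[!is.na(cij)]
  mean(cij > pac.thr & cij < 1 - pac.thr)
}

labels <- cbind(c(1, 1, 2, 2), c(1, 1, 2, 1), c(1, NA, 2, 2))
pac_score(labels, pac.thr = 0.1)  # 3 of 6 pairs are ambiguous: 0.5
```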

...

Additional arguments passed to the clustering method, such as center, k, distmethod, or clusters in the examples below.

Value

A list with the following elements:

  • randIndex - A vector of Rand index values, each measuring the similarity between two clusterings.

  • jaccIndex - A vector of Jaccard index values, each measuring how often pairs of items that are grouped together in either clustering are grouped together in both.

  • randomSamples - A vector with the indices of the samples selected for training in each iteration.

  • clusterLabels - A vector with the cluster labels from all iterations.

  • jaccardpoint - The corresponding Jaccard index for each data point of the testing set.

  • averageNumberofClusters - The mean number of clusters.

  • testConsesus - A vector of consensus clustering results for the testing set.

  • trainRandIndex - A vector of Rand index values for the training set.

  • trainJaccIndex - A vector of Jaccard index values for the training set.

  • trainJaccardpoint - The corresponding Jaccard index for each data point of the training set.

  • PAC - The proportion of ambiguous clustering (PAC) score.

  • dataConcensus - A vector of consensus clustering results for the training set.
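The Rand and Jaccard indices reported above are pair-counting measures. The following sketch uses their standard definitions: Rand = (pairs together in both + pairs apart in both) / all pairs, and Jaccard = pairs together in both / pairs together in at least one. The helpers pair_counts(), rand_index(), and jaccard_index() are hypothetical and are not claimed to match Evacluster's internal code.

```r
# Count the four kinds of sample pairs for two labelings of the same samples.
pair_counts <- function(a, b) {
  stopifnot(length(a) == length(b))
  pairs <- which(upper.tri(matrix(0, length(a), length(a))), arr.ind = TRUE)
  same_a <- a[pairs[, 1]] == a[pairs[, 2]]
  same_b <- b[pairs[, 1]] == b[pairs[, 2]]
  c(n11 = sum(same_a & same_b),    # together in both
    n10 = sum(same_a & !same_b),   # together only in a
    n01 = sum(!same_a & same_b),   # together only in b
    n00 = sum(!same_a & !same_b))  # apart in both
}

rand_index <- function(a, b) {
  n <- pair_counts(a, b)
  unname((n["n11"] + n["n00"]) / sum(n))
}

jaccard_index <- function(a, b) {
  n <- pair_counts(a, b)
  unname(n["n11"] / (n["n11"] + n["n10"] + n["n01"]))
}

a <- c(1, 1, 2, 2, 3)
b <- c(1, 1, 2, 3, 3)
rand_index(a, b)     # (1 + 7) / 10 = 0.8
jaccard_index(a, b)  # 1 / 3
```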

Examples


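library("mlbench")
library("Evacluster")  # provides clusterStability() and the clustering wrappers used below
data(Sonar)

# Recode the two-level Class factor as a numeric 0/1 outcome
Sonar$Class <- as.numeric(Sonar$Class)
Sonar$Class[Sonar$Class == 1] <- 0
Sonar$Class[Sonar$Class == 2] <- 1

# k-means (center = 3 is passed on to the clustering method) on a 3-D UMAP
# embedding, after feature selection against Class
ClustStab <- clusterStability(data = Sonar, clustermethod = kmeansCluster,
                              dimenreducmethod = "UMAP", n_components = 3,
                              featureselection = "yes", outcome = "Class",
                              fs.pvalue = 0.05, randomTests = 100,
                              trainFraction = 0.7, center = 3)

# PAM (k = 3) on a 3-D t-SNE embedding, after feature selection against Class
ClustStab <- clusterStability(data = Sonar, clustermethod = pamCluster,
                              dimenreducmethod = "tSNE", n_components = 3,
                              perplexity = 10, max_iter = 100, k_neighbor = 2,
                              featureselection = "yes", outcome = "Class",
                              fs.pvalue = 0.05, randomTests = 100,
                              trainFraction = 0.7, k = 3)

# Hierarchical clustering (3 clusters, Euclidean distance) on a 3-D PCA
# embedding, without feature selection
ClustStab <- clusterStability(data = Sonar, clustermethod = hierarchicalCluster,
                              dimenreducmethod = "PCA", n_components = 3,
                              featureselection = "no", randomTests = 100,
                              trainFraction = 0.7, distmethod = "euclidean",
                              clusters = 3)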



Evacluster documentation built on April 1, 2022, 9:07 a.m.