clusterStability: clustering stability function
In Evacluster: Evaluation Clustering Methods for Disease Subtypes Diagnosis

clusterStability

R Documentation

clustering stability function

Description

This function computes the stability of a clustering, which helps in selecting the best number of clusters. Feature selection and dimensionality reduction can optionally be applied before the data are clustered.

Usage

clusterStability(
  data = NULL,
  clustermethod = NULL,
  dimenreducmethod = NULL,
  n_components = 3,
  perplexity = 25,
  max_iter = 1000,
  k_neighbor = 3,
  featureselection = NULL,
  outcome = NULL,
  fs.pvalue = 0.05,
  randomTests = 20,
  trainFraction = 0.5,
  pac.thr = 0.1,
  ...
)

Arguments

`data`	A data set.
`clustermethod`	The clustering method. This can be one of "Mclust", "pamCluster", "kmeansCluster", "hierarchicalCluster", or "FuzzyCluster".
`dimenreducmethod`	The dimensionality reduction method. This must be one of "UMAP", "tSNE", or "PCA".
`n_components`	The dimension of the space into which the data are embedded. It can be set to any integer from 2 to 100.
`perplexity`	The perplexity parameter, which determines the number of neighbors considered in the tSNE method. It is used only with the tSNE reduction method.
`max_iter`	The maximum number of iterations for the tSNE reduction method.
`k_neighbor`	The value k used to set the number of nearest (minimum-distance) neighbors, #Neighbor = sqrt(#Samples/k), whose mean is computed when new data are embedded into an existing tSNE embedding.
`featureselection`	Determines whether feature selection is applied before the data are clustered. Set it to "yes" to apply feature selection, otherwise "no".
`outcome`	The outcome feature used for feature selection.
`fs.pvalue`	The p-value threshold used in the feature selection process. The default value is 0.05.
`randomTests`	The number of iterations of the clustering process used to compute the cluster stability.
`trainFraction`	The fraction of the data used for training. The default value is 0.5.
`pac.thr`	The threshold used to compute the proportion of ambiguous clustering (PAC) score, which is defined as the fraction of sample pairs whose consensus indices fall in the interval set by this threshold. The default value is 0.1.
`...`	Additional arguments passed to the clustering method (for example, center, k, or clusters, as in the Examples section).
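
The role of pac.thr can be illustrated with a minimal base R sketch. It assumes the conventional PAC definition, in which a sample pair is ambiguous when its consensus index lies strictly between pac.thr and 1 - pac.thr; the exact interval used inside clusterStability() is not stated on this page. The helper name pacScore and the toy consensus matrix are hypothetical and are not part of Evacluster.

```r
# Hypothetical helper (not part of Evacluster): PAC from a consensus matrix.
# Assumption: the ambiguous interval is (thr, 1 - thr).
pacScore <- function(consensus, thr = 0.1) {
  # Count each unordered sample pair once (upper triangle, no diagonal).
  pairs <- consensus[upper.tri(consensus)]
  # Fraction of pairs that are neither clearly together nor clearly apart.
  mean(pairs > thr & pairs < 1 - thr)
}

# Toy 4 x 4 consensus matrix: entry [i, j] is the fraction of runs in which
# samples i and j were assigned to the same cluster.
consensus <- matrix(c(1.00, 0.95, 0.50, 0.00,
                      0.95, 1.00, 0.40, 0.05,
                      0.50, 0.40, 1.00, 1.00,
                      0.00, 0.05, 1.00, 1.00),
                    nrow = 4, byrow = TRUE)

pac <- pacScore(consensus, thr = 0.1)
print(pac)
```

Under this definition a lower PAC means fewer ambiguous pairs, that is, a more stable clustering.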

Value

A list with the following elements:

randIndex - A vector of Rand index values, each measuring the similarity between two clusterings.
jaccIndex - A vector of Jaccard index values, each measuring how frequently pairs of items are joined together in two clusterings.
randomSamples - A vector with the indices of the samples selected for training in each iteration.
clusterLabels - A vector with the cluster labels from all iterations.
jaccardpoint - The corresponding Jaccard index for each data point of the testing set.
averageNumberofClusters - The mean number of clusters.
testConsesus - A vector of consensus clustering results for the testing set.
trainRandIndex - A vector of Rand index values for the training set.
trainJaccIndex - A vector of Jaccard index values for the training set.
trainJaccardpoint - The corresponding Jaccard index for each data point of the training set.
PAC - The proportion of ambiguous clustering (PAC) score.
dataConcensus - A vector of consensus clustering results for the training set.
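
As a rough illustration of what the Rand and Jaccard indices measure, the following base R sketch uses the standard pair-counting definitions on two label vectors. It is not taken from the Evacluster source, which may compute these values differently; the helper name pairAgreement and the label vectors are hypothetical.

```r
# Hypothetical helper (not part of Evacluster): pair-counting agreement
# between two clusterings of the same samples.
pairAgreement <- function(a, b) {
  stopifnot(length(a) == length(b))
  sameA <- outer(a, a, "==")   # TRUE where two samples share a cluster in a
  sameB <- outer(b, b, "==")   # TRUE where two samples share a cluster in b
  up <- upper.tri(sameA)       # each unordered pair once
  n11 <- sum(sameA[up] & sameB[up])     # together in both clusterings
  n00 <- sum(!sameA[up] & !sameB[up])   # apart in both clusterings
  n10 <- sum(sameA[up] & !sameB[up])    # together only in a
  n01 <- sum(!sameA[up] & sameB[up])    # together only in b
  list(rand = (n11 + n00) / (n11 + n00 + n10 + n01),
       jaccard = n11 / (n11 + n10 + n01))
}

labelsA <- c(1, 1, 1, 2, 2, 2)
labelsB <- c(1, 1, 2, 2, 3, 3)
agreement <- pairAgreement(labelsA, labelsB)
print(agreement)
```

Both indices depend only on which samples are grouped together, so relabelling the clusters leaves them unchanged.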

Examples


library("Evacluster")
library("mlbench")
data(Sonar)

# Recode the outcome factor as numeric 0/1
Sonar$Class <- as.numeric(Sonar$Class)
Sonar$Class[Sonar$Class == 1] <- 0
Sonar$Class[Sonar$Class == 2] <- 1

# k-means clustering on a UMAP embedding, with feature selection
ClustStab <- clusterStability(data = Sonar, clustermethod = kmeansCluster,
                              dimenreducmethod = "UMAP", n_components = 3,
                              featureselection = "yes", outcome = "Class",
                              fs.pvalue = 0.05, randomTests = 100,
                              trainFraction = 0.7, center = 3)

# PAM clustering on a tSNE embedding, with feature selection
ClustStab <- clusterStability(data = Sonar, clustermethod = pamCluster,
                              dimenreducmethod = "tSNE", n_components = 3,
                              perplexity = 10, max_iter = 100, k_neighbor = 2,
                              featureselection = "yes", outcome = "Class",
                              fs.pvalue = 0.05, randomTests = 100,
                              trainFraction = 0.7, k = 3)

# Hierarchical clustering on a PCA embedding, without feature selection
ClustStab <- clusterStability(data = Sonar, clustermethod = hierarchicalCluster,
                              dimenreducmethod = "PCA", n_components = 3,
                              featureselection = "no", randomTests = 100,
                              trainFraction = 0.7, distmethod = "euclidean",
                              clusters = 3)
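
To compare candidate numbers of clusters, the elements listed under Value could be condensed into one row per run, for example as in the sketch below. The helper summarizeStability is hypothetical and not part of Evacluster, and the two mock lists only stand in for real clusterStability() results; their numbers are made up to show the mechanics.

```r
# Hypothetical helper (not part of Evacluster): condense one result list
# into a few summary numbers, using element names from the Value section.
summarizeStability <- function(result) {
  c(meanRand     = mean(result$randIndex),
    meanJaccard  = mean(result$jaccIndex),
    PAC          = result$PAC,
    meanClusters = result$averageNumberofClusters)
}

# Mock results standing in for clusterStability() output at two cluster
# counts. The values are invented for illustration only.
mockK2 <- list(randIndex = c(0.90, 0.80), jaccIndex = c(0.70, 0.60),
               PAC = 0.10, averageNumberofClusters = 2)
mockK3 <- list(randIndex = c(0.70, 0.60), jaccIndex = c(0.50, 0.40),
               PAC = 0.30, averageNumberofClusters = 3)

comparison <- rbind(k2 = summarizeStability(mockK2),
                    k3 = summarizeStability(mockK3))
print(comparison)

# Row with the lowest PAC, i.e. the fewest ambiguous sample pairs.
mostStable <- rownames(comparison)[which.min(comparison[, "PAC"])]
print(mostStable)
```

Higher mean Rand and Jaccard values together with a lower PAC point to a more stable clustering, so the row that does best on these columns is a reasonable candidate for the number of clusters.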

Evacluster documentation built on April 1, 2022, 9:07 a.m.