View source: R/silhouette_analysis.R
silhouette_analysis | R Documentation
Silhouette analysis identifies the number of clusters with the highest explanatory power. It tries to answer the question of how many clusters are required to optimally separate each cluster from its neighbors. Good cluster separation results in a higher average silhouette width, which is the decisive metric for judging the number of clusters. This function applies silhouette analysis iteratively to a vector of different cluster numbers and stores the results in a list.
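The silhouette width follows a standard definition. For an observation i, let a(i) be its mean distance to the other members of its own cluster and b(i) its smallest mean distance to the members of any other cluster. Then s(i) = (b(i) - a(i)) / max(a(i), b(i)), which ranges from -1 to 1, and the average of s(i) over all observations is the average silhouette width. The base-R sketch below computes it by hand. It assumes Euclidean distances and the common convention s(i) = 0 for singleton clusters. The helper name silhouette_width_sketch() is hypothetical and is not part of this package.

```r
# Hypothetical helper: per-observation silhouette widths, standard definition.
# Assumes Euclidean distances (dist() default) and at least two clusters.
silhouette_width_sketch <- function(mat, labels) {
  stopifnot(length(unique(labels)) >= 2)
  d <- as.matrix(dist(mat))               # pairwise Euclidean distances
  s <- numeric(nrow(d))
  for (i in seq_len(nrow(d))) {
    own <- labels == labels[i]
    own[i] <- FALSE                       # exclude the observation itself
    if (!any(own)) {                      # singleton cluster: s(i) = 0 by convention
      s[i] <- 0
      next
    }
    a <- mean(d[i, own])                  # a(i): mean distance within own cluster
    others <- setdiff(unique(labels), labels[i])
    # b(i): smallest mean distance to the members of any other cluster
    b <- min(vapply(others, function(k) mean(d[i, labels == k]), numeric(1)))
    s[i] <- (b - a) / max(a, b)
  }
  s
}

# Two well-separated groups give an average silhouette width close to 1.
set.seed(1)
mat <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 10), ncol = 2))
labels <- rep(1:2, each = 10)
mean(silhouette_width_sketch(mat, labels))
```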
silhouette_analysis( mat, cluster_object = NULL, n_clusters = 2:10, n_repeats = 5, plot = TRUE )
mat: (numeric matrix) data matrix on which clustering was performed (or on which k-means clustering will be performed)
cluster_object: (hclust) optional cluster object obtained by running hclust()
n_clusters: (numeric) vector of cluster numbers for which silhouette analysis is performed (default: 2:10)
n_repeats: (numeric) scalar giving the number of random permutations used to perform the analysis (default: 5)
plot: (logical) whether the function should also return a list of summary plots (default: TRUE)
The prerequisite for silhouette analysis is a cluster object, which can be obtained by, e.g., running hclust(d = dist(mat), method = "ward.D"). Alternatively, if no cluster object is supplied, the function performs k-means clustering with kmeans() for each of the indicated numbers of clusters.
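The two workflows described above can be sketched with cluster::silhouette() from the recommended cluster package that ships with R. This is not the package's implementation: the function name scan_silhouette_sketch() is hypothetical, Euclidean distances are assumed, and averaging k-means results over n_repeats runs is only one plausible reading of what the repetitions do. The optimal number of clusters is taken as the one with the highest average silhouette width, in line with the description above.

```r
library(cluster)  # recommended package; provides silhouette()

# Hypothetical sketch: average silhouette width for each candidate cluster number.
scan_silhouette_sketch <- function(mat, cluster_object = NULL,
                                   n_clusters = 2:10, n_repeats = 5) {
  d <- dist(mat)  # Euclidean distances, assumed here
  avg_width <- vapply(n_clusters, function(k) {
    if (!is.null(cluster_object)) {
      # hierarchical clustering: cut the existing tree into k groups (deterministic)
      labels <- cutree(cluster_object, k = k)
      mean(silhouette(labels, d)[, "sil_width"])
    } else {
      # k-means depends on random starting centers, so average over n_repeats runs
      mean(replicate(n_repeats, {
        labels <- kmeans(mat, centers = k)$cluster
        mean(silhouette(labels, d)[, "sil_width"])
      }))
    }
  }, numeric(1))
  list(summary = data.frame(n_clusters = n_clusters, avg_width = avg_width),
       optimal_n_clust = n_clusters[which.max(avg_width)])
}

# Three well-separated groups of 20 points each
set.seed(42)
mat <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2),
             matrix(rnorm(40, mean = 10, sd = 0.5), ncol = 2))
res <- scan_silhouette_sketch(mat, hclust(dist(mat)), n_clusters = 2:5)
res$optimal_n_clust
```

Passing a tree keeps the cluster assignments fixed for every k, whereas the k-means branch re-clusters the data for each k, which is why only that branch is repeated.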
A list with five elements:
data: silhouette analysis data for each iteration
data_summary: concise summary of the silhouette analysis data
optimal_n_clust: optimal number of clusters
plot_clusters: plot of silhouette widths for each number of clusters separately
plot_summary: summary plot of silhouette widths
# generate a random matrix for clustering, with 100 rows
# (e.g. genes whose expression was measured) and 10 columns (conditions)
set.seed(123)
mat <- matrix(rnorm(1000), ncol = 10)
# we can cluster this matrix using e.g. hclust();
# there is clearly no good separation between different clusters of 'genes'
clust <- hclust(dist(mat))
plot(clust)
# perform silhouette analysis for 2 to 10 clusters;
# no cluster_object is supplied, so k-means clustering is used
sil_result <- silhouette_analysis(mat, n_clusters = 2:10)
# plot the results side by side
print(sil_result$plot_clusters, split = c(1, 1, 2, 1), more = TRUE)
print(sil_result$plot_summary, split = c(2, 1, 2, 1))