find.silhouette | R Documentation |
Compare the silhouette coefficient for a given number of clusters against other cluster numbers to determine the optimal number of clustes and against randomized trials to determine how the silhouette coefficients of the identified clusters compare to those of randomized data.
find.silhouette(data,to.sil = "samples", to.view = "rand.to.clust", ngroups,
maxgroups = 12, max.iter = 10, method = "pearson",
NA.handling = "pairwise.complete.obs", linkage = "complete",
main = "Average Silhouette Width",
axis.label = "Silhouette Width", main.label.size = 30, axis.label.size = 20,
legend.position = "bottom")
data |
numeric data matrix with samples/observations in the columns and genes/variables in the rows |
to.sil |
whether the clustering and comparison will be performed on the columns ("samples", default) or the rows ("genes") |
to.view |
there are three options for which comparisons to show. "rand.to.clust" (the default) will show a density plot comparing the clustering of the provided data for ngroups clusters to max.iter randomized trials. "all.clusts" will compare the silhouette coefficients for 2:maxgroups clusters for the provided data. "rand.all.clusts" combines the above two options, showing 2:maxgroups clusters for the provided data in addition to max.iter randomized trials for each number of clusters |
ngroups |
number of clusters to compare (when to.view is set to "rand.to.clust") |
maxgroups |
maximum number of clusters to compare (when to.view is set to "all.clust" or "rand.all.clusts"). Will test all numbers of clusters from 2:maxgroups |
max.iter |
number of randomized trials to compare against (when to.view is set to "rand.to.clust" or "rand.all.clusts") |
method |
method to be used to calculate distance matrix for clustering. Accepts values to be passed to cor() such as "pearson" (default), "spearman", and "kendall" which well then be coerced to a distance matrix or any options accepted by dist() |
NA.handling |
how missing values should be handled in the case of correlations, passed to the "use" argument of cor() |
linkage |
linkage clustering method used for clustering and to be passed to hclust(). Accepts all methods accepted by hclust() |
main |
title for generated plot, default is "Average Silhouette Width" |
axis.label |
axis label for the silhouette widt, default is "Silhouette Width" |
main.label.size |
size of plot title |
axis.label.size |
size of axis title |
legend.position |
position of legend when to.view is set to "rand.all.clusts" |
Assesses the sihouette coefficient of clustered data using correlation or distance metrics to be passed to hclust which is then split based on cutree, corresponds to same specifications of a heatmap with the same inputs using myHeatmap or myHeatmapByAnnotation.
A ggplot object. Additional layers can be added to the returned ggplot object to further customize theme and aesthetics.
~~Alison Moss~~
##initiate_parameters
initiate_params()
##view heatmap of data
myHeatmap(RAGP_norm)
###use find.silhouette to help determine optimal number of clusters
##compare a given cluster number to the same cluster number for randomized data
find.silhouette(RAGP_norm, ngroups = 6, to.sil = "genes", to.view = "rand.to.clust")
find.silhouette(RAGP_norm, ngroups = 6, to.sil = "genes", to.view = "rand.to.clust", max.iter = 20)
find.silhouette(RAGP_norm, ngroups = 6, to.sil = "genes", to.view = "rand.to.clust", max.iter = 100)
##for the data provided, determine optimal number of clusters
find.silhouette(RAGP_norm, to.sil = "genes", to.view = "all.clusts", maxgroups = 12)
find.silhouette(RAGP_norm, to.sil = "genes", to.view = "all.clusts", maxgroups = 15)
##for data provided, compare multiple cluster options to randomized data
find.silhouette(RAGP_norm, to.sil = "genes", to.view = "rand.all.clusts", maxgroups = 12)
find.silhouette(RAGP_norm, to.sil = "genes", to.view = "rand.all.clusts",
maxgroups = 9, max.iter = 100)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.