Designed to assist users who wish to employ SOM as a clustering tool. Applies standard approaches to assist with identification of grouping structure in multivariate data.
x | A data frame.
kmx | User-specified maximum number of clusters/groups to examine. Default is 10.
itermax | Maximum number of iterations allowed for kmeans. Default is 500*k.
nstarts | Number of random initializations for kmeans to employ. Default is 5.
symsize | Symbol size used on the plots.
Many unsupervised learning algorithms (e.g., SOM, k-means) require the user to supply the number of groups to seek. This tool assists with that decision using two traditional strategies from cluster analysis.
First, understanding patterns in multivariate data can be assisted by low-dimensional visualizations that represent the similarity of individual observations in a dataset. Here, multi-dimensional scaling (MDS) is used as an ordination technique to visualize the information in the data's distance matrix: it constructs a 2-D mapping that projects the pairwise distances among observations onto a configuration of points in an abstract coordinate space. Similar objects lie closer together, so multiple isolated regions of high density will appear if clustering is obvious.
Second, multiple applications of k-means are used to internally assess how grouping structure changes as a function of the number of clusters. Results are presented as a scree plot based on the total within-cluster sum-of-squares (WCSS) for each data partition. An ideal plot will show a clear 'elbow', the point beyond which the measure decreases more slowly as the number of groupings increases.
MDS is implemented via cmdscale() and k-means via kmeans(), both from the stats package.
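The two steps described above can be sketched directly with cmdscale() and kmeans(). This is a minimal illustration, not the package's own code: the iris data, the scaling step, and the variable names are assumptions chosen for the example, while kmx, the 500*k iteration limit, and the 5 random starts mirror the documented defaults.

```r
# Sketch only: illustrates the two diagnostics, not the package's source code.
set.seed(42)
x <- scale(iris[, 1:4])   # assumed example data: numeric columns, standardized

# Step 1: MDS. Project the pairwise Euclidean distances onto 2 dimensions.
d   <- dist(x)
mds <- cmdscale(d, k = 2)

# Step 2: k-means scree. Total within-cluster sum of squares for k = 1..kmx.
kmx  <- 10
wcss <- sapply(seq_len(kmx), function(k) {
  kmeans(x, centers = k, iter.max = 500 * k, nstart = 5)$tot.withinss
})

# Two panels side by side: (a) MDS configuration, (b) scree plot.
op <- par(mfrow = c(1, 2))
plot(mds, xlab = "MDS 1", ylab = "MDS 2", main = "a) MDS")
plot(seq_len(kmx), wcss, type = "b",
     xlab = "Number of clusters (K)", ylab = "Total WCSS",
     main = "b) Scree plot")
par(op)
```

In panel b, the candidate number of clusters is read off where the curve bends and further increases in K yield only small reductions in WCSS.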
Panel a illustrates multi-dimensional scaling (MDS) results. Panel b presents a scree plot of k-means results. A data frame with cluster/group statistics is also returned, with the following columns:
K Number of Clusters
WCSS Total Within-Cluster Sum-of-Squares. Lower values reflect clusters with less internal variability.
BCSS Between-Cluster Sum-of-Squares. Higher values reflect clusterings with more distinction.
WB_Ratio The ratio of WCSS to BCSS. Values below 1 are desired, as they reflect clusterings where the within-cluster variability is lower than the between-cluster variability.
CH The Calinski-Harabasz index, a sum-of-squares based clustering statistic. In brief, the index incorporates the WB ratio but penalizes by cluster number. Higher values reflect better 'clustering'.
SW The average silhouette width. The silhouette ranges from −1 to +1, where a high value indicates that an object is well matched to its own cluster and poorly matched to neighboring clusters. An average near +1 suggests that the clustering configuration is appropriate. A low or negative average suggests that the configuration may have too many or too few clusters.
ADJ_R2 the adjusted R2 for the clustering model. Higher values indicate that the clustering
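As a rough illustration of how most of these columns relate to a single kmeans() fit, the sketch below computes them for one value of K. It is an assumption-laden example rather than the package's implementation: the iris data and variable names are illustrative, the Calinski-Harabasz value uses the standard textbook formula, and the silhouette comes from silhouette() in the cluster package. ADJ_R2 is left out because its exact definition is not given here.

```r
# Sketch only: one row of cluster statistics for a single K.
library(cluster)   # provides silhouette(); a recommended package shipped with R

set.seed(42)
x  <- scale(iris[, 1:4])   # assumed example data
n  <- nrow(x)
k  <- 3
km <- kmeans(x, centers = k, iter.max = 500 * k, nstart = 5)

wcss <- km$tot.withinss                       # total within-cluster SS
bcss <- km$betweenss                          # between-cluster SS
wb   <- wcss / bcss                           # WB ratio
ch   <- (bcss / (k - 1)) / (wcss / (n - k))   # Calinski-Harabasz (standard formula)
sw   <- mean(silhouette(km$cluster, dist(x))[, "sil_width"])  # average silhouette width

out <- data.frame(K = k, WCSS = wcss, BCSS = bcss,
                  WB_Ratio = wb, CH = ch, SW = sw)
print(out)
```

Repeating this for each K from 2 up to the chosen maximum gives a table that can be compared row by row (silhouette widths are undefined for a single cluster).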