View source: R/suggest_number_of_clusters.R
suggest_number_of_clusters | R Documentation
The algorithm sets the maximum number of clusters to the lesser of k_limit and the number of unique values in x. A series of kmeans models is then fitted, starting with a single cluster and increasing to the maximum number of clusters. For each model, the total within-cluster sum of squares (the sum of the within sum of squares over all clusters) is calculated. Note that a kmeans model produces a within sum of squares for k (number of clusters) = 1. If the method is changed from kmeans, it may be necessary to compute the sum of squares for k = 1 manually as degrees of freedom * sample variance, i.e. (n - 1) * var(x).
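The loop described above can be sketched with base R's `stats::kmeans`. This is a minimal illustration, not the package's own code: the helper names `wss_by_k` and `wss_one_cluster` and the choice of `nstart = 10` are assumptions. The last line checks the note about k = 1: a single cluster's within sum of squares equals degrees of freedom times the sample variance.

```r
# Hypothetical helper (not the package's internal code): total within-cluster
# sum of squares for k = 1, ..., k_max, where k_max is the lesser of k_limit
# and the number of unique values in x.
wss_by_k <- function(x, k_limit = 10) {
  k_max <- min(k_limit, length(unique(x)))
  vapply(seq_len(k_max), function(k) {
    # nstart > 1 (an assumption) reduces sensitivity to the random starting centres
    stats::kmeans(x, centers = k, nstart = 10)$tot.withinss
  }, numeric(1))
}

# For k = 1 the WSS is the sum of squared deviations from the mean,
# i.e. degrees of freedom * sample variance; no clustering is needed.
wss_one_cluster <- function(x) (length(x) - 1) * stats::var(x)

set.seed(42)
# Three well-separated groups of simulated values
x <- c(rnorm(30, mean = 0), rnorm(30, mean = 10), rnorm(30, mean = 20))
wss <- wss_by_k(x, k_limit = 6)
all.equal(wss[1], wss_one_cluster(x))  # TRUE
```

The manual `wss_one_cluster` form is what would replace the k = 1 model if a clustering method other than kmeans were used.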
suggest_number_of_clusters(x, k_limit = 10, diagnostic_file_prefix = NULL)
x | vector of numeric values
k_limit | numeric; maximum number of clusters to consider
diagnostic_file_prefix | character; if supplied, a file (named using this prefix) is written containing the plot of within sum of squares against cluster number, showing the cluster number:WSS curve and the line y = x
Both the cluster numbers and the within sum of squares values are scaled to the range 0 to 1, so that the curve can be intersected with the line y = x. This intersection is taken as the knee (elbow) of the curve, which is commonly used to choose the optimal number of clusters. The distance of each point from the line y = x is calculated, and the point closest to the line is chosen as the suggested number of clusters.
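The selection rule above can be sketched as follows. This assumes min-max scaling for the "0 to 1" step; the function name `pick_knee` and the WSS values are hypothetical, chosen only to illustrate the rule. The perpendicular distance from a point (a, b) to the line y = x is |a - b| / sqrt(2).

```r
# Hypothetical sketch of the knee rule (not the package's code): scale the
# cluster numbers and WSS to [0, 1], measure each point's distance to y = x,
# and return the cluster number of the closest point.
pick_knee <- function(wss) {
  # With fewer than two points there is no curve to scale
  if (length(wss) < 2) return(length(wss))
  k <- seq_along(wss)
  rescale <- function(v) (v - min(v)) / (max(v) - min(v))  # min-max scaling (assumed)
  k_scaled <- rescale(k)
  wss_scaled <- rescale(wss)
  # Perpendicular distance from (a, b) to the line y = x
  dist_to_line <- abs(k_scaled - wss_scaled) / sqrt(2)
  k[which.min(dist_to_line)]
}

# Illustrative WSS values for k = 1..8 (made up, not real output)
pick_knee(c(1000, 600, 250, 120, 90, 75, 68, 60))  # 3
```

Because the scaled curve runs from (0, 1) down to (1, 0), it crosses y = x exactly once; the point nearest that crossing is the one with the smallest |a - b|, so the division by sqrt(2) does not change which k is returned.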
When diagnostic_file_prefix is supplied, a diagnostic plot is produced showing the within sum of squares against the number of clusters.
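A diagnostic plot of this kind might be produced with base graphics as sketched below. The function name `write_wss_plot`, the PDF device, and the file name pattern `<prefix>_wss.pdf` are all assumptions for illustration; the package's actual file format and naming may differ.

```r
# Hypothetical sketch (not the package's code): plot the scaled WSS curve
# against the scaled cluster number together with the line y = x, and write
# it to "<prefix>_wss.pdf" (assumed naming scheme).
write_wss_plot <- function(wss, prefix) {
  k <- seq_along(wss)
  rescale <- function(v) (v - min(v)) / (max(v) - min(v))
  file <- paste0(prefix, "_wss.pdf")
  grDevices::pdf(file)
  # Close the device even if plotting fails
  on.exit(grDevices::dev.off())
  graphics::plot(rescale(k), rescale(wss), type = "b",
                 xlab = "number of clusters (scaled)",
                 ylab = "within sum of squares (scaled)")
  graphics::abline(a = 0, b = 1, lty = 2)  # the reference line y = x
  # Label each point with its unscaled cluster number
  graphics::text(rescale(k), rescale(wss), labels = k, pos = 4)
  invisible(file)
}

out <- write_wss_plot(c(1000, 600, 250, 120, 90, 75, 68, 60),
                      prefix = file.path(tempdir(), "clusters"))
file.exists(out)
```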
numeric: the suggested number of clusters.
# suggest_number_of_clusters()