suggest_number_of_clusters: suggest_number_of_clusters

View source: R/suggest_number_of_clusters.R

suggest_number_of_clusters    R Documentation

suggest_number_of_clusters

Description

The algorithm sets the maximum number of clusters to the lesser of k_limit and the number of unique values in x. A set of kmeans models is then fitted, starting with a single cluster and progressing to the maximum number of clusters. For each model, the total within-cluster sum of squares (wss) is calculated. Note that kmeans returns a within sum of squares for k (number of clusters) = 1. If the method is changed from kmeans, it may be necessary to compute the sum of squares for k = 1 manually as degrees of freedom * sample variance, that is, (n - 1) * var(x).
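The steps in the Description can be sketched roughly as follows. This is an illustrative sketch, not the package's source: the sample data, the variable names (k_max, wss, wss_k1_manual) and the nstart value are assumptions made for the example. It also checks the k = 1 identity mentioned above.

```r
# Sketch only: illustrates the Description, not the package's actual code.
set.seed(42)                                   # kmeans uses random starts
x <- c(rnorm(30, mean = 0), rnorm(30, mean = 5), rnorm(30, mean = 10))

k_limit <- 10
# Maximum number of clusters: lesser of k_limit and the number of unique values
k_max <- min(k_limit, length(unique(x)))

# Total within-cluster sum of squares for k = 1, ..., k_max
# (nstart = 10 is an assumption chosen to stabilise the fits)
wss <- vapply(
  seq_len(k_max),
  function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss,
  numeric(1)
)

# For k = 1 the within sum of squares equals
# degrees of freedom * sample variance
wss_k1_manual <- (length(x) - 1) * var(x)
all.equal(wss[1], wss_k1_manual)
```

If a clustering method other than kmeans does not report a value for k = 1, wss_k1_manual could be prepended to its results so that the curve still starts at a single cluster.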

Usage

suggest_number_of_clusters(x, k_limit = 10, diagnostic_file_prefix = NULL)

Arguments

x

vector of numeric values

k_limit

numeric maximum number of clusters to consider

diagnostic_file_prefix

character; if supplied, a file is written containing the plot of wss against cluster number, showing the cluster number:wss curve and the y = x line.

Details

Both sets of values (cluster number and wss) are scaled from 0 to 1 so that the intersection of the curve with the line y = x may be found. The intersection is designated as the knee of the curve, which is commonly used to determine the optimal number of clusters. The distance of each point from the line y = x is calculated, and the point closest to the line is chosen as the suggested number of clusters.

A diagnostic plot may be produced showing the within sum of squares and cluster number.
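The knee-finding step described in Details might look something like the sketch below. The wss values are made up for illustration, and scale01, k_scaled, wss_scaled and suggested_k are hypothetical names; the package's own implementation may differ in detail.

```r
# Sketch only: hypothetical wss values for k = 1, ..., 10
k   <- 1:10
wss <- c(1000, 400, 150, 100, 80, 70, 65, 62, 60, 59)

# Rescale a numeric vector to the range [0, 1]
scale01 <- function(v) (v - min(v)) / (max(v) - min(v))

k_scaled   <- scale01(k)
wss_scaled <- scale01(wss)

# Perpendicular distance of each point (k_scaled, wss_scaled) from the line y = x
dist_to_line <- abs(wss_scaled - k_scaled) / sqrt(2)

# The point closest to the line is taken as the knee
suggested_k <- k[which.min(dist_to_line)]
suggested_k
```

Because the scaled curve runs from (0, 1) down to (1, 0), it must cross y = x somewhere, and the observed point nearest that crossing is used as the suggestion.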

Value

numeric, the suggested number of clusters

Examples

# suggest_number_of_clusters(x = c(rnorm(50, 0), rnorm(50, 10)), k_limit = 10)

johnaclouse/eeda documentation built on July 22, 2022, 12:16 a.m.