ChooseK: Cluster Number Selection

View source: R/08_Clustering.R

ChooseK R Documentation

Cluster Number Selection

Description

Selects the number of clusters k for a Gaussian Mixture Model (GMM). Cluster numbers from k0 to k1 are examined. For each cluster number, the function generates boot bootstrap data sets, fits the GMM (FitGMM), and calculates quality metrics (ClustQual). For each metric, it determines the optimal cluster number k_opt and k_1SE, the smallest cluster number whose quality is within 1 standard error (SE) of the optimum.
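The one-standard-error selection described above can be illustrated in base R. This is a minimal sketch, not the package's implementation: the table summary_tab, its values, and the helper one_se_rule are hypothetical, and the sketch assumes the threshold uses the SE at the optimum and that the direction of each metric (larger or smaller is better) is supplied by the caller.

```r
# Hypothetical bootstrap summary for one quality metric (values are made up).
summary_tab <- data.frame(
  k    = 2:6,
  mean = c(0.41, 0.55, 0.63, 0.64, 0.60),
  se   = c(0.02, 0.02, 0.02, 0.03, 0.03)
)

# Hypothetical helper: pick k_opt and k_1SE from per-k means and SEs.
one_se_rule <- function(tab, higher_is_better = TRUE) {
  # Flip the sign when smaller is better, so the optimum is always a maximum.
  score <- if (higher_is_better) tab$mean else -tab$mean
  i_opt <- which.max(score)
  # Assumption: the cutoff is the optimal score minus the SE at the optimum.
  cutoff <- score[i_opt] - tab$se[i_opt]
  # Smallest cluster number whose mean quality clears the cutoff.
  k_1se <- min(tab$k[score >= cutoff])
  c(k_opt = tab$k[i_opt], k_1SE = k_1se)
}

choice <- one_se_rule(summary_tab)
print(choice)
```

In this made-up table the best mean occurs at k = 5, but k = 4 lies within one SE of it, so the rule favors the smaller model.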

Usage

ChooseK(
  data,
  k0 = 2,
  k1 = NULL,
  boot = 100,
  init_means = NULL,
  fix_means = FALSE,
  init_covs = NULL,
  init_props = NULL,
  maxit = 10,
  eps = 1e-04,
  report = TRUE
)

Arguments

data

Numeric data matrix.

k0

Minimum number of clusters.

k1

Maximum number of clusters.

boot

Bootstrap replicates.

init_means

Optional list of initial mean vectors.

fix_means

Fix the means at their starting values? If so, initial values must be provided.

init_covs

Optional list of initial covariance matrices.

init_props

Optional vector of initial cluster proportions.

maxit

Maximum number of EM iterations.

eps

Minimum acceptable increment in the EM objective.

report

Report bootstrap progress?

Value

List containing Choices, the recommended number of clusters according to each quality metric, and Results, the mean and standard error of the quality metrics at each cluster number evaluated.
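To make the "mean and standard error at each cluster number" concrete, the following base R sketch bootstraps a quality metric for several values of k. It is an illustration under stated assumptions, not MGMM code: stats::kmeans and the between-cluster variance share stand in for FitGMM and ClustQual, the helpers quality and boot_summary are hypothetical, and the SE is taken to be the standard deviation across bootstrap replicates.

```r
set.seed(1)
# Toy data: two well-separated 2-D groups (made up for illustration).
x <- rbind(
  matrix(rnorm(100, mean = -2), ncol = 2),
  matrix(rnorm(100, mean =  2), ncol = 2)
)

# Stand-in quality metric: share of total variance lying between clusters.
# This is NOT one of the metrics computed by MGMM's ClustQual.
quality <- function(dat, k) {
  fit <- kmeans(dat, centers = k, nstart = 5)
  fit$betweenss / fit$totss
}

# Bootstrap the metric for each k, then summarize by mean and SE.
boot_summary <- function(dat, ks, boot = 20) {
  rows <- lapply(ks, function(k) {
    vals <- replicate(boot, {
      idx <- sample(nrow(dat), replace = TRUE)  # resample rows with replacement
      quality(dat[idx, , drop = FALSE], k)
    })
    # Assumption: SE = standard deviation across bootstrap replicates.
    data.frame(k = k, mean = mean(vals), se = sd(vals))
  })
  do.call(rbind, rows)
}

res <- boot_summary(x, ks = 2:4)
print(res)
```

The resulting table has one row per cluster number, which is the shape of input a one-SE selection rule would operate on.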

See Also

See ClustQual for evaluating cluster quality, and FitGMM for estimating the GMM with a specified cluster number.

Examples


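library(MGMM)
set.seed(100)
# Simulate 500 bivariate observations from a 4-component GMM.
mean_list <- list(c(2, 2), c(2, -2), c(-2, 2), c(-2, -2))
data <- rGMM(n = 500, d = 2, k = 4, means = mean_list)
# Evaluate cluster numbers 2 through 6 with 10 bootstrap replicates each.
choose_k <- ChooseK(data, k0 = 2, k1 = 6, boot = 10)
# Recommended number of clusters according to each quality metric.
choose_k$Choices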


MGMM documentation built on Sept. 30, 2023, 5:06 p.m.