chooseK: Cluster Number Selection
In zrmacc/MNMix: Missingness Aware Gaussian Mixture Models

ChooseK

R Documentation

Cluster Number Selection

Description

Chooses the number of clusters k by examining cluster numbers between k0 and k1. For each candidate cluster number, generates `boot` bootstrap data sets, fits a Gaussian mixture model to each (FitGMM), and calculates quality metrics (ClustQual). For each metric, determines the optimal cluster number k_opt and k_1SE, the smallest cluster number whose quality is within one standard error (SE) of the optimum.
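The selection of k_opt and k_1SE for a single metric can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper `pick_k`, its `higher_is_better` flag, and the values in `res` are hypothetical, and the sketch assumes that "within 1 SE" uses the standard error at the optimal cluster number.

```r
# Hypothetical helper: pick k_opt and k_1SE from per-k summaries of one metric.
# `k`, `avg`, and `se` are parallel vectors. The tolerance is taken to be the
# standard error at the optimum (an assumption, not confirmed by the docs).
pick_k <- function(k, avg, se, higher_is_better = TRUE) {
  # Flip the sign when lower values are better, so larger is always better.
  score <- if (higher_is_better) avg else -avg
  i_opt <- which.max(score)
  # Flag cluster numbers whose quality is within 1 SE of the optimum.
  ok <- score >= score[i_opt] - se[i_opt]
  c(k_opt = k[i_opt], k_1SE = min(k[ok]))
}

# Made-up summary values, for illustration only.
res <- data.frame(
  k   = 2:6,
  avg = c(0.41, 0.48, 0.60, 0.61, 0.55),
  se  = c(0.02, 0.02, 0.02, 0.02, 0.03)
)
choice <- pick_k(res$k, res$avg, res$se)
print(choice)
```

Here the optimum falls at k = 5, but k = 4 is within one SE of it, so the 1-SE rule prefers the smaller model.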

Usage

ChooseK(
  data,
  k0 = 2,
  k1 = NULL,
  boot = 100,
  init_means = NULL,
  fix_means = FALSE,
  init_covs = NULL,
  lambda = 0,
  init_props = NULL,
  maxit = 10,
  eps = 1e-04,
  report = TRUE
)

Arguments

`data`	Numeric data matrix.
`k0`	Minimum number of clusters.
`k1`	Maximum number of clusters.
`boot`	Bootstrap replicates.
`init_means`	Optional list of initial mean vectors.
`fix_means`	Fix the means at their starting values? If TRUE, initial means must be provided.
`init_covs`	Optional list of initial covariance matrices.
`lambda`	Optional ridge term added to the covariance matrix to ensure positive definiteness.
`init_props`	Optional vector of initial cluster proportions.
`maxit`	Maximum number of EM iterations.
`eps`	Minimum acceptable increment in the EM objective.
`report`	Report bootstrap progress?
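The effect of a ridge term such as `lambda` can be illustrated in base R. This is a sketch under the assumption that the ridge is added as `lambda` times the identity matrix; the documentation above does not state exactly how FitGMM applies it, and the data here are made up.

```r
# Two perfectly collinear columns give a singular sample covariance matrix.
x <- cbind(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8))
S <- cov(x)

# Assumed form of the ridge: add lambda to each diagonal entry.
lambda <- 0.1
S_ridge <- S + lambda * diag(ncol(S))

# Smallest eigenvalue: numerically zero before, approximately lambda after.
min_before <- min(eigen(S, symmetric = TRUE)$values)
min_after <- min(eigen(S_ridge, symmetric = TRUE)$values)
print(c(before = min_before, after = min_after))

# chol() without pivoting requires a positive definite matrix,
# so it succeeds on the ridged matrix.
R <- chol(S_ridge)
```

With the default `lambda = 0` no adjustment is made, so a positive value is only needed when the estimated covariance matrices are singular or nearly so.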

Value

List containing Choices, the recommended number of clusters according to each quality metric, and Results, the mean and standard error of the quality metrics at each cluster number evaluated.

Examples


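# Simulate 500 observations from a 4-component, 2-dimensional GMM.
set.seed(100)
mean_list <- list(c(2, 2), c(2, -2), c(-2, 2), c(-2, -2))
data <- rGMM(n = 500, d = 2, k = 4, means = mean_list)

# Evaluate cluster numbers between 2 and 6 with 10 bootstrap replicates.
choose_k <- ChooseK(data, k0 = 2, k1 = 6, boot = 10)

# Recommended number of clusters according to each quality metric.
choose_k$Choices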

zrmacc/MNMix documentation built on July 3, 2024, 7:48 p.m.