k_select: Plots for helping decide number of clusters

View source: R/kmeans.R

k_select    R Documentation

Description

To help decide the number of clusters, three methods are provided: total within-cluster sum of squares, average silhouette coefficient, and gap statistic.
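As a rough illustration of the first method (the "elbow" approach), the sketch below computes the total within-cluster sum of squares (WSS) for k = 1 to n with base R's stats::kmeans on a small simulated matrix. It is not the k_select implementation, which works from a musica object; the toy data and the nstart value are assumptions made only for this example.

```r
# Sketch: total within-cluster sum of squares (WSS) for k = 1..n using
# base R's stats::kmeans. Toy data (three well-separated 2-D groups),
# not a musica object and not k_select's internal code.
set.seed(123)
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

n <- 6
wss <- vapply(seq_len(n), function(k) {
  # nstart > 1 runs several random starts and keeps the best,
  # reducing the chance of a poor local optimum
  kmeans(x, centers = k, nstart = 10)$tot.withinss
}, numeric(1))
```

Plotting wss against k, for example with plot(seq_len(n), wss, type = "b"), gives an elbow plot; with this toy data WSS should fall sharply up to k = 3 (the number of simulated groups) and flatten afterwards.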

Usage

k_select(
  musica,
  model_name,
  modality = "SBS96",
  result_name = "result",
  method = "wss",
  clust.method = "kmeans",
  n = 10,
  proportional = TRUE
)

Arguments

musica

A musica object containing a mutational signature discovery or prediction result. A two-dimensional UMAP must be stored in this object.

model_name

The name of the desired model.

modality

The modality of the model. Must be "SBS96", "DBS78", or "IND83". Default "SBS96".

result_name

Name of the result list entry containing the desired model. Default "result".

method

A single character string indicating which statistic to use for the plot. Options are "wss" (total within-cluster sum of squares), "silhouette" (average silhouette coefficient), and "gap_stat" (gap statistic). Default is "wss".
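The other two statistics can be sketched with the cluster package, which ships with R as a recommended package: silhouette() for the average silhouette width and clusGap() for the gap statistic. This is an illustration on simulated data, not the code k_select runs; the number of reference sets (B = 50) and the "firstSEmax" rule for choosing k from the gap statistic are assumptions chosen for the example.

```r
# Sketch: average silhouette width and gap statistic using the
# 'cluster' package. Toy data; not k_select's internal code.
library(cluster)

set.seed(123)
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)
n <- 6
d <- dist(x)

# Average silhouette: mean over points of
#   s(i) = (b(i) - a(i)) / max(a(i), b(i)),
# where a(i) is the mean distance to the point's own cluster and b(i)
# the mean distance to the nearest other cluster. Undefined for k = 1.
avg_sil <- vapply(2:n, function(k) {
  cl <- kmeans(x, centers = k, nstart = 10)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
}, numeric(1))
best_k_sil <- (2:n)[which.max(avg_sil)]

# Gap statistic: Gap(k) = E*[log W_k] - log W_k, where the expectation
# is estimated from B reference data sets with no cluster structure.
# Extra arguments (nstart) are passed on to kmeans().
gap <- clusGap(x, FUNcluster = kmeans, K.max = n, B = 50, nstart = 10)
best_k_gap <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"],
                    method = "firstSEmax")
```

For the silhouette, the preferred k is the one with the highest average width; for the gap statistic, maxSE() applies a standard-error rule so that a slightly larger gap at a bigger k does not automatically win.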

clust.method

A character string indicating the clustering method. Options are "kmeans" (default), "hclust" (hierarchical clustering), "hkmeans" (hierarchical k-means), "pam" (partitioning around medoids), and "clara" (clustering large applications).
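Whatever clustering method is chosen, each statistic is computed from the resulting cluster assignments. As a hedged sketch, the example below computes the same WSS criterion from hierarchical clustering (stats::hclust with Ward linkage, cut with cutree) instead of k-means. The helper wss_for_labels() and the "ward.D2" linkage are hypothetical choices for illustration; this does not claim to reproduce how k_select or fviz_nbclust run hierarchical clustering.

```r
# Sketch: WSS from hierarchical clustering (hclust + cutree) rather
# than k-means. Toy data; wss_for_labels() is a hypothetical helper.
set.seed(123)
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

# Sum of squared distances from each point to its cluster centroid
wss_for_labels <- function(x, labels) {
  per_cluster <- vapply(split(seq_len(nrow(x)), labels), function(idx) {
    xc <- x[idx, , drop = FALSE]
    sum(sweep(xc, 2, colMeans(xc))^2)
  }, numeric(1))
  sum(per_cluster)
}

tree <- hclust(dist(x), method = "ward.D2")
n <- 6
wss_hc <- vapply(seq_len(n), function(k) {
  wss_for_labels(x, cutree(tree, k = k))
}, numeric(1))
```

Because cutting a hierarchy at k + 1 clusters only splits one of the k clusters, WSS from cutree() can never increase as k grows, unlike k-means, where a poor local optimum can occasionally break that pattern.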

n

An integer indicating the maximum number of clusters to test. Default is 10.

proportional

Logical, indicating whether proportional exposures will be used for clustering. Default TRUE.
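Proportional exposures rescale each sample so that its signature exposures sum to 1, which makes samples with very different mutation counts comparable. A minimal sketch, assuming an exposure matrix with signatures in rows and samples in columns (the toy matrix and its names are hypothetical):

```r
# Sketch: converting raw exposures to proportions, assuming signatures
# are rows and samples are columns. Hypothetical toy values.
exposures <- matrix(c(10, 30, 60,   # Sample1, total 100
                       2,  2, 16,   # Sample2, total 20
                      50, 25, 25),  # Sample3, total 100
                    nrow = 3,
                    dimnames = list(paste0("Sig", 1:3),
                                    paste0("Sample", 1:3)))

# Divide each column (sample) by its total so every sample sums to 1
prop <- sweep(exposures, 2, colSums(exposures), "/")

# Under this layout, the rows of t(prop) are the samples to be clustered
points_to_cluster <- t(prop)
```

Here Sample2 has far fewer mutations than the others, but after rescaling its profile (0.1, 0.1, 0.8) can be compared directly with theirs.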

Value

A ggplot object.

See Also

fviz_nbclust (factoextra package)

Examples

data(res_annot)
set.seed(123)
# Make an elbow plot
k_select(res_annot, model_name = "res_annot", method = "wss", n = 6)
# Plot average silhouette coefficient against number of clusters
k_select(res_annot, model_name = "res_annot", method = "silhouette", n = 6)
# Plot gap statistics against number of clusters
k_select(res_annot, model_name = "res_annot", method = "gap_stat", n = 6)

campbio/musicatk documentation built on Dec. 25, 2024, 9:34 p.m.