plotClusteringMetrics: Plot clustering metrics

View source: R/EMbasic.R

plotClusteringMetrics    R Documentation

Plot clustering metrics

Description

Diagnostic plots for choosing the optimal number of clusters, based on https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/. Three metrics are used:
1) Average silhouette width: bigger is better.
2) Within-cluster sum of squares (elbowWSS): smaller is better; look for the point where the slope stops changing much (the "elbow" in the graph).
3) Gap statistic: measures how much smaller the sample WSS is than the WSS values obtained from randomised matrices. Bigger is better; look for a kink in the graph.
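To make the three metrics concrete, here is a minimal sketch that computes them on toy data. It is not the code used by plotClusteringMetrics: it assumes k-means clustering, Euclidean distances, and a randomisation that shuffles each column independently, and the names toyMatrix, wssFor, randomiseMatrix and gapFor are hypothetical helpers introduced only for illustration.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

set.seed(42)
# Toy stand-in for dataMatrix (60 reads x 10 positions): two groups of reads
toyMatrix <- rbind(
  matrix(rnorm(300, mean = 0.8, sd = 0.1), nrow = 30),
  matrix(rnorm(300, mean = 0.2, sd = 0.1), nrow = 30)
)

# Hypothetical helper: total within-cluster sum of squares for k clusters
wssFor <- function(m, k) kmeans(m, centers = k, nstart = 5)$tot.withinss

# Assumption: a randomised matrix is made by shuffling each column independently
randomiseMatrix <- function(m) apply(m, 2, sample)

# Gap = mean log(WSS) over B randomised matrices minus log(WSS) of the data
gapFor <- function(m, k, B = 20) {
  logWssRandom <- vapply(seq_len(B),
                         function(b) log(wssFor(randomiseMatrix(m), k)),
                         numeric(1))
  mean(logWssRandom) - log(wssFor(m, k))
}

k_range <- 2:5
d <- dist(toyMatrix)  # Euclidean distances between reads

# One row of metrics per candidate number of clusters
metrics <- do.call(rbind, lapply(k_range, function(k) {
  clusters <- kmeans(toyMatrix, centers = k, nstart = 5)$cluster
  data.frame(
    k = k,
    avgSilhouette = mean(silhouette(clusters, d)[, "sil_width"]),
    wss = wssFor(toyMatrix, k),
    gap = gapFor(toyMatrix, k)
  )
}))
print(metrics)
```

On data like this, with two well-separated groups, the average silhouette width should peak at k = 2, while the WSS curve flattens after that point.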

Usage

plotClusteringMetrics(
  dataMatrix,
  k_range = 2:8,
  maxB = 100,
  convergenceError = 1e-06,
  maxIterations = 100,
  outPath = ".",
  outFileBase = "",
  EMrep = NULL,
  nThreads = 1,
  setSeed = FALSE,
  distMetric = list(name = "euclidean", rescale = F)
)

Arguments

dataMatrix

A matrix of methylation or bincount values (reads x position)

k_range

A vector indicating the different numbers of classes to learn (default is 2:8)

maxB

The maximum number of randomisations to perform (default is 100)

convergenceError

A float indicating the convergence threshold for stopping iteration (default is 1e-06)

maxIterations

An integer indicating the maximum number of iterations to perform even if the algorithm has not converged (default is 100)

outPath

A string with the path to the directory where the output should go

outFileBase

A string that will be used in the filenames and titles of the plots produced (default is "")

EMrep

An integer indicating which EM repeat this is

nThreads

Number of threads to use for generating background distribution (default is 1)

setSeed

Logical value to determine if seed should be set for randomisation (default is FALSE)

distMetric

A list with the name of the distance metric and any parameters it might require
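The convergenceError and maxIterations arguments describe a common stopping rule for iterative fitting: stop once the improvement falls below a threshold, or after a fixed number of iterations, whichever comes first. The sketch below shows that rule in a small EM fit of a Bernoulli mixture on binary toy data. It is an illustration under assumptions, not the EMclassifieR implementation: the function runEM, the toy matrix toyReads, and the choice of the log-likelihood change as the convergence measure are all hypothetical.

```r
set.seed(1)
# Toy binary matrix (60 reads x 10 positions) drawn from two classes
pTrue <- rbind(rep(0.9, 10), rep(0.1, 10))
classOfRead <- rep(1:2, each = 30)
toyReads <- matrix(rbinom(600, 1, pTrue[classOfRead, ]), nrow = 60)

# Hypothetical EM for a k-class Bernoulli mixture with the stopping rule
runEM <- function(X, k, convergenceError = 1e-6, maxIterations = 100) {
  theta <- matrix(runif(k * ncol(X), 0.25, 0.75), nrow = k)  # class profiles
  prior <- rep(1 / k, k)                                     # class weights
  logLikOld <- -Inf
  converged <- FALSE
  for (iter in seq_len(maxIterations)) {
    # E-step: log joint probability of each read with each class
    logJoint <- X %*% t(log(theta)) + (1 - X) %*% t(log(1 - theta))
    logJoint <- sweep(logJoint, 2, log(prior), "+")
    rowMax <- apply(logJoint, 1, max)
    logRowSum <- rowMax + log(rowSums(exp(logJoint - rowMax)))
    resp <- exp(logJoint - logRowSum)  # responsibilities, rows sum to 1
    logLik <- sum(logRowSum)
    # Stopping rule: change in log-likelihood below the threshold
    if (abs(logLik - logLikOld) < convergenceError) {
      converged <- TRUE
      break
    }
    logLikOld <- logLik
    # M-step: update class weights and per-position probabilities
    prior <- colMeans(resp)
    theta <- (t(resp) %*% X) / colSums(resp)
    theta <- pmin(pmax(theta, 1e-6), 1 - 1e-6)  # keep log() finite
  }
  list(logLik = logLik, iterations = iter, converged = converged)
}

fit <- runEM(toyReads, k = 2)
str(fit)
```

If the loop runs out of iterations before the threshold is met, converged stays FALSE, which is the case maxIterations is meant to guard against.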

Value

None


jsemple19/EMclassifieR documentation built on Aug. 12, 2022, 2:57 p.m.