plotClusteringMetrics | R Documentation
Diagnostic plots for choosing the optimal number of clusters, based on https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/. Three metrics are plotted:
1) Average silhouette width: bigger is better.
2) Within-cluster sum of squares (elbowWSS): smaller is better; look for the point where the slope stops changing much (the "elbow" in the graph).
3) Gap statistic: compares how much smaller the sample WSS is than a background distribution of WSS values computed on randomised matrices. Bigger is better; look for a kink in the graph.
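The first two metrics can be illustrated with a minimal base-R sketch. It assumes `stats::kmeans` as a stand-in for the package's own EM classifier, and uses Euclidean distances (the default `distMetric`) for the silhouette. The helper names `avgSilhouette` and `clusterMetrics` are hypothetical and not part of the package.

```r
# Hypothetical helper: average silhouette width for a clustering `cl` of the rows of `x`.
# s(i) = (b - a) / max(a, b), where a = mean distance to the own cluster,
# b = smallest mean distance to any other cluster. Singletons get s = 0.
avgSilhouette <- function(x, cl) {
  d <- as.matrix(dist(x))  # Euclidean distances between reads
  s <- numeric(nrow(d))
  for (i in seq_len(nrow(d))) {
    own <- cl == cl[i]
    if (sum(own) == 1) next  # singleton cluster: s stays 0
    a <- sum(d[i, own]) / (sum(own) - 1)  # d[i, i] is 0, so exclude self from the count
    b <- min(sapply(setdiff(unique(cl), cl[i]), function(k) mean(d[i, cl == k])))
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Hypothetical helper: silhouette width and WSS for each k in k_range.
# k-means is an assumed stand-in for the EM clustering used by plotClusteringMetrics.
clusterMetrics <- function(dataMatrix, k_range = 2:8, seed = 1) {
  set.seed(seed)
  res <- t(sapply(k_range, function(k) {
    km <- kmeans(dataMatrix, centers = k, nstart = 10, iter.max = 50)
    c(k = k,
      avgSilhouette = avgSilhouette(dataMatrix, km$cluster),  # bigger is better
      elbowWSS = km$tot.withinss)                             # smaller is better
  }))
  as.data.frame(res)
}
```

For example, `m <- clusterMetrics(x, 2:6)` followed by `plot(m$k, m$elbowWSS, type = "b")` draws the WSS curve in which to look for the elbow, while `m$k[which.max(m$avgSilhouette)]` picks the k with the largest average silhouette width.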
plotClusteringMetrics(dataMatrix, k_range = 2:8, maxB = 100,
  convergenceError = 1e-06, maxIterations = 100, outPath = ".",
  outFileBase = "", EMrep = NULL, nThreads = 1, setSeed = FALSE,
  distMetric = list(name = "euclidean", rescale = F))
Argument | Description
dataMatrix | A matrix of methylation or bin-count values (reads x positions).
k_range | An integer vector of the numbers of clusters (classes) to evaluate (default 2:8).
maxB | The maximum number of randomisations to perform when building the background distribution for the gap statistic (default 100).
convergenceError | A numeric value giving the convergence threshold for stopping the iterations (default 1e-06).
maxIterations | An integer giving the maximum number of iterations to perform, even if the algorithm has not converged (default 100).
outPath | A string with the path to the directory where the output is written (default ".").
outFileBase | A string used in the file names and titles of the plots produced (default "").
EMrep | An integer indicating which EM repeat this is (default NULL).
nThreads | The number of threads used to generate the background distribution (default 1).
setSeed | A logical value determining whether a seed is set for the randomisation (default FALSE).
distMetric | A list with the name of the distance metric and any parameters it requires (default list(name = "euclidean", rescale = F)).
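The roles of maxB and setSeed in the gap statistic can be sketched as follows, using the Tibshirani, Walther and Hastie definition Gap(k) = mean over b of log(W*_kb) minus log(W_k). Several details are assumptions rather than the package's documented behaviour: k-means stands in for the EM classifier, each randomised matrix is made by shuffling every column (position) independently, and the fixed seed value 1 is arbitrary. nThreads is omitted for simplicity. The helper names `wssFor`, `randomiseMatrix` and `gapStatistic` are hypothetical.

```r
# Assumed stand-in for the EM classifier: total within-cluster sum of squares from k-means.
wssFor <- function(x, k) kmeans(x, centers = k, nstart = 10, iter.max = 50)$tot.withinss

# Hypothetical randomisation: shuffle each column independently. This keeps each
# position's value distribution but breaks any structure shared across reads.
randomiseMatrix <- function(x) apply(x, 2, function(col) col[sample.int(length(col))])

gapStatistic <- function(dataMatrix, k_range = 2:8, maxB = 100, setSeed = FALSE) {
  if (setSeed) set.seed(1)  # arbitrary fixed seed, only for reproducibility
  logW <- sapply(k_range, function(k) log(wssFor(dataMatrix, k)))
  # One row per randomised matrix, one column per k: log(WSS) of the reference data
  logWref <- matrix(replicate(maxB, {
    ref <- randomiseMatrix(dataMatrix)
    sapply(k_range, function(k) log(wssFor(ref, k)))
  }), nrow = maxB, byrow = TRUE)
  data.frame(
    k = k_range,
    gap = colMeans(logWref) - logW,                 # bigger is better
    se = apply(logWref, 2, sd) * sqrt(1 + 1 / maxB) # s_k term from Tibshirani et al.
  )
}
```

Tibshirani et al. choose the smallest k with gap(k) >= gap(k + 1) - se(k + 1), which is one way of formalising the "kink" in the plot. A uniform reference over each column's range is the other common choice of randomisation.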
Return value: none. The plots are written to files in outPath.