get_opt_hclust: get the optimal hierarchical clustering results with the...

View source: R/get_opt_hclust.R

Description

This function estimates the optimal number of clusters by combining three indices: the Silhouette index, the Calinski-Harabasz (CH) index and the height difference.

Usage

get_opt_hclust(
  mat,
  hmethod,
  N.cluster,
  minN.cluster,
  maxN.cluster,
  sil.thre,
  height.Ntimes,
  flashmark
)

Arguments

mat

either a feature matrix or a similarity matrix derived from the single-cell expression matrix

hmethod

agglomeration method for hierarchical clustering, the default is 'ward.D'. Other methods can also be used: 'ward.D2', 'single', 'complete', 'average' (= UPGMA), 'mcquitty' (= WPGMA), 'median' (= WPGMC) or 'centroid' (= UPGMC).

minN.cluster

minimum number of clusters to be tested, the default is 2.

maxN.cluster

maximum number of clusters to be tested, the default is 40 or equal to the number of cells (within the specific clustering problem) minus 1, whichever is smaller.

sil.thre

the threshold for the maximum Silhouette index (msil), the default is 0.35. If msil < sil.thre, the CH index is used instead.

height.Ntimes

the threshold for the height difference between two adjacent heights, taken in descending order from the hierarchical clustering result. If a height is more than height.Ntimes times the next one, the tree is cut midway between the first pair of adjacent heights that satisfies this criterion (for two values, the median and the mean coincide).

Details

Specifically, we first select the maximum Silhouette index (msil) as the reference. If msil > sil.thre (the default is 0.35), then its corresponding number of clusters is the optimal one; otherwise, we use the maximum CH index as the reference. If the number of clusters with the maximum CH index is not 2, then it is the optimal number of clusters; otherwise, we use the adjacent height difference derived from hierarchical clustering. If a height is more than height.Ntimes times the immediately following height (in descending order), then we cut at the mean of these two heights, and the corresponding number of clusters is the optimal one; otherwise, we do not cut.
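
The decision rule described above can be sketched in a few lines of base R. This is only an illustration of the logic in the text, not the package's source code: the helper name choose_k_sketch, its arguments, its return fields and the default height.Ntimes = 2 are all assumptions made for the example (this page does not state a default for height.Ntimes), and "we do not cut" is interpreted here as returning a single cluster.

```r
# Hypothetical helper (not part of SHARP): choose k from precomputed indices.
#   ks      candidate numbers of clusters, e.g. 2:10
#   sil, ch Silhouette and CH index for each entry of ks
#   heights merge heights, e.g. hclust(...)$height
choose_k_sketch <- function(ks, sil, ch, heights,
                            sil.thre = 0.35,
                            height.Ntimes = 2) {  # assumed default
  # Step 1: maximum Silhouette index above the threshold
  if (max(sil) > sil.thre) {
    return(list(k = ks[which.max(sil)], rule = "silhouette",
                cut.height = NA_real_))
  }
  # Step 2: maximum CH index, unless it points to 2 clusters
  k_ch <- ks[which.max(ch)]
  if (k_ch != 2) {
    return(list(k = k_ch, rule = "CH", cut.height = NA_real_))
  }
  # Step 3: adjacent heights in descending order
  h <- sort(heights, decreasing = TRUE)
  big_drop <- h[-length(h)] > height.Ntimes * h[-1]
  if (any(big_drop)) {
    i <- which(big_drop)[1]            # first pair meeting the criterion
    cut_h <- (h[i] + h[i + 1]) / 2     # mean of the two heights
    # Cutting below the i largest merges leaves i + 1 clusters
    return(list(k = i + 1, rule = "height", cut.height = cut_h))
  }
  # No pair meets the criterion: do not cut (assumed to mean one cluster)
  list(k = 1, rule = "no cut", cut.height = NA_real_)
}

# Low Silhouette, CH maximal at k = 2, one large jump in height
res <- choose_k_sketch(ks = 2:4,
                       sil = c(0.10, 0.20, 0.15),
                       ch = c(30, 20, 10),
                       heights = c(1, 1.2, 10))
res$k           # 2
res$cut.height  # 5.6
```

The three checks are ordered as in the text, so a later index is consulted only when the earlier one is inconclusive.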

Value

a list containing the optimal hierarchical clustering results, the optimal number of clusters, the corresponding maximum Silhouette index and other indices.

Examples

hres = get_opt_hclust(mat)
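
The call above assumes that mat already exists. To give a feel for the indices named in the Description, the following sketch computes the Silhouette and CH indices over a range of candidate cluster numbers on a toy matrix, using only base R and the cluster package. It does not call SHARP and may differ from how get_opt_hclust computes these values internally; the toy data, the helper ch_index and the rows-as-observations layout are assumptions made for the example.

```r
library(cluster)  # silhouette()

set.seed(1)
# Toy feature matrix: three well-separated groups of 20 points each,
# with rows as observations (an assumption for this sketch)
mat <- rbind(
  matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2)
)

d  <- dist(mat)
hc <- hclust(d, method = "ward.D")

# Hypothetical helper: Calinski-Harabasz index for a labelling cl of x
ch_index <- function(x, cl) {
  n <- nrow(x)
  k <- length(unique(cl))
  total_ss  <- sum(scale(x, scale = FALSE)^2)
  within_ss <- sum(sapply(unique(cl), function(j) {
    sum(scale(x[cl == j, , drop = FALSE], scale = FALSE)^2)
  }))
  ((total_ss - within_ss) / (k - 1)) / (within_ss / (n - k))
}

ks  <- 2:10
# Mean silhouette width for each candidate number of clusters
sil <- sapply(ks, function(k) {
  mean(silhouette(cutree(hc, k), d)[, "sil_width"])
})
# CH index for each candidate number of clusters
ch  <- sapply(ks, function(k) ch_index(mat, cutree(hc, k)))

ks[which.max(sil)]               # number of clusters with the largest Silhouette index
head(sort(hc$height, decreasing = TRUE))  # largest merge heights, descending
```

On data this well separated, the maximum Silhouette index is far above the default sil.thre of 0.35, so the Silhouette index alone would decide the number of clusters.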

shibiaowan/SHARP documentation built on April 28, 2021, 1:56 p.m.