Clustering of individual cells based on a metric of choice

Share:

Description

This function calculates a disconnected graph where the connected components are the groups generated by the selected clustering method.

In order to obtain a vector showing each cell corresponding cluster, the easiest way is by using the 'clusters()' function from igraph. For more information, check the examples below or the help page of 'clusters()', i.e. 'help(clusters)'.

Usage

1
2
sc_clusterObj(SincellObject, clust.method="knn", mutual=TRUE, k=3, 
  max.distance=0, shortest.rank.percent=10)

Arguments

SincellObject

A SincellObject named list as created by function sc_distanceObj() or sc_DimensionalityReductionObj(), containing in member "cell2celldist" a distance matrix representing a cell-to-cell distance matrix assessed on a gene expression matrix with a metric of choice

clust.method

If clust.method="max.distance", clusters are defined as subgraphs generated by a maximum pair-wise distance cut-off, that is: from a totally connected graph where all cells are connected to each other, the algorithm only keeps pairs of cells connected by a distance lower than a given threshold.

If clust.method="percent", clusters are defined as subgraphs generated by a given rank-percentile of the shortest pair-wise distances, that is; from a totally connected graph where all cells are connected to each other, the algorithm only keeps the top “x” percent of shortest pairwise distances as indicated by "shortest.rank.percent".

If clust.method="knn", unsupervised K-Nearest Neighbors (K-NN) clustering is performed: From a totally disconnected graph where none of the cells are connected to each other, the algorithm connects each cell to its “k” nearest neighbors. If parameter "mutual=TRUE", Unsupervised K-Mutual Nearest Neighbours (K-MNN) clustering is performed, that is: only reciprocal k nearest neighbors are connected.

If clust.method="k-medoids", clustering around medoids (a more robust version of k-means) is performed with function "pam" from package "cluster" on the distance matrix in mySincellObject[["cell2celldist"]] with a desired number of groups indicated in parameter "num.clusters"

Hierarchical agglomerative clustering can be performed by internally calling function "hclust" where the agglomeration method is indicated in parameter "clust.method" as one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). Clusters are obtained by cutting the tree produced by hclust with function cutree with a desired number of groups indicated in parameter "num.clusters"

mutual

If clust.method="knn" and "mutual=TRUE", Unsupervised K-Mutual Nearest Neighbours (K-MNN) clustering is performed, that is: only reciprocal k nearest neighbors are connected.

k

If clust.method="knn", k is an integer specifying the number of nearest neighbors to consider in K-NN and K-KNN

max.distance

in max.distance algorithm, select up to which distance the points will be linked

shortest.rank.percent

in percent algorithm, select the percent of shortest distances will be represented as links

Value

The SincellObject named list provided as input where following list members are added: "cellsClustering"=cellsClustering,"clust.method"=clust.method,"mutual"=mutual, "k"=k,"max.distance"=max.distance,"shortest.rank.percent"=shortest.rank.percent, where "cellsClustering" contains an igraph graph object (see "igraph" R package documentation) representing the result of the clustering performed with the indicated parameters.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
## Generate some random data
Data <- matrix(abs(rnorm(3000, sd=2)),ncol=10,nrow=300)

## Initializing SincellObject named list
mySincellObject <- sc_InitializingSincellObject(Data)

## Assessmet of cell-to-cell distance matrix without dimensionality reduction
mySincellObjectA <- sc_distanceObj(mySincellObject, method="spearman")

## Assessmet of cell-to-cell distance matrix after dimensionality reduction 
## with Principal Component Analysis (PCA) 
mySincellObjectB <- sc_DimensionalityReductionObj(mySincellObject, method="PCA",dim=2)

## Cluster
mySincellObjectA <- sc_clusterObj (mySincellObjectA, clust.method="max.distance", 
  max.distance=0.5)
mySincellObjectA <- sc_clusterObj(mySincellObjectA, clust.method="percent", 
  shortest.rank.percent=10)

## To access the igraph object representing the clustering output
cellsClusteringA<-mySincellObjectA[["cellsClustering"]]

## Check each cell its corresponding cluster
clusters(cellsClusteringA)

## Cluster
mySincellObjectB <- sc_clusterObj (mySincellObjectB, clust.method="knn", mutual=FALSE, k=3)
mySincellObjectB <- sc_clusterObj (mySincellObjectB, clust.method="knn", mutual=TRUE, k=3)

## To access the igraph object representing the clustering output
cellsClusteringB<-mySincellObjectB[["cellsClustering"]]

## Check each cell its corresponding cluster
clusters(cellsClusteringB)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.