Cluster similar cells based on rank correlations in their gene expression profiles.
A numeric count matrix where rows are genes and columns are cells. Alternatively, a SCESet object containing such a matrix.
An integer scalar specifying the minimum size of each cluster.
A logical, integer or character scalar indicating the rows of
A string specifying which assay values to use, e.g.,
A logical specifying whether spike-in transcripts should be used.
This function provides a correlation-based approach to quickly define clusters of a minimum size
A distance matrix is constructed from Spearman's rank correlations between cells, computed on the counts.
(Some manipulation is performed to convert the correlation into a proper distance metric.)
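The exact transformation is not stated above. As a sketch, assume the distance d = sqrt((1 - rho) / 2), where rho is Spearman's correlation between two cells. This is a proper metric because it is proportional to the Euclidean distance between the centred, unit-length rank vectors of the two cells, and it is of the kind studied by van Dongen and Enright (2012). Whether quickCluster uses exactly this form is an assumption. The package is written in R; the Python below, including the helper names average_ranks, pearson, spearman and cell_distances, is purely illustrative and is not the package's code.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    return [
        1 + sum(x < v for x in values) + (sum(x == v for x in values) - 1) / 2
        for v in values
    ]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation computed on average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def cell_distances(counts: List[List[float]]) -> List[List[float]]:
    """Pairwise cell-cell distances for a genes-by-cells count matrix."""
    n_cells = len(counts[0])
    cells = [[row[c] for row in counts] for c in range(n_cells)]  # one vector per cell
    dist = [[0.0] * n_cells for _ in range(n_cells)]
    for i in range(n_cells):
        for j in range(i + 1, n_cells):
            rho = spearman(cells[i], cells[j])
            # Assumed transformation: sqrt((1 - rho) / 2) lies in [0, 1];
            # max() guards against rounding error pushing the argument below 0.
            dist[i][j] = dist[j][i] = math.sqrt(max(0.0, (1.0 - rho) / 2.0))
    return dist


counts = [
    [10, 20, 0, 1],  # gene 1 across four cells
    [5, 10, 3, 2],
    [0, 1, 8, 9],
    [1, 2, 7, 6],
]
dist = cell_distances(counts)
print(dist[0][1], dist[0][2])  # 0.0 1.0
```

Cells 0 and 1 have identical gene rankings and sit at distance 0. Cells 0 and 2 have exactly reversed rankings and sit at the maximum distance of 1. The sketch assumes that no cell has constant counts across all genes, because its correlation would then be undefined.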
Hierarchical clustering is performed and a dynamic tree cut is used to define clusters of cells.
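The dynamic tree cut used by quickCluster is not reproduced here. The Python sketch below swaps in a simpler static rule to show how a minimum cluster size interacts with a dendrogram. It performs average-linkage agglomerative clustering cut at a fixed height, then reports any cluster with fewer than min_size cells as 0. The linkage choice, the height parameter and the function name cluster_cells are all hypothetical.

```python
from typing import List


def cluster_cells(dist: List[List[float]], height: float, min_size: int) -> List[int]:
    """Average-linkage agglomerative clustering cut at a fixed height.

    Returns one label per cell; clusters with fewer than min_size cells get 0.
    This is a static cut, not a dynamic tree cut.
    """
    n = len(dist)
    clusters: List[List[int]] = [[i] for i in range(n)]  # one cluster per cell

    def linkage(a: List[int], b: List[int]) -> float:
        # Average linkage: mean distance over all cross-cluster pairs of cells.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > height:
            break  # average linkage is monotone, so all later merges are higher
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid

    labels = [0] * n
    next_id = 1
    for members in sorted(clusters, key=min):  # number clusters by their first cell
        if len(members) >= min_size:
            for cell in members:
                labels[cell] = next_id
            next_id += 1
    return labels


# Cells 0-2 are mutually close, as are cells 3-4; the two groups are far apart.
dist = [[0.0 if i == j else (0.1 if (i < 3) == (j < 3) else 0.9)
         for j in range(5)] for i in range(5)]
print(cluster_cells(dist, height=0.5, min_size=3))  # [1, 1, 1, 0, 0]
```

Cells 3 and 4 form a genuine group, but they fall below min_size=3 and so come back as 0. This is the same situation as the unassigned cells discussed in this section.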
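A correlation-based approach is preferred here because it is invariant to scaling normalization: multiplying all counts in a cell by a positive size factor does not change their ranks, so the correlations are unaffected.
This avoids circularity between normalization and clustering, as clusters can be defined before any normalization is performed.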
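The invariance can be checked directly. In the Python sketch below, whose helper names are hypothetical, multiplying one cell's counts by a positive size factor leaves every rank, and hence Spearman's correlation, unchanged. The Euclidean distance between the raw count vectors does change. Euclidean distance serves only as a contrast and is not part of quickCluster.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    return [
        1 + sum(x < v for x in values) + (sum(x == v for x in values) - 1) / 2
        for v in values
    ]


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var = sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var)


cell_a = [0, 3, 10, 25, 7]   # counts for five genes in one cell
cell_b = [1, 2, 12, 30, 5]   # counts for the same genes in another cell
size_factor = 2.5            # hypothetical scaling factor applied to cell_b
scaled_b = [size_factor * v for v in cell_b]

# The rank correlation is unchanged by scaling one cell.
print(spearman(cell_a, cell_b) == spearman(cell_a, scaled_b))  # True
# A distance computed on the raw counts is not.
print(math.dist(cell_a, cell_b) == math.dist(cell_a, scaled_b))  # False
```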
Note that some cells may not be assigned to any cluster.
In most cases, this is because those cells belong in a separate cluster with fewer than
The function cannot report these cells as a cluster, because their number does not reach the minimum size threshold.
Users are advised to check that the unassigned cells do indeed form their own cluster.
If so, it is generally safe to ignore the warning about unassigned cells and to treat all of them as a single cluster.
Otherwise, it may be necessary to use a custom clustering algorithm.
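After the check above, merging the unassigned cells is a simple relabelling. The Python sketch below assumes that the cluster identities are numeric strings with "0" marking unassigned cells, as in the returned value. It gives every unassigned cell one new label that is larger than any existing one. The function name merge_unassigned is hypothetical.

```python
from collections import Counter
from typing import List


def merge_unassigned(labels: List[str]) -> List[str]:
    """Relabel every unassigned cell ("0") as one additional cluster.

    Assumes the assigned labels are numeric strings such as "1" and "2".
    """
    assigned = [int(lab) for lab in labels if lab != "0"]
    new_label = str(max(assigned, default=0) + 1)  # larger than any existing label
    return [new_label if lab == "0" else lab for lab in labels]


labels = ["1", "1", "0", "2", "0", "2"]
print(Counter(labels)["0"], "unassigned cells")  # 2 unassigned cells
print(merge_unassigned(labels))  # ['1', '1', '3', '2', '3', '2']
```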
For quickCluster,SCESet-method, spike-in transcripts are not used by default, as they provide little information on the biological similarities between cells.
This may not be the case if subpopulations differ in total RNA content, in which case setting get.spikes=TRUE may provide more discriminative power.
Users can also set subset.row to specify which rows of x are to be used to calculate correlations.
This is equivalent to, but more efficient than, subsetting x directly, as it avoids constructing a (potentially large) temporary matrix.
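The equivalence claimed above can be illustrated with a small Python sketch. A helper cell_vector takes a hypothetical subset_row argument and reads only the selected rows of the full matrix. The correlation obtained this way matches the one computed after explicitly subsetting the matrix. The memory saving of the real subset.row argument depends on the R implementation and is not modelled here.

```python
import math
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    return [
        1 + sum(x < v for x in values) + (sum(x == v for x in values) - 1) / 2
        for v in values
    ]


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var = sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var)


def cell_vector(counts: List[List[float]], cell: int,
                subset_row: Optional[Sequence[int]] = None) -> List[float]:
    """Values of one cell (column), read only from the requested rows."""
    rows = range(len(counts)) if subset_row is None else subset_row
    return [counts[r][cell] for r in rows]


counts = [
    [4, 0, 7],   # gene 0 (rows are genes, columns are cells)
    [1, 9, 2],
    [8, 3, 5],
    [0, 6, 1],
    [5, 2, 9],
]
subset_row = [0, 2, 4]  # hypothetical selection of genes

# Option 1: read the selected rows directly from the full matrix.
rho_indexed = spearman(cell_vector(counts, 0, subset_row),
                       cell_vector(counts, 2, subset_row))

# Option 2: build a subsetted matrix (a new list of the selected rows) first.
sub = [counts[r] for r in subset_row]
rho_copied = spearman(cell_vector(sub, 0), cell_vector(sub, 2))

print(rho_indexed == rho_copied)  # True
```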
Note that if subset.row is specified, it will overwrite any setting of
A vector of cluster identities for each cell in
Values of "0" are used to indicate cells that are not assigned to any cluster.
Aaron Lun and Karsten Bach
van Dongen S and Enright AJ (2012). Metric distances derived from cosine similarity and Pearson and Spearman correlations. arXiv 1208.3145
Lun ATL, Bach K and Marioni JC (2016). Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17:75