clusterCells: Cluster cells into a specified number of groups based on .


View source: R/clustering.R

Description

Unsupervised clustering of cells is a common step in many single-cell expression workflows. In an experiment containing a mixture of cell types, each cluster might correspond to a different cell type. This method takes a CellDataSet and a requested number of clusters as input, clusters the cells with an unsupervised algorithm (by default, density peak clustering), and returns the CellDataSet with the cluster assignments stored in the pData table.

When num_clusters is NULL, the decision plot introduced in Rodriguez & Laio (2014) is drawn, and the user must inspect it to choose the rho and delta thresholds that determine the number of clusters.

For large datasets (for example, more than 50,000 cells), we recommend the Louvain clustering algorithm, which is inspired by the PhenoGraph paper (Levine et al., 2015). Louvain clustering does not support the num_clusters argument, but k (the number of nearest neighbors) influences the final number of clusters. The implementation of Louvain clustering is based on the Rphenograph package, modified to suit our requirements (for example, a changed jaccard_coeff function and an added louvain_iter argument).
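To make the rho and delta quantities on the decision plot concrete, here is a minimal base-R sketch of the density peak idea from Rodriguez & Laio (2014). It is an illustration only, not the code that clusterCells or densityClust runs: the helper name density_peak_stats, the toy data, the cutoff distance dc, and the two thresholds are all assumptions made for this example.

```r
# Toy data (assumption): two well-separated 2-D blobs of 20 points each.
set.seed(1)
coords <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)

# Hypothetical helper: compute rho (local density) and delta (distance to the
# nearest point of higher density) for every point.
density_peak_stats <- function(coords, dc) {
  d <- as.matrix(dist(coords))
  # Gaussian-kernel density; subtract 1 to drop each point's own term (exp(0) = 1).
  rho <- rowSums(exp(-(d / dc)^2)) - 1
  n <- nrow(d)
  delta <- numeric(n)
  for (i in seq_len(n)) {
    denser <- which(rho > rho[i])
    # The densest point has no denser neighbor; it gets its largest distance instead.
    delta[i] <- if (length(denser) == 0) max(d[i, ]) else min(d[i, denser])
  }
  data.frame(rho = rho, delta = delta)
}

stats <- density_peak_stats(coords, dc = 0.5)

# Thresholds as they might be read off a decision plot (assumed values).
rho_cut <- 1
delta_cut <- 2
peaks <- which(stats$rho > rho_cut & stats$delta > delta_cut)
```

Plotting stats$rho against stats$delta gives a decision plot for this toy data: points that are both dense and far from any denser point stand out in the upper right, and those are the candidates for cluster centers.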

Usage

clusterCells(cds, skip_rho_sigma = F, num_clusters = NULL,
  inspect_rho_sigma = F, rho_threshold = NULL, delta_threshold = NULL,
  peaks = NULL, gaussian = T, cell_type_hierarchy = NULL,
  frequency_thresh = NULL, enrichment_thresh = NULL,
  clustering_genes = NULL, k = 50, louvain_iter = 1, weight = FALSE,
  method = c("densityPeak", "louvain", "DDRTree"), verbose = F, ...)
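The signature above can be exercised along the following lines. This is a hedged sketch rather than a tested workflow: the wrapper names cluster_density_peak and cluster_louvain are hypothetical, and the code assumes that cds is a CellDataSet that has already been preprocessed as the package requires (in the Monocle 2 vignette, clustering follows a tSNE reduction with reduceDimension).

```r
# Hypothetical wrappers around clusterCells(); the monocle and Biobase
# packages are only needed when the functions are actually called.

# Density peak clustering with a requested number of clusters.
cluster_density_peak <- function(cds, n_clusters) {
  cds <- monocle::clusterCells(cds, num_clusters = n_clusters,
                               method = "densityPeak")
  # Assignments are stored in the pData table, in the Cluster column.
  print(table(Biobase::pData(cds)$Cluster))
  cds
}

# Louvain clustering: num_clusters is not supported, so k is tuned instead.
cluster_louvain <- function(cds, k = 50, louvain_iter = 1, weight = FALSE) {
  monocle::clusterCells(cds, method = "louvain", k = k,
                        louvain_iter = louvain_iter, weight = weight)
}
```

For example, cds <- cluster_density_peak(cds, 3) would request three clusters, while cds <- cluster_louvain(cds, k = 20) would let the graph resolution decide the cluster count.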

Arguments

cds

the CellDataSet upon which to perform this operation

skip_rho_sigma

A logical flag that determines whether to skip the calculation of rho / sigma

num_clusters

Number of clusters. The algorithm uses 0.5 of the rho as the rho threshold and, as the delta threshold, the delta of the num_clusters-th sample when samples are ranked by decreasing delta. Samples passing these thresholds are taken as the density peaks and used for assigning clusters

inspect_rho_sigma

A logical flag that determines whether to interactively select the rho and sigma used for assigning clusters

rho_threshold

The threshold of local density (rho) used to select the density peaks

delta_threshold

The threshold of local distance (delta) used to select the density peaks

peaks

A numeric vector giving the indices of the density peaks used for clustering. This vector should be retrieved from the decision plot with caution, because no checking is performed. If not supplied, the peaks will be calculated automatically from the top num_clusters products of rho and sigma.

gaussian

A logical flag passed to the densityClust function in the densityClust package that determines whether a Gaussian kernel is used for calculating the local density

cell_type_hierarchy

A data structure (a CellTypeHierarchy) that organizes functions which can, in turn, be used to organize cells

frequency_thresh

When a CellTypeHierarchy is provided, clusterCells will impute cell types in clusters that are composed of at least this proportion of exactly one cell type.

enrichment_thresh

Fraction to be multiplied by each cell type percentage. Only used if frequency_thresh is NULL; the two cannot both be NULL

clustering_genes

a vector of feature ids (from the CellDataSet's featureData) used for ordering cells

k

Number of nearest neighbors used to build the k-nearest-neighbor graph for Louvain clustering. k controls the resolution of the clustering result: a larger k gives a lower resolution, and vice versa. Defaults to 50.

louvain_iter

Number of iterations used for Louvain clustering. The clustering result with the largest modularity score is used as the final result. Defaults to 1.

weight

A logical flag that determines whether the Jaccard coefficient of two nearest neighbors (based on the overlap of their kNN sets) is used as the edge weight for Louvain clustering. Defaults to FALSE.

method

Method for clustering cells. Three methods are available: densityPeak, louvain and DDRTree. By default, the density peak clustering algorithm is used. For big datasets (around 50,000 cells or more), we recommend the louvain clustering algorithm.

verbose

A logical flag that determines whether to print the running details.

...

Additional arguments passed to densityClust()
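The k and weight arguments both refer to a k-nearest-neighbor graph whose edges can be weighted by the overlap of the two endpoints' neighbor sets. The base-R sketch below shows that idea with the plain Jaccard index on toy data. It is an assumption-laden illustration: the package's own jaccard_coeff function was modified from Rphenograph and may differ in detail (for instance, in whether a cell counts as its own neighbor), and the data, k = 5, and the helper name jaccard are made up for this example.

```r
# Toy data (assumption): 30 cells in a 2-D reduced space.
set.seed(42)
coords <- matrix(rnorm(60), ncol = 2)
k <- 5

d <- as.matrix(dist(coords))
# Row i of knn holds the indices of cell i's k nearest neighbors.
# order(row)[1] is the cell itself (distance 0), so it is skipped.
knn <- t(apply(d, 1, function(row) order(row)[2:(k + 1)]))

# Plain Jaccard index of two neighbor sets: |A and B| / |A or B|.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# One edge per (cell, neighbor) pair, weighted by the overlap of their kNN sets.
edges <- do.call(rbind, lapply(seq_len(nrow(knn)), function(i) {
  data.frame(
    from = i,
    to = knn[i, ],
    weight = sapply(knn[i, ], function(j) jaccard(knn[i, ], knn[j, ]))
  )
}))
```

A weighted edge list of this shape is what a community detection routine such as Louvain would consume. Raising k makes every neighbor set larger and the graph denser, which is consistent with the lower clustering resolution described for k above.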

Value

An updated CellDataSet object, in which the phenoData contains a Cluster value for each cell

References

Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492-1496. doi:10.1126/science.1242072

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. J. Stat. Mech., P10008.

Levine, J. H., et al. (2015). Data-Driven Phenotypic Dissection of AML Reveals Progenitor-like Cells that Correlate with Prognosis. Cell.


cole-trapnell-lab/monocle-release documentation built on May 13, 2019, 8:50 p.m.