findClusters: Find Clusters

View source: R/findClusters.R

findClusters    R Documentation

Description

Search for clusters in the scCNA data.

Usage

findClusters(
  scCNA,
  embedding = "umap",
  ncomponents = 2,
  method = c("hdbscan", "leiden", "louvain"),
  k_superclones = NULL,
  k_subclones = NULL,
  seed = 17
)

Arguments

scCNA

scCNA object.

embedding

String with the name of the reducedDim to pull data from.

ncomponents

An integer with the number of component dimensions to use from the embedding.

method

A string with the method used for clustering: one of "hdbscan", "leiden", or "louvain".

k_superclones

A numeric k-nearest-neighbor value. Used to find the superclones.

k_subclones

A numeric k-nearest-neighbor value. Used to find the subclones.

seed

A numeric seed passed on to functions that depend on pseudo-random number generation.

Details

findClusters uses the reduced-dimension embedding produced by runUmap to cluster cells at two levels, referred to here as superclones and subclones. To find superclones, findClusters builds a graph representation of the embedding with a shared nearest neighbor (SNN) algorithm (makeSNNGraph). The connected components extracted from this graph generally represent high-level structures that share large, lineage-defining copy number events. At a finer resolution, CopyKit can also detect subclones, i.e. groups of cells in which each cluster contains a unique copy number event. To do so, the umap embedding is again used as the pre-processing step, this time for density-based clustering with hdbscan. Network clustering algorithms, such as the leiden algorithm (leiden_find_partition), can also be applied on top of the SNN graph.
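The superclone step (shared nearest neighbor graph, then connected components) can be illustrated with a minimal base R sketch. It is not CopyKit's implementation: the helper snn_components, the toy embedding, and the rule "link two cells when their neighbor sets share at least one member" are assumptions made for illustration, and the weighting used by makeSNNGraph is not reproduced.

```r
# Toy embedding: two well-separated groups of five cells (rows are cells).
blob <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1), c(0.5, 0.5))
embedding <- rbind(blob, blob + 100)

# Hypothetical helper (not part of CopyKit): link two cells when they share
# at least one of their k nearest neighbors, then label connected components.
snn_components <- function(x, k) {
  n <- nrow(x)
  d <- as.matrix(dist(x))  # Euclidean distances between all cells
  # Row i flags cell i itself plus its k nearest neighbors.
  nn <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nn[i, order(d[i, ])[seq_len(k + 1)]] <- 1
  }
  # Cells i and j are adjacent when their neighbor sets overlap.
  adj <- (nn %*% t(nn)) > 0
  # Breadth-first search assigns one integer label per connected component.
  membership <- integer(n)
  current <- 0L
  for (start in seq_len(n)) {
    if (membership[start] != 0L) next
    current <- current + 1L
    membership[start] <- current
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      unvisited <- which(adj[node, ] & membership == 0L)
      membership[unvisited] <- current
      queue <- c(queue, unvisited)
    }
  }
  membership
}

# Superclone labels carry the 's' prefix.
superclones <- paste0("s", snn_components(embedding, k = 2))
```

In this toy case each group forms its own connected component, so the two groups receive different superclone labels. A larger k links more cells and tends to merge components.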

  • hdbscan: hdbscan is an outlier-aware clustering algorithm. Because extensive filtering of the dataset can be applied with findOutliers before clustering, any cell that hdbscan classifies as an outlier is assigned to the same cluster as its closest non-outlier nearest neighbor according to Euclidean distance.
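The outlier reassignment described for hdbscan can be sketched in base R as follows. This is an illustration under stated assumptions, not CopyKit's code: the helper reassign_outliers and the toy data are hypothetical, and outliers are assumed to be marked with the label 0.

```r
# Toy embedding: rows are cells, columns are embedding dimensions.
embedding <- rbind(
  c(0, 0), c(0.1, 0),      # cluster 1
  c(5, 5), c(5.1, 5),      # cluster 2
  c(0.4, 0.3), c(4.5, 4.8) # cells flagged as outliers
)
# Assumed convention: label 0 marks an outlier.
clusters <- c(1, 1, 2, 2, 0, 0)

# Hypothetical helper (not part of CopyKit): give every outlier the label of
# its closest non-outlier cell by Euclidean distance.
reassign_outliers <- function(x, labels, outlier_label = 0) {
  d <- as.matrix(dist(x))
  is_outlier <- labels == outlier_label
  # Nothing to do when there are no outliers or no non-outlier cells.
  if (!any(is_outlier) || all(is_outlier)) return(labels)
  candidates <- which(!is_outlier)
  for (i in which(is_outlier)) {
    nearest <- candidates[which.min(d[i, candidates])]
    labels[i] <- labels[nearest]
  }
  labels
}

# Subclone labels carry the 'c' prefix.
subclones <- paste0("c", reassign_outliers(embedding, clusters))
```

Only non-outlier cells are considered as donors, so an outlier never inherits its label from another outlier.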

Value

Cluster information is added to colData in columns superclones or subclones. Superclones are prefixed by 's' whereas subclones are prefixed by 'c'.

Author(s)

Darlan Conterno Minussi

References

Laks, E., McPherson, A., Zahn, H., et al. (2019). Clonal Decomposition and DNA Replication States Defined by Scaled Single-Cell Genome Sequencing. Cell, 179(5), 1207–1221.e22. https://doi.org/10.1016/j.cell.2019.10.026

Leland McInnes and John Healy and James Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426

Lun ATL, McCarthy DJ, Marioni JC (2016). “A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor.” F1000Res., 5, 2122. doi: 10.12688/f1000research.9501.2.

See Also

findSuggestedK.

hdbscan for hdbscan clustering.

Examples

copykit_obj <- copykit_example_filtered()
copykit_obj <- findClusters(copykit_obj)

navinlabcode/copykit documentation built on Oct. 16, 2024, 2:55 p.m.