In yycunc/SAFEclustering: SAFE-clustering: Single-cell Aggregated (From Ensemble) Clustering for Single-cell RNA-seq Data

library(knitr)
opts_chunk$set(echo = TRUE)

Brief introduction

In this tutorial, we analyze a dataset from Zheng et al. (Nature Communications, 2016). The Zheng dataset contains 500 human peripheral blood mononuclear cells (PBMCs) sequenced on the GemCode platform and comprises three cell types: CD56+ natural killer cells, CD19+ B cells and CD4+/CD25+ regulatory T cells. The original data can be downloaded from the 10x Genomics website.

Set up the library

library("SAFEclustering")
data("data_SAFE")

Zheng dataset

Set up the input expression matrix

dim(data_SAFE$Zheng.expr)

data_SAFE$Zheng.expr[1:5, 1:5]

Perform individual clustering

Here we cluster the cells with four popular methods: SC3, CIDR, Seurat and t-SNE + k-means, without filtering any genes or cells.

cluster.results <- individual_clustering(
  inputTags = data_SAFE$Zheng.expr, mt_filter = TRUE,
  SC3 = TRUE, gene_filter = FALSE,
  CIDR = TRUE, nPC.cidr = NULL,
  Seurat = TRUE, nGene_filter = FALSE, nPC.seurat = NULL, resolution = 0.7,
  tSNE = TRUE, dimensions = 3, perplexity = 30,
  SEED = 123
)

The function individual_clustering outputs a matrix in which each row holds the cluster results of one method and each column represents a cell. Users can also extend SAFE-clustering to other scRNA-seq clustering methods by putting all clustering results into an $M \times N$ matrix, with $M$ clustering methods and $N$ cells.
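As a minimal sketch of that extension, the snippet below stacks label vectors from three hypothetical methods into a methods-by-cells matrix. The names `method_a`, `method_b`, `method_c` and `custom.results`, and all label values, are made up for illustration; in practice each vector would hold one method's cluster assignments for the same N cells, in the same cell order.

```r
# Hypothetical cluster labels from three methods for the same six cells.
method_a <- c(1, 1, 2, 2, 3, 3)
method_b <- c(2, 2, 1, 1, 3, 3)
method_c <- c(1, 1, 1, 2, 2, 2)

# One row per method, one column per cell (M = 3, N = 6).
custom.results <- rbind(method_a, method_b, method_c)
dim(custom.results)
```

A matrix laid out this way has the same orientation as the output of individual_clustering, so it should be usable as the cluster_results argument of SAFE, assuming the labels are encoded the way SAFE expects.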

cluster.results[1:4, 1:10]

Cluster ensemble

Using the clustering results generated in the previous step, we perform cluster ensemble using three partitioning algorithms: the meta-clustering algorithm (MCLA), the hypergraph partitioning algorithm (HGPA) and the cluster-based similarity partitioning algorithm (CSPA) (Strehl and Ghosh, Proceedings of AAAI 2002, Edmonton, Canada, 2002).

Note that HGPA is performed using the shmetis program (from the hMETIS package v. 1.5 (Karypis et al., IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1999)), while MCLA and CSPA are performed using the gpmetis program (from METIS v. 5.1.0 (Karypis and Kumar, SIAM Journal on Scientific Computing, 1998)). Please put both programs in the working directory, or pass the directory that contains them via the program.dir argument.
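Before running the ensemble step, it can help to confirm that both executables are where you expect them. The helper below is a hypothetical convenience function, not part of SAFEclustering; it assumes the programs are named exactly `shmetis` and `gpmetis`, with no file extension, and only checks that the files exist, not that they are executable.

```r
# Hypothetical helper (not part of SAFEclustering): report which of the
# two partitioning programs are present in a given directory.
check_metis_programs <- function(program.dir = ".") {
  programs <- c("shmetis", "gpmetis")
  # path.expand() resolves a leading "~" to the home directory.
  found <- file.exists(file.path(path.expand(program.dir), programs))
  names(found) <- programs
  found
}

# Named logical vector: TRUE for each program found in the working directory.
check_metis_programs(".")
```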

cluster.ensemble <- SAFE(
  cluster_results = cluster.results,
  program.dir = "~/Documents/single_cell_clustering",
  MCLA = TRUE, CSPA = TRUE, HGPA = TRUE,
  SEED = 123
)

Here is the list of ANMI results for the ensemble solution at each K and for each partitioning algorithm.

## [1] "HGPA partitioning at K = 2: 2 clusters at ANMI = 0.00329903476904425"
## [1] "HGPA partitioning at K = 3: 3 clusters at ANMI = 0.278691668779803"
## [1] "HGPA partitioning at K = 4: 4 clusters at ANMI = 0.00392992505505839"
## [1] "HGPA partitioning at K = 5: 5 clusters at ANMI = 0.552234460801785"
## [1] "MCLA partitioning at K = 2: 2 clusters at ANMI = 0.568294023177534"
## [1] "MCLA partitioning at K = 3: 3 clusters at ANMI = 0.929094923585274"
## [1] "MCLA partitioning at K = 4: 4 clusters at ANMI = 0.872601957447147"
## [1] "MCLA partitioning at K = 5: 4 clusters at ANMI = 0.923346490477427"
## [1] "CSPA partitioning at K = 2: 2 clusters at ANMI = 0.53144399728197"
## [1] "CSPA partitioning at K = 3: 3 clusters at ANMI = 0.850151780486274"
## [1] "CSPA partitioning at K = 4: 4 clusters at ANMI = 0.665510270422344"
## [1] "CSPA partitioning at K = 5: 5 clusters at ANMI = 0.666022118059772"
## [1] "Optimal number of clusters is 3 with ANMI = 0.929094923585274"

The SAFE function returns a list that includes the Average Normalized Mutual Information (ANMI) metric (Strehl and Ghosh, Proceedings of AAAI 2002, Edmonton, Canada, 2002) between each ensemble solution and the individual solutions. The optimal ensemble is the ensemble solution with the highest ANMI value; in the output above, that is MCLA at K = 3, with ANMI of about 0.929.
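To make the selection rule concrete, here is a base-R sketch of ANMI on toy data. It uses the Strehl and Ghosh form of NMI, mutual information divided by the geometric mean of the two entropies; the function names `entropy`, `nmi` and `anmi` and all label vectors are hypothetical, and SAFE's internal computation may differ in detail.

```r
# Shannon entropy (natural log) of a probability vector, ignoring zeros.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# NMI between two label vectors, normalised by sqrt(H(a) * H(b)).
nmi <- function(a, b) {
  joint <- table(a, b) / length(a)
  h_a <- entropy(rowSums(joint))
  h_b <- entropy(colSums(joint))
  if (h_a == 0 || h_b == 0) return(0)  # a single-cluster labeling carries no information
  mutual_info <- h_a + h_b - entropy(as.vector(joint))
  mutual_info / sqrt(h_a * h_b)
}

# ANMI: mean NMI between one ensemble solution and each individual solution (row).
anmi <- function(ensemble, individual) {
  mean(apply(individual, 1, function(labels) nmi(ensemble, labels)))
}

# Toy individual solutions: three methods (rows) by six cells (columns).
individual <- rbind(
  c(1, 1, 2, 2, 3, 3),
  c(2, 2, 1, 1, 3, 3),
  c(1, 1, 1, 2, 2, 2)
)

# Two toy candidate ensemble solutions.
candidates <- list(
  k2 = c(1, 1, 1, 2, 2, 2),
  k3 = c(1, 1, 2, 2, 3, 3)
)

# Score each candidate and keep the one with the highest ANMI.
scores <- sapply(candidates, anmi, individual = individual)
names(which.max(scores))
```

In this toy setup the three-cluster candidate agrees exactly with two of the three methods, up to relabeling, so it scores higher than the two-cluster candidate.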

cluster.ensemble$Summary

cluster.ensemble$MCLA[1:10]

cluster.ensemble$MCLA_optimal_k

We can compare the clustering results to the true labels using the Adjusted Rand Index (ARI):

library(cidr)

# Cell labels of ground truth
head(data_SAFE$Zheng.celltype)

# Calculating ARI for cluster ensemble
adjustedRandIndex(cluster.ensemble$optimal_clustering, data_SAFE$Zheng.celltype)
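For readers who want to see what the ARI measures, the following base-R sketch computes it from the contingency table of two labelings using the standard Hubert and Arabie formula. The function name `ari_sketch` and the toy labels are hypothetical; this is not the implementation behind adjustedRandIndex, and it does not handle the degenerate case where the denominator is zero.

```r
# Hypothetical helper: Adjusted Rand Index from a contingency table.
ari_sketch <- function(x, y) {
  tab <- table(x, y)
  n <- sum(tab)
  # Pairs of cells placed together in both labelings.
  together_both <- sum(choose(tab, 2))
  # Pairs placed together in x alone and in y alone.
  together_x <- sum(choose(rowSums(tab), 2))
  together_y <- sum(choose(colSums(tab), 2))
  expected <- together_x * together_y / choose(n, 2)
  maximum <- (together_x + together_y) / 2
  (together_both - expected) / (maximum - expected)
}

# The same partition under different label names gives an ARI of 1.
ari_sketch(c(1, 1, 2, 2), c("b", "b", "a", "a"))
```

Because the index depends only on which cells are grouped together, numeric cluster IDs can be compared directly against cell-type names, as in the call above on the Zheng labels.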

yycunc/SAFEclustering documentation built on March 29, 2021, 5:58 a.m.
