library(knitr)
opts_chunk$set(echo = TRUE)

Brief introduction

In this tutorial, we will analyze two datasets: one from Zheng et al., (Nature Communications, 2016) and the other from Biase et al., (Genome Research, 2014). Zheng dataset contains 500 human peripheral blood mononuclear cells (PBMCs) sequenced using GemCode platform, which consists of three cell types, CD56+ natural killer cells, CD19+ B cells and CD4+/CD25+ regulatory T cells. The original data can be downloaded from 10X GENOMICS website. The Biase dataset has 49 mouse embryo cells, which were sequenced by SMART-Seq and can be found at NCBI GEO:GSE57249.


Setup the library

library("SAMEclustering")
data("data_SAME")

Zheng dataset

Setup the input expression matrix

dim(data_SAME$Zheng.expr)

data_SAME$Zheng.expr[1:5, 1:5]

Perform individual clustering

Here we perform single-cell clustering using five popular methods, SC3, CIDR, Seurat, t-SNE + k-means and SIMLR. Genes expressed in less than 10% or more than 90% of cells are removed for CIDR, tSNE + k-means and SIMLR clustering. To improve the performance of cluster ensemble, we choose a maximally diverse set of four individual cluster solutions according to variation in pairwise Adjusted Rand Index (ARI).

cluster.result <- individual_clustering(inputTags = data_SAME$Zheng.expr, datatype = "count", percent_dropout = 10, SC3 = TRUE, CIDR = TRUE, nPC.cidr = NULL, Seurat = TRUE, nPC.seurat = NULL, resolution = 0.9, tSNE = TRUE, dimensions = 2, perplexity = 30, SIMLR = TRUE, diverse = TRUE, SEED = 123)

The function indiviual_clustering will output a matrix, where each row represents the cluster results of each method, and each colunm represents a cell. User can also extend SAME-clustering to other scRNA-seq clustering methods, by putting all clustering results into a $M$ by $N$ matrix with $M$ clustering methods and $N$ single cells.

cluster.result[1:4, 1:10]

Cluster ensemble

Using the individual clustering results generated in last step, we perform cluster ensemble using EM algorithm.

cluster.ensemble <- SAMEclustering(Y = t(cluster.result), rep = 3, SEED = 123)

Function SAMEclustering will output a list with optimal clusters and cluster number based on AIC and BIC index, respectively.

cluster.ensemble

We can compare the clustering results to the true labels using the ARI. In our implementation, we use the clusters produced using the BIC criterion as our ensemble solution.

library(cidr)

# Cell labels of ground truth
head(data_SAME$Zheng.celltype)

# Calculating ARI for cluster ensemble
adjustedRandIndex(cluster.ensemble$BICcluster, data_SAME$Zheng.celltype)

Biase dataset

Setup the input expression matrix

dim(data_SAME$Biase.expr.expr)

data_SAME$Biase.expr[1:5, 1:5]

Perform individual clustering

Here we perform single-cell clustering using five popular methods, SC3, CIDR, Seurat, t-SNE + k-means and SIMLR. Genes expressed in less than 10% or more than 90% of cells are removed for CIDR, tSNE + k-means and SIMLR clustering. Since there are only 49 cells in Biase dataset, the resolution parameter is set to 1.2. To improve the performance of cluster ensemble, we choose a maximally diverse set of four individual cluster solutions according to variation in pairwise ARI.

cluster.result <- individual_clustering(inputTags = data_SAME$Biase.expr, datatype = "FPKM",  percent_dropout = 10, SC3 = TRUE, CIDR = TRUE, nPC.cidr = NULL, Seurat = TRUE, nPC.seurat = NULL, seurat_min_cell = 200, resolution_min = 1.2, tSNE = TRUE, dimensions = 2, tsne_min_cells = 200, tsne_min_perplexity = 10, SIMLR = TRUE, diverse = TRUE, SEED = 123)

Cluster ensemble

Using the clustering results, we perform cluster ensemble using EM algorithm.

cluster.ensemble <- SAMEclustering(Y = t(cluster.result), rep = 3, SEED = 123)
cluster.ensemble

Compare the cluster ensemble results to the true labels.

# Cell labels of ground truth
head(data_SAME$Biase.celltype)

# Calculating ARI for cluster ensemble
adjustedRandIndex(cluster.ensemble$BICcluster, data_SAME$Biase.celltype)



yycunc/SAMEclustering documentation built on May 6, 2021, 6:05 p.m.