clustREval is an R package for the evaluation of clustering results for scRNA-seq data. This package allows users to run multiple clustering pipelines on their scRNA-seq dataset, then compute differential expression-based and unsupervised metrics to evaluate performance of each pipeline.
To download clustREval, use the following commands:
require("devtools") devtools::install_github("cindyfang70/clustREval", build_vignettes = TRUE) library("clustREval")
Let's cluster the embryo scRNA-seq data using the two default pipelines (not evaluated for sake of time):
data(embryo) clustRes <- clustREval::runPipelineCombs(embryo)
The clustering results for the embryo data from the two default pipelines are included in the data directory, let's load it:
data(embryoClusts) head(embryoClusts)
Now, we have two clustering outputs on the same dataset. How do we know which one is better? Let's compute some unsupervised clustering metrics to see how each pipeline did.
clustMetrics1 <- computeUnsupervisedMetrics(embryo, embryoClusts[[1]]) clustMetrics2 <- computeUnsupervisedMetrics(embryo, embryoClusts[[2]]) clustMetrics1 clustMetrics2
We can see that the two pipeline performed very similarly. This is because the only difference between them is the clustering resolution, which are 0.1 and 0.2 respectively. A higher Dunn index and mean silhouette score indicate better clustering, so based on our unsupervised metrics, the 0.2 resolution pipeline is performing better.
Now, let's take a look at the biological validity of our clusters by performing some gene set enrichment analysis.
GMTPathway <- system.file("extdata", "h.all.v7.4.symbols.gmt", package = "clustREval") library(magrittr) gsea1 <- geneSetEval(embryo, embryoClusts[[1]], GMTPathway) gsea2 <- geneSetEval(embryo, embryoClusts[[2]], GMTPathway) plotGeneSetEval(gsea1[[1]])
Session Info:
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.