In Siliegia/SIGMA: A clusterability measure for scRNA-seq data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

First you need to load the package and ggplot2 for the plots. We use splatter here to construct a toy data set.

library(SIGMA)
library(splatter)
library(ggplot2)

Here, we import splatter data from the package, called "splatO". And undergo the necessary processings steps for the measure.

#Load sample data simulated with splatter
data("splatO")

expr <- counts(splatO)
expr <- expr[rowSums(expr)>0,]

#Normalize and log-transform the data
expr.norm <- t(t(expr)/colSums(expr))*10000
expr.norm.log <- log(expr.norm + 1)

#Create toy example of a data set
test.cluster <- as.character(splatO$Group)
test.cluster[test.cluster == "Group3"] <- "Group2"
test.cluster[test.cluster == "Group4"] <- "Group2"

Then, we are ready to run the main function for clusterability:

#Main funcion that calculates the clusterability
out <- sigma_funct(expr = expr.norm.log, clusters = test.cluster, 
                   exclude = data.frame(clsm = log(colSums(expr) + 1)))

We can have a look at the main output of this function. For each cluster, the corresponding clusterability measure is shown.

#Evaluate the output of the measure

#plot all values for sigma
plot_sigma(out)

If you would like to go into more detail, then you can have a look at all sigmas and g-sigmas that are available per cluster.

#Plot all values for sigma and g_sigma
plot_all_sigmas(out)
plot_all_g_sigmas(out)

If you are interested in the values of all sigmas, g-sigmas and singular values of the signal matrix, then this information can be obtained with the help of this function.

#obtain the values for sigma and additional information
get_info(out, "Group2")

Now, to determine if the clustrs with a high clusterability measure have variances that are meaningful for you to sub-cluster, have a look at the variance driving genes, which will tell you which genes cause the signal to appear. For example, if genes are only related to differentiation, then sub-clustering might not be necessary but could be of interest.

#See which genes cause variances in the data
get_var_genes(out, "Group2")

You can also check out the fit of the MP distribution for each cluster.

#Check if the MP distribution fits to the data
plot_MP(out, "Group2")

And for further validation, see if the singular vectors of the significant singular values look meaningful. By plotting either clusters or genes with the singular vectors.

#Plot clusters
plot_singular_vectors(out, "Group2", colour = splatO$Group[test.cluster == "Group2"])

#Plot variance driving genes
plot_singular_vectors(out, "Group2", colour = "Gene401")