knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
First you need to load the package and ggplot2 for the plots. We use splatter here to construct a toy data set.
library(SIGMA) library(splatter) library(ggplot2)
Here, we import splatter data from the package, called "splatO". And undergo the necessary processings steps for the measure.
#Load sample data simulated with splatter data("splatO") expr <- counts(splatO) expr <- expr[rowSums(expr)>0,] #Normalize and log-transform the data expr.norm <- t(t(expr)/colSums(expr))*10000 expr.norm.log <- log(expr.norm + 1) #Create toy example of a data set test.cluster <- as.character(splatO$Group) test.cluster[test.cluster == "Group3"] <- "Group2" test.cluster[test.cluster == "Group4"] <- "Group2"
Then, we are ready to run the main function for clusterability:
#Main funcion that calculates the clusterability out <- sigma_funct(expr = expr.norm.log, clusters = test.cluster, exclude = data.frame(clsm = log(colSums(expr) + 1)))
We can have a look at the main output of this function. For each cluster, the corresponding clusterability measure is shown.
#Evaluate the output of the measure #plot all values for sigma plot_sigma(out)
If you would like to go into more detail, then you can have a look at all sigmas and g-sigmas that are available per cluster.
#Plot all values for sigma and g_sigma plot_all_sigmas(out) plot_all_g_sigmas(out)
If you are interested in the values of all sigmas, g-sigmas and singular values of the signal matrix, then this information can be obtained with the help of this function.
#obtain the values for sigma and additional information get_info(out, "Group2")
Now, to determine if the clustrs with a high clusterability measure have variances that are meaningful for you to sub-cluster, have a look at the variance driving genes, which will tell you which genes cause the signal to appear. For example, if genes are only related to differentiation, then sub-clustering might not be necessary but could be of interest.
#See which genes cause variances in the data get_var_genes(out, "Group2")
You can also check out the fit of the MP distribution for each cluster.
#Check if the MP distribution fits to the data plot_MP(out, "Group2")
And for further validation, see if the singular vectors of the significant singular values look meaningful. By plotting either clusters or genes with the singular vectors.
#Plot clusters plot_singular_vectors(out, "Group2", colour = splatO$Group[test.cluster == "Group2"]) #Plot variance driving genes plot_singular_vectors(out, "Group2", colour = "Gene401")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.