knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = 'center' ) library(ggplot2) theme_set(cowplot::theme_cowplot())
A small toy dataset is included in the package. The toy dataset includes:
dat.expression
: a toy scRNA-seq dataset with genes (rows) and cells (columns)
dat.tsne
: 2D coordinates of the cells in a t-SNE splot
First, let's apply haystack
(the main function of the package) on the toy dataset. This should take just several seconds on a typical desktop computer.
library(singleCellHaystack) set.seed(1234) # run the main 'haystack' analysis # inputs are: # 1) the coordinates of the cells in the input space (here: dat.tsne) # 2) the expression data (dat.expression) res <- haystack(dat.tsne, dat.expression) # the returned results 'res' is of class 'haystack' class(res)
Let's have a look at the most significant differentially expressed genes (DEGs).
# show top 10 DEGs show_result_haystack(res.haystack = res, n=10) # alternatively: use a p-value threshold #show_result_haystack(res.haystack = res, p.value.threshold = 1e-10)
One of the most significant DEGs is "gene_497". Here we visualize its expression in the t-SNE plot. As you can see, this DEG is expressed only in cells in the upper-left corner of the plot.
d <- cbind(dat.tsne, t(dat.expression)) d[1:4, 1:4]
library(ggplot2) ggplot(d, aes(tSNE1, tSNE2, color=gene_497)) + geom_point() + scale_color_distiller(palette="Spectral")
Yes, the coordinates of the cells in this toy example t-SNE space roughly resemble a haystack; see the Haystack paintings by Monet.
You are not limited to single genes. Here, we pick up a set of DEGs, and group them by their expression pattern in the plot into 5 clusters.
# get the top most significant genes, and cluster them by their distribution pattern in the 2D plot sorted.table <- show_result_haystack(res.haystack = res, p.value.threshold = 1e-10) gene.subset <- row.names(sorted.table) # k-means clustering #km <- kmeans_haystack(dat.tsne, dat.expression[gene.subset, ], grid.coordinates=res$info$grid.coordinates, k=5) #km.clusters <- km$cluster # alternatively: hierarchical clustering hm <- hclust_haystack(dat.tsne, dat.expression[gene.subset, ], grid.coordinates=res$info$grid.coordinates)
... and visualize the pattern of the selected genes.
ComplexHeatmap::Heatmap(dat.expression[gene.subset, ], show_column_names=FALSE, cluster_rows=hm, name="expression")
We divide the genes into clusters with cutree.
hm.clusters <- cutree(hm, k=4) table(hm.clusters)
Then calculate the average expression of the genes in each cluster.
for (cluster in unique(hm.clusters)) { d[[paste0("cluster_", cluster)]] <- colMeans(dat.expression[names(which(hm.clusters == cluster)), ]) }
lapply(c("cluster_1", "cluster_2", "cluster_3", "cluster_4"), function(cluster) { ggplot(d, aes(tSNE1, tSNE2, color=.data[[cluster]])) + geom_point() + scale_color_distiller(palette="Spectral") }) |> patchwork::wrap_plots()
From this plot we can see that genes in each cluster are expressed in different subsets of cells.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.