Application on toy example"

  collapse = TRUE,
  comment = "#>",
  fig.align = 'center'


Application on a toy dataset

A small toy dataset is included in the package. The toy dataset includes:

First, let's apply haystack (the main function of the package) on the toy dataset. This should take just several seconds on a typical desktop computer.


# run the main 'haystack' analysis
# inputs are:
# 1) the coordinates of the cells in the input space (here: dat.tsne)
# 2) the expression data (dat.expression)
res <- haystack(dat.tsne, dat.expression)

# the returned results 'res' is of class 'haystack'

Let's have a look at the most significant differentially expressed genes (DEGs).

# show top 10 DEGs
show_result_haystack(res.haystack = res, n=10)

# alternatively: use a p-value threshold
#show_result_haystack(res.haystack = res, p.value.threshold = 1e-10)

One of the most significant DEGs is "gene_497". Here we visualize its expression in the t-SNE plot. As you can see, this DEG is expressed only in cells in the upper-left corner of the plot.

d <- cbind(dat.tsne, t(dat.expression))
d[1:4, 1:4]

ggplot(d, aes(tSNE1, tSNE2, color=gene_497)) +
  geom_point() +

Yes, the coordinates of the cells in this toy example t-SNE space roughly resemble a haystack; see the Haystack paintings by Monet.

Clustering and visualization

You are not limited to single genes. Here, we pick up a set of DEGs, and group them by their expression pattern in the plot into 5 clusters.

# get the top most significant genes, and cluster them by their distribution pattern in the 2D plot
sorted.table <- show_result_haystack(res.haystack = res, p.value.threshold = 1e-10)
gene.subset <- row.names(sorted.table)

# k-means clustering
#km <- kmeans_haystack(dat.tsne, dat.expression[gene.subset, ], grid.coordinates=res$info$grid.coordinates, k=5)
#km.clusters <- km$cluster

# alternatively: hierarchical clustering
hm <- hclust_haystack(dat.tsne, dat.expression[gene.subset, ], grid.coordinates=res$info$grid.coordinates)

... and visualize the pattern of the selected genes.

ComplexHeatmap::Heatmap(dat.expression[gene.subset, ], show_column_names=FALSE, cluster_rows=hm, name="expression")

We divide the genes into clusters with cutree.

hm.clusters <- cutree(hm, k=4)

Then calculate the average expression of the genes in each cluster.

for (cluster in unique(hm.clusters)) {
  d[[paste0("cluster_", cluster)]] <- colMeans(dat.expression[names(which(hm.clusters == cluster)), ])
lapply(c("cluster_1", "cluster_2", "cluster_3", "cluster_4"), function(cluster) {
  ggplot(d, aes(tSNE1, tSNE2,[[cluster]])) +
  geom_point() +
}) |> patchwork::wrap_plots()

From this plot we can see that genes in each cluster are expressed in different subsets of cells.

Try the singleCellHaystack package in your browser

Any scripts or data that you put into this service are public.

singleCellHaystack documentation built on Dec. 28, 2022, 1:29 a.m.