Application on toy example

  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 5,
  fig.align = 'center'

dat.tsne <- singleCellHaystack::dat.tsne
dat.expression <- singleCellHaystack::dat.expression

Application on a toy dataset

A small toy dataset is included in the package. The toy dataset includes:

First, let's apply haystack (the main function of the package) on the toy dataset. This should take just several seconds on a typical desktop computer.


# Turn the expression data into detection (gene detected = TRUE, not detected = FALSE)
# let's define detection as having more than 1 read
dat.detection <- dat.expression > 1

# run the main 'haystack' analysis
# inputs are:
# 1) the coordinates of the cells in the input space (here: dat.tsne)
# 2) the detection data (dat.detection)
# 3) the method used ("2D" since we are using a 2D input space here)
res <- haystack(dat.tsne, detection = dat.detection, method = "2D")

# the returned results 'res' is of class 'haystack'

Let's have a look at the most significant differentially expressed genes (DEGs).

# show top 10 DEGs
show_result_haystack(res.haystack = res, n=10)

# alternatively: use a p-value threshold
#show_result_haystack(res.haystack = res, p.value.threshold = 1e-10)

One of the most significant DEGs is "gene_497". Here we visualize its expression in the t-SNE plot. As you can see, this DEG is expressed only in cells in the upper-left corner of the plot.

# visualize one of the surprizing genes
  expression = dat.expression,
  gene = "gene_497",
  detection = dat.detection,
  high.resolution = TRUE,
  point.size = 2

Yes, the coordinates of the cells in this toy example t-SNE space roughly resemble a haystack; see the Haystack paintings by Monet.

Clustering and visualization

You are not limited to single genes. Here, we pick up a set of DEGs, and group them by their expression pattern in the plot into 5 clusters.

# get the top most significant genes, and cluster them by their distribution pattern in the 2D plot
sorted.table <- show_result_haystack(res.haystack = res, p.value.threshold = 1e-10)
gene.subset <- row.names(sorted.table)

# k-means clustering
km <- kmeans_haystack(dat.tsne, detection=dat.detection, genes=gene.subset, k=5)
km.clusters <- km$cluster

# alternatively: hierarchical clustering
#hc <- hclust_haystack(dat.tsne, detection=dat.detection, genes=gene.subset)
#hc.clusters <- cutree(hc,k = 5)

... and visualize the average pattern of the genes in cluster 1 (for example).

# visualize cluster distributions
plot_gene_set_haystack(dat.tsne, detection = dat.detection, 
                        genes=names(km.clusters[km.clusters==1]), point.size=2)

From this plot we can see that genes in cluster 1 are mainly expressed in cells in the lower right part of the plot.

Try the singleCellHaystack package in your browser

Any scripts or data that you put into this service are public.

singleCellHaystack documentation built on March 28, 2021, 9:12 a.m.