library(knitr) opts_chunk$set( cache = FALSE, results = "hold" )
In this vignette, we will walk through an analysis of spatial transcriptomics (ST) data for a coronal section of the mouse brain. The mouse brain data for 2702 ST "spots" has been prepared for you and is available as a part of the package. Here, tissueSpotRotation
is a dataframe where each row is a probe spot's x and y positions in space, and filteredGenes
is a CPM normalized and log-transformed gene expression matrix with a filtered set of 1263 genes whose expression variance across the 2702 spots is higher than transcriptome-wide expectations.
library(MERINGUE) library(Matrix) data("mouseCoronal") filteredGenes <- mouseCoronal$filteredGenes tissueSpotRotation <- mouseCoronal$tissueSpotRotation
First, let's cluster the spots into transcriptionally distinct cell-types and subtypes. We will use spatially-unaware dimensionality reduction just based on gene expression without explicit consideration for spatial information. G=However, given the known spatial organization of cell-types in the brain, we anticipate that identified clusters should correspond to distinct cell layers or other spatially organized structures.
# Dimensionality reduction by PCA on log10 CPM normalized expression values pcs.info <- prcomp(t(log10(as.matrix(filteredGenes)+1)), center=TRUE) # Check `screeplot` to assess number of PCs to use screeplot(pcs.info, npcs=20,) # Choose number of PCs to look at based on screeplot ("elbow rule") numberPcs <- 5 # pcs <- pcs.info$x[,1:numberPcs] # 2D embedding by tSNE emb <- Rtsne::Rtsne(pcs, is_distance=FALSE, perplexity=30, num_threads=1, verbose=FALSE)$Y rownames(emb) <- rownames(pcs) # iGraph cluster community labeling spotClusters <- getClusters(pcs, k = 100)
This approach identified 8 transcriptionally distinct clusters of cells. Indeed, each of the 8 clusters appear to correspond to a particular cell layer or region in the mouse coronal brain slice.
par(mfrow=c(1,2), mar=rep(2,4)) # plot the tsne and color by community plotEmbedding(emb, groups=spotClusters, show.legend=TRUE, xlab='tSNE X', ylab='tSNE Y', verbose=FALSE) # plot the spots based on position on tissue, color by community plotEmbedding(tissueSpotRotation, groups=spotClusters, cex=1, xlab='spatial X', ylab='spatial Y', verbose=FALSE)
Interestingly, cluster 4 traces the hippocampus proper CA1, CA2, and CA3 feilds quite well, but there are other spots of this cluster present in potentially the olfactory areas of the cerebral cortex. It is possible that this cluster corresponds to pyramidal cells. To further interpret each cluster 4, let's identify some marker genes that are significantly enriched.
# Identify significantly differentially upregulated genes # in each identified cluster by Wilcox test diffGenes <- getDifferentialGenes(as.matrix(filteredGenes), spotClusters) ## focus on cluster 4 diffExpCluster4Genes <- diffGenes[[4]] # gene with highest expression in cluster assigned as marker to that cluster highestExpClust4 <- diffExpCluster4Genes[which(diffExpCluster4Genes$highest == TRUE),] ## order and view topDiffExpGenesClust4 <- highestExpClust4[order(highestExpClust4$p.adj),] topDiffExpGenesClust4[1:10,]
Looking at ISH data, several of these genes with coronal brain slice ISH data trace the hippocampus proper pyrimidal cell layer well and have expression in the olifactory region layer that was included in cluster 4. The top hit, Cpne6, has been reported to be expressed in pyramidal cells of the CA1-3 regions of the hippocampus.
par(mfrow=c(1,2), mar=rep(2,4)) g <- 'Cpne6' gexp <- scale(filteredGenes[g,])[,1] plotEmbedding(emb, col=gexp) plotEmbedding(tissueSpotRotation, col=gexp, cex=1, xlab='spatial X', ylab='spatial Y')
So transcriptionally distinct cluster 4 appears to have identified pyramidal neurons of the hippocampus proper and potentially a pyramidal layer in the olfactory region. The hippocampus proper is a highly specific and striking structure of the brain. During development, one hypothesis is that gene expression gradients are responsible for the organization of cell types into their respective positions and thus drive the formation of tissues and cellular structures. We can further analyze these putative pyramidal cells to identify such potential additional spatial gene expression gradients.
We will first build an adjacency matrix of spots in cluster 4.
# get spots that are part of the cluster cluster4SpotIDs <- names(spotClusters[which(spotClusters == '4')]) cluster4SpotCoords <- tissueSpotRotation[cluster4SpotIDs,] # build weight matrix for the cluster spots cluster4WeightMatrix <- getSpatialNeighbors( cluster4SpotCoords, filterDist = 25, verbose=TRUE) plotNetwork(cluster4SpotCoords, cluster4WeightMatrix)
We can then identify significantly spatially variable genes.
# get gene expression of filteredGenes for just the cluster spots cluster4GeneExp <- filteredGenes[,which(colnames(filteredGenes) %in% cluster4SpotIDs)] # calculate Moran's I for filteredGenes only using the spots in cluster # aka autocorrelation of filtered genes within the cluster cluster4GeneExpMoransI <- getSpatialPatterns(cluster4GeneExp, cluster4WeightMatrix, verbose=TRUE) # find significantly spatially variable genes cluster4SpatialGenes <- filterSpatialPatterns( cluster4GeneExp, cluster4GeneExpMoransI, cluster4WeightMatrix, details = TRUE, verbose=TRUE, minPercentCells = 0.10) # plot a few genes par(mfrow=c(2,2)) sapply(rownames(cluster4SpatialGenes)[1:4], function(g) { plotEmbedding(emb = cluster4SpotCoords, col = winsorize(scale(cluster4GeneExp[g,])[,1]), main=g) })
Indeed, we are able to identify a few hundred significantly spatially variable genes. Let's try to organize them into spatial patterns.
# spatial cross cor matrix cluster4CorrMtx <- spatialCrossCorMatrix( cluster4GeneExp[rownames(cluster4SpatialGenes),], cluster4WeightMatrix) par(mfrow=c(2,2)) # use all tissue spots to make the visualization more representative method = 'ward.D' cluster4SpatialPatterns <- groupSigSpatialPatterns( pos=tissueSpotRotation, mat=cluster4GeneExp[rownames(cluster4SpatialGenes),], scc=cluster4CorrMtx, hclustMethod = method, deepSplit = 1, binSize = 50, power = 1) # Double check cross correlation matrix par(mfrow=c(1,1)) # Look at pattern association by plotting SCI matrix as a heatmap and dendrogram patternColors <- rainbow( length(levels(cluster4SpatialPatterns$groups)), v=0.5)[cluster4SpatialPatterns$groups] names(patternColors) <- names(cluster4SpatialPatterns$groups) # Visualize as heatmap heatmap(cluster4CorrMtx[cluster4SpatialPatterns$hc$labels, cluster4SpatialPatterns$hc$labels], scale='none', Colv=as.dendrogram(cluster4SpatialPatterns$hc), Rowv=as.dendrogram(cluster4SpatialPatterns$hc), labCol=NA, labRow = NA, RowSideColors=patternColors[cluster4SpatialPatterns$hc$labels], ColSideColors=patternColors[cluster4SpatialPatterns$hc$labels], col=colorRampPalette(c('white', 'lightgrey', 'black'))(100) )
Interestingly, the spatial patterns appear to distinguish the putative pyramidal neuron layer in the olfactory area (spatial pattern 1), and also appear to demarcate the CA1 (spatial pattern 3), CA2 (spatial pattern 4), and CA3 (spatial pattern 2) regions of the hippocampus proper.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.