library(knitr) opts_chunk$set( cache = FALSE, results = "hold" )
In this vignette, we will walk through an analysis of spatial transcriptomics data for multiple consecutive tissue slices of a breast cancer biopsy. Briefly, for spatial transcriptomics, histological sections are placed on a grid of poly(dT) probe spots approximately 100 um in diameter, each with a unique DNA barcode. By resolving the DNA barcodes, spatial transcriptomics enables matching of detected mRNA abundances with their original spatially resolved spot, resulting in full transcriptome RNA-sequencing data with homogenously-spaced two-dimensional positional information at the pixel level. See the original publication for more information.
The data has been prepared for you and is available as a part of the package. Here, each dataset is a list containing two items: pos
is a dataframe where each row is a probe spot's x and y positions in space, and counts
is a counts matrix where each column is a probe spot and each row is a gene.
Although a total of 4 tissue sections are available, we will focus on analyzing 3 tissue sections for demonstrative purposes.
suppressMessages(library(MERINGUE)) data(BCL) # object contains counts and positions for all 4 sections. Will only need sections 1-3 head(BCL$pos) BCL$counts[1:5,1:5] ## parse out the 3 sections BCL1.pos <- BCL$pos[BCL$pos$slice==1,][,c("x", "y")] BCL2.pos <- BCL$pos[BCL$pos$slice==2,][,c("x", "y")] BCL3.pos <- BCL$pos[BCL$pos$slice==3,][,c("x", "y")] BCL1.counts <- BCL$counts[,BCL$pos$slice==1] BCL2.counts <- BCL$counts[,BCL$pos$slice==2] BCL3.counts <- BCL$counts[,BCL$pos$slice==3]
We will first combine the data from all three samples into a single counts matrix. We will also combine the position information into a list of positions.
# Get common set of genes for the sections genes.have <- Reduce(intersect, list( rownames(BCL1.counts), rownames(BCL2.counts), rownames(BCL3.counts) )) # Combine into large counts matrix. Counts of genes for all the sections being assessed counts <- cbind( BCL1.counts[genes.have,], BCL2.counts[genes.have,], BCL3.counts[genes.have,] ) # section factor section <- c( rep('L1', ncol(BCL1.counts)), rep('L2', ncol(BCL2.counts)), rep('L3', ncol(BCL3.counts)) ) names(section) <- colnames(counts) # List of positions posList <- list( BCL1.pos[colnames(BCL1.counts),], BCL2.pos[colnames(BCL2.counts),], BCL3.pos[colnames(BCL3.counts),] )
We will then filter out genes with fewer than 100 counts across all probe spots and filter out probe spots with fewer than 100 total counts and update our position information accordingly.
cc <- cleanCounts(counts, min.reads = 100, min.lib.size = 100, plot=TRUE) mat <- normalizeCounts(cc, log=FALSE, verbose=TRUE) posList[[1]] <- posList[[1]][intersect(rownames(posList[[1]]), colnames(mat)),] posList[[2]] <- posList[[2]][intersect(rownames(posList[[2]]), colnames(mat)),] posList[[3]] <- posList[[3]][intersect(rownames(posList[[3]]), colnames(mat)),]
Now, let's visualize the probe spots for each section. For demonstrative purposes, we will look for evidence of significant spatial aggregation for 1000 randomly chosen genes.
# Plot par(mfrow=c(1,3), mar=rep(2,4)) plotEmbedding(posList[[1]], groups=section, main='section 1', cex=2) plotEmbedding(posList[[2]], groups=section, main='section 2', cex=2) plotEmbedding(posList[[3]], groups=section, main='section 3', cex=2) # Sample 1000 genes for demonstrative purposes only set.seed(0) test <- sample(rownames(mat), 1000) mat <- mat[test,]
Now, we can analyze each section separately to look for significantly spatially aggregated genes. For each section, we will construct a binary weight matrix that represents the spatial adjacency relationship between probe spots. We will then look for significantly spatially aggregated genes driven by more than 5% of probe spots.
helper <- function(pos, mat) { w <- getSpatialNeighbors(pos, filterDist = 3) plotNetwork(pos, w) # get spatially clustered genes I <- getSpatialPatterns(mat, w) # filter for significant hits driven by more than 10% of cells results.filter <- filterSpatialPatterns(mat = mat, I = I, w = w, adjustPv = TRUE, alpha = 0.05, minPercentCells = 0.05, verbose = TRUE) # return results list(I=I, sig.genes=results.filter) } # Analyze each section using helper function par(mfrow=c(1,3), mar=rep(2,4)) L1 <- helper(posList[[1]], mat[, rownames(posList[[1]])]) L2 <- helper(posList[[2]], mat[, rownames(posList[[2]])]) L3 <- helper(posList[[3]], mat[, rownames(posList[[3]])])
We can also specifically look for spatially aggregated genes that with patterns that are consistent across sections. To do this, we will create a new binary weight matrix that specially connects probe spots across adjacent tissue sections. Two probe spots will have a weight of 1 if they are within the k mutual nearest neighbors in space across two sections. We can then use this new binary weight matrix to identify spatially aggregated genes that are aggregated across sections.
# K-mutual nearest neighbors across sections cw <- getCrossLayerNeighbors(posList, k=3) # Look for genes exhibiting spatial autocorrelation across layers I <- getSpatialPatterns(mat, cw) results.filter <- filterSpatialPatterns(mat = mat, I = I, w = cw, adjustPv = TRUE, alpha = 0.05, minPercentCells = 0.05/4, verbose = TRUE) cross <- list(I=I, sig.genes=results.filter)
We can now compare our results from our individual sample analyses and our cross sample analyses. Let's look at a few genes that are significantly aggregated within multiple sections AND consistent across sections.
gdups <- unlist(list('L1'=L1$sig.genes, 'L2'=L2$sig.genes, 'L3'=L3$sig.genes )) ## genes that are significantly spatially variable in multiple sections gdups <- gdups[duplicated(gdups)] ## genes that are significantly spatially variable across sections gcross <- cross$sig.genes ## both gall <- intersect(gcross, gdups) ## order by Moran's I gall <- gall[order(L1$I[gall,]$observed, decreasing=TRUE)] # Plot few consistent genes invisible(lapply(gall[1:2], function(g1) { par(mfrow=c(1,3), mar=rep(2,4)) plotEmbedding(posList[[1]], colors=winsorize(scale(mat[g1, rownames(posList[[1]])])[,1]), main=paste0(g1, '\n section 1'), cex=2, verbose=FALSE) plotEmbedding(posList[[2]], colors=winsorize(scale(mat[g1, rownames(posList[[2]])])[,1]), main=paste0(g1, '\n section 2'), cex=2, verbose=FALSE) plotEmbedding(posList[[3]], colors=winsorize(scale(mat[g1, rownames(posList[[3]])])[,1]), main=paste0(g1, '\n section 3'), cex=2, verbose=FALSE) }))
Alternatively, we can look at genes that are spatially aggregated within multiple sections but not in a manner that is consistent across sections. Such inconsistent patterns may be indicative of section-specific structures or technical artefacts.
gdiff <- setdiff(gdups, gcross) gdiff <- gdiff[order(L1$I[gdiff,]$observed, decreasing=TRUE)] # Plot inconsistent gene invisible(lapply(gdiff[1], function(g1) { par(mfrow=c(1,3), mar=rep(2,4)) plotEmbedding(posList[[1]], colors=winsorize(scale(mat[g1, rownames(posList[[1]])])[,1]), main=paste0(g1, '\n section 1'), cex=2, verbose=FALSE) plotEmbedding(posList[[2]], colors=winsorize(scale(mat[g1, rownames(posList[[2]])])[,1]), main=paste0(g1, '\n section 2'), cex=2, verbose=FALSE) plotEmbedding(posList[[3]], colors=winsorize(scale(mat[g1, rownames(posList[[3]])])[,1]), main=paste0(g1, '\n section 3'), cex=2, verbose=FALSE) }))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.