The following notebook highlights the analysis shown in the Vesalius 2.0 manuscript. More specifcially, we show the analysis for Spatial Transcriptomic Slide-SeqV2 in mouse brain and mouse embryo.
The aim of this notebook is two-fold:
You will find through out this article subsection with the prefix "See also". These sections will highlight link towards documentation that will enable you to go beyond what we did with vesalius for this analysis.
You will also find a Code Block section that contains the entire analysis in a single block of code to facilitate copy/pasting.
Vesalius is a tool to decipher tissue anatomy and spatial domains from spatial omics data. To achieve this, Vesalius converts latent space embeddings into images upon which image processing techniques can be applied. In contrast to many other tools, Vesalius does not create a spatially aware latent space but utilises images that represent latent space to isolate tissue domains.
Slide-seq version 2 is a high resolution spatial trancriptomics data that uses spatially indexed beads that hybridise with mRNA species prensent in the tissue. As such, there are a few aspect to consider when using this type of data:
Vesalius aims to recover the finer details of spatial patterning by providing uniform tissue territories.
Uniform territories enable:
Before starting, make sure you have vesalius installed and that you have your data ready and downloaded.
To facilitate the analysis process, we use input/output folder to store and save data.
For visualization purposes, we also load a few extra packages. We also load the future
package
to limit the number of cores being used.
# Load libraries and set seed library(ggplot2) library(patchwork) library(ggpubr) library(future) library(Matrix) library(tidyverse) set.seed(1514) library(vesalius, lib.loc = "/common/martinp4/R") library(pwr, lib.loc = "/common/martinp4/R") # Using Multi core !!! See note below !!! plan(multicore, workers = 10) # Input data directory Brain spatial_coordinates_brain <- "/common/wonklab/SSv2/Puck_200115_08_bead_locations.csv" counts_brain <- "/common/wonklab/SSv2/Puck_200115_08.digital_expression.txt.gz" # Input data directory Embryo spatial_coordinates_embryo <- "/common/wonklab/SSv2/Puck_190926_03_bead_locations.csv" counts_embryo <- "/common/wonklab/SSv2/Puck_190926_03.digital_expression.txt.gz" # Output directory if (!dir.exists("/common/martinp4/output_plots/")) { dir.create("/common/martinp4/output_plots/") } output_plots <- "/common/martinp4/output_plots/" if (!dir.exists("/common/martinp4/output_data/")) { dir.create("/common/martinp4/output_data/") } output_data <- "/common/martinp4/output_data/"
NOTE Here, we use multicore processing from the future and future.apply packages.
To use a single core, use plan(sequential)
If you are using a windows machine, multicore is not supported.
Instead you can use plan(multisession, workers = 10)
The first step is to load the spatial transcriptomic data. Generally, spatial data comes in two seperate files.
Reading in spatial coordinates.
spatial_coordinates_brain <- read.csv(spatial_coordinates_brain, header = TRUE, skip = 1) colnames(spatial_coordinates_brain) <- c("barcodes", "x", "y") spatial_coordinates_embryo <- read.csv(spatial_coordinates_embryo, header = TRUE) colnames(spatial_coordinates_embryo) <- c("barcodes", "x", "y") str(spatial_coordinates_brain) str(spatial_coordinates_embryo)
Note the use of skip = 1
. This may not be required depending on how your data is formatted.
Reading in count data
counts_brain <- read.table(counts_brain, header = TRUE, row.names = 1) counts_embryo <- read.table(counts_embryo, header = TRUE, row.names = 1) counts_brain[1:5, 1:5]
The first step in the Vesalius Pipeline is to build a vesalius_assay object. This object is used throughout the vesalius workflow and stores data after every round of computation.
To build a vesalius_assay
:
vesalius_brain <- build_vesalius_assay(coordinates = spatial_coordinates_brain, counts = counts_brain, assay = "Spatial_transcriptomics_brain", verbose = FALSE) vesalius_embryo <- build_vesalius_assay(coordinates = spatial_coordinates_embryo, counts = counts_embryo, assay = "Spatial_transcriptomics_embryo", verbose = FALSE) vesalius_brain vesalius_embryo
vesalius_assay
objects require as minimal input the spatial coordinates.
The next step is to generate pixels tiles from spatial coordinates and latent space embedding that will be used to populate each tile with a grey scale color value.
vesalius_brain <- generate_embeddings(vesalius_brain, dim_reduction = "PCA", dimensions = 50, tensor_resolution = 1, verbose = FALSE) vesalius_embryo <- generate_embeddings(vesalius_embryo, dim_reduction = "PCA", dimensions = 50, tensor_resolution = 0.3, verbose = FALSE) vesalius_brain vesalius_embryo
Here, we will use Principal Component Analysis (PCA) to reduce data dimensionality and produce our latent space. We will also reduce the dimension of the image tensor. Effectively, we are reducing the size of the image by combining neighboring spatial indices if they fall within the same local area. Reducing the image resolution ensures lower run times.
In addition to merging spatial indices, we adjust the count matrix by taking the average count values for all spatial indices that have been merged together. Note that the spatial index names become a combination of the original spatial indices.
For example:
head(grep(pattern = "_et_", x = unique(get_tiles(vesalius_brain)$barcodes), value = TRUE))
The generate_embeddings
function generates tiles from spatial indices, pre-processes the count matrix
(Finding variable features and normalisation)
, and generates latent space embeddings used in the vesalius image stack.
The core concept of Vesalius is to provide spatial domains by utilizing image processing techniques. In contrast to other methods, Vesalius does not provide spatially aware latent space by default. It utilises Principal Component Analsyis (as well as UMAP or SVD/LSI) to generate a reduced dimesnionaility space.
We find that Principal Components (PC) contain subtle spatial information already embedded within them. We can viszualise this by looking at the grey scale embeddings for each PC.
pca_embedding_brain <- vector("list", 30) pca_embedding_embryo <- vector("list", 30) for (i in seq(1, 30)){ pca_embedding_brain[[i]] <- image_plot(vesalius_brain, embedding = "last", dimensions = i) pca_embedding_embryo[[i]] <- image_plot(vesalius_embryo, embedding = "last", dimensions = i) }
For the mouse brain:
print(ggarrange(plotlist = pca_embedding_brain), ncol = 5)
For the mouse embryo:
print(ggarrange(plotlist = pca_embedding_embryo), ncol = 5)
We can see that many of these PCs contain spatially relevant information even though the latent space was not produced in an explitely spatially aware manner. By using image processing, we can further highlight these regions and better utilise them during spatial domain isolation.
vesalius_brain <- regularise_image(vesalius_brain, dimensions = seq(1, 50), verbose = FALSE) vesalius_brain <- equalize_image(vesalius_brain, dimensions = seq(1, 50), sleft = 5, sright = 5, verbose = FALSE) vesalius_brain <- smooth_image(vesalius_brain, dimensions = seq(1, 50), method = c("iso", "box"), sigma = 2, box = 10, iter = 10, verbose = FALSE) vesalius_embryo <- regularise_image(vesalius_embryo, embedding = "PCA", dimensions = seq(1, 50), lambda = 10, verbose = FALSE) vesalius_embryo <- equalize_image(vesalius_embryo, dimensions = seq(1, 50), sleft = 1, sright = 1, verbose = FALSE) vesalius_embryo <- smooth_image(vesalius_embryo, dimensions = seq(1, 50), method = c("iso", "box"), sigma = 1, box = 10, across_levels = "min", iter = 10, verbose = FALSE)
We can viszualise each PC in the image tensor after image processing.
pca_embedding_brain <- vector("list", 30) pca_embedding_embryo <- vector("list", 30) for (i in seq(1, 30)){ pca_embedding_brain[[i]] <- image_plot(vesalius_brain, embedding = "last", dimensions = i) pca_embedding_embryo[[i]] <- image_plot(vesalius_embryo, embedding = "last", dimensions = i) }
For example, in the mous brain, we can cleary see that PC17 contains information related to the CA2 field - a field that is often lost during standard analysis.
print(ggarrange(plotlist = pca_embedding_brain), ncol = 5)
For the mouse embryo:
print(ggarrange(plotlist = pca_embedding_embryo), ncol = 5)
In vesalius, each PC is handled independantly in the form of a grey scale image. We can use this image stack to generate isolated spatial domains and recover subtle expression patterns. The selection of image processing parameters is dependant on the data set used and on the biological question at hand. You might want to have more or less territory granularity. However, the ability to visualize PCs greatly aids is selecting the parameters that fits your needs.
Image prcessing is iteratively applied to the active embedding.
The isolation of territories is achieved by a combination of image segmentation and 2D isolation of image segments. By default, we use kmeans for image segmentation. This produces a series of image segments that can then be further isolated if they are sperated in 2D space.
vesalius_brain <- segment_image(vesalius_brain, dimensions = seq(1, 30), method = "kmeans", col_resolution = 25, verbose = FALSE) vesalius_brain <- isolate_territories(vesalius_brain, verbose = FALSE) vesalius_embryo <- segment_image(vesalius_embryo, dimensions = seq(1, 30), method = "kmeans", col_resolution = 32, verbose = FALSE) vesalius_embryo <- isolate_territories(vesalius_embryo, verbose = FALSE)
In this instance, we elected to use all images in the image stack (i.e dimensions = seq(1,30)
).
In the context of vesalius, the col_resolution
argument represents the number of clusters to
parse to kmeans
however this does not define the final number of territories. These segments
will be subdivided into small segments based on the distribution in space.
By default, vesalius uses kmeans clustering as it provides a simple and effective way to segment images.
As mentionned above we distinguish image segements and territories. We can vizualise these differences with the vesalius plotting functions.
brain_segments <- territory_plot(vesalius_brain, trial = "Segment", cex_pt = 0.25) + labs(title = "Brain Segments") brain_territories <- territory_plot(vesalius_brain, trial = "Territory", cex_pt = 0.25) + labs(title = "Brain Territories") embryo_segments <- territory_plot(vesalius_embryo, trial = "Segment", cex_pt = 0.25) + labs(title = "Embryo Segments") embryo_territories <- territory_plot(vesalius_embryo, trial = "Territory", cex_pt = 0.25) + labs(title = "Embryo Territories") all <- (brain_segments + brain_territories) / (embryo_segments + embryo_territories) print(all)
We can see how image segments and territories differ when compareing their plots. While
segments are defined by the value parsed to col_resolution
during image segmentation,
the final number of territories will depend on how the images were processed, the value parsed to
col_resolution
and the capture radius defined in isolate_territories.
In short, this parameters defines the maximum distance between color segments for them to be pooled
into the same territory.
The advantage of uniform territories is that it enables the comparison of spatial territories rather than comparing cell clusters.
We can start by getting all genes that are differentially expressed in each territory and export the results.
vesalius_brain <- identify_markers(vesalius_brain, trial = "Territory", # Using PCA territories here method = "DESeq2", seed = NULL, query = NULL ) vesalius_embryo <- identify_markers(vesalius_embryo, trial = "Territory", # Using PCA territories here method = "DESeq2", seed = NULL, query = NULL ) deg_brain <- get_markers(vesalius_brain) write.csv(deg_brain, file = paste0(output_data, "DEG_SSv2_brain.csv")) deg_embryo <- get_markers(vesalius_embryo) write.csv(deg_embryo, file = paste0(output_data, "DEG_SSv2_embryo.csv")) head(deg_brain)
Using these tables, we can ivestigate differntial gene expression and find which genes
are more highly expressed in a given territory comapred to everything else. If you want
to only look at one or a subset of territories, you can specifiy your territories in the
seed
argument.
vesalius_brain <- identify_markers(vesalius_brain, trial = "Territory", # Using PCA territories here method = "DESeq2", seed = c(1, 2, 3), query = NULL)
We can also use this option to compare territories between each other. If there are many territories it can be easier to visualize each territory seperately.
brain_split <- territory_plot(vesalius_brain, trial = "Territory", split = TRUE, randomise = FALSE # we don't reallyt need to randomise colors here ) brain_split
In this case, we can compare territory 18 and territory 3 (medial habenula compartments).
vesalius_brain <- identify_markers(vesalius_brain, trial = "Territory", method = "DESeq2", seed = 18, query = 3) med_comp <- get_markers(vesalius_brain) head(med_comp)
The indentify_markers
function offers different comparison options.
The final step is to visualise gene expression patterns.
For Pcp4 in the mouse hippocampus:
pcp4 <- view_gene_expression(vesalius_brain, genes = "Pcp4", trial = "Territory", cex = 4) pcp4_layer <- view_gene_expression(vesalius_brain, genes = "Pcp4", trial = "Territory", as_layer = TRUE, cex = 4) pcp4 + pcp4_layer + plot_annotation(tag_levels = "A")
For Tac1 in the mouse hippocampus:
Tac1 <- view_gene_expression(vesalius_brain, genes = "Tac1", trial = "Territory", cex = 4) Tac1_layer <- view_gene_expression(vesalius_brain, genes = "Tac1", trial = "Territory", as_layer = TRUE, cex = 4) Tac1 + Tac1_layer + plot_annotation(tag_levels = "A")
For Pmel in the mouse embryo:
Pmel <- view_gene_expression(vesalius_embryo, genes = "Pmel", trial = "Territory", cex = 4) Pmel_layer <- view_gene_expression(vesalius_embryo, genes = "Pmel", trial = "Territory", as_layer = TRUE, cex = 4) Pmel + Pmel_layer + plot_annotation(tag_levels = "A")
We can visualise gene expression as an overall pattern ("A" plots) or an averge expression in each territory ("B" plots). It is important to use "B" plots with caution. While these plots can help to clarify the expression distribution across territories, they represent a discretised view of expression patterns. As such many transcriptional dynamics such as gradients or overall low expression can altered.
You can also parse a vector of genes to the genes
argument. This will return a ggarrange
object that can directly be plotted.
Here, we present a Vesalius workflow using 2 high resolution spatial transcriptomic data sets. This pipeline is applicabled to any spatial transcriptomic data set. However, image processing parameteres should be set according to your needs and the data set!
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.