Introduction

SpaceST is an R toolkit for Spatial Transcriptomics (ST) data, developed and maintained by the SpatialTranscriptomics lab at Science for Life Laboratory. SpaceST allows you to organize, filter and analyze ST datasets from multiple tissue sections and experiments. To take full advantage of the package, we recommend you to work in RStudio.

Part I: Prepare data

1. Install package
2. Prepare data 
3. Create spaceST object and run quality checks
4. Select spots under tissue

1. Install package

First of all, you need to download the package from github. For this, you will need to install the devtools package. Once devtools has been installed, you can install spaceST by typing the following in the RStudio console:

library(devtools)
install_github("ludvigla/spaceST")

2. Prepare data

The first step is to load gene expression matrices from an ST experiment. The format of these expression datasets is important, because the feature coordinates will be collected from the headers of these datasets. The format required is features as columns and genes as rows. The headers need to have the following format:

n_x_y

, where n is the number of the replicate, and x, y are feature coordinates.

Once you have prepared the expression matrices with the correct headers, they should be saved into a list of matrices. The package includes two example dataset theat were published in Science (2016) that can be loaded into your R session.

library(spaceST)
load(file = system.file("data/MOB.RData", package = "spaceST"))
knitr::kable(MOB1[1:10, 1:5])

Note that the genes are given by their MGI symbols. If you want to convert the gene ids from ensembl_gene_id_version format (default output by the ST pipeline), you can use the ensembl2hgnc() function. Note that duplicated gene names will be aggregated.

3. Create spaceST object and run quality checks

Now you can merge the expression datasets into a spaceST object. When a spaceST object is initialized, you can filter the data from array spots with few unique genes and/or genes with low expression. Here we will keep genes that are expressed with at least 2 counts in at least 10 spots (features) as well as spots with less than 300 unique genes. Alternatively, you should filter the data before initiating the spaceST object. You can also remove unwanted genes/transcripts by providing a regular expression pattern. Here we show how to remove mitochondrial genes and Malat1. Note that you will have to specify the correct delimiter for the ST headers.

# Initialize spaceST object and filter
spST <- CreatespaceSTobject(list(MOB1, MOB2), unique.genes = 300, min.exp = 2, min.features = 10, delimiter = "x", filter.genes = "^Mt-*|Malat1")
spST

dim(spST)

Once you have initiated the spaceST object you can apply new filtering settings by calling the filter() function on the spaceST object. Note that this will clear all slots and generate a new spaceST object using the current expression data matrix as input. If you want to apply a less harsh filtering, you will have to use the original data list and generate a new spaceST object.

If you want to evaluate the quality of you ST datasets, you can visualize the number of unique genes and transcripts as histograms.

# QC plot
plot_QC_spaceST(spST, bins = 40, alpha = 0.7, col2 = c("dark green", "purple"))
# Or if you want to split up relicates in separate plots
plot_QC_spaceST(spST, separate = T, bins = 40, alpha = 0.7, col2 = c("dark green", "purple"))

4. Select spots under tissue

When analyzing ST data, you typically want to select spots that are located under the tissue. In the example datasets provided (MOB), the spots outside tissue have already been removed but you can create a subset spaceST object if you have selection files produced with the ST Spot Detector. All you need is a list of paths to the selection files.

Note that the same filtering settings will be used on the subset spaceST object as were specified before.

# Initialize spaceST object and filter
selection.files <- list("path1", "path2")

# Don't forget to specify the delimiter 
spST.subset <- spots_under_tissue(spST, selection.files, delimiter = "x")
spST

dim(spST)

Part II: Visualization, normalization, batch correction and PCA

1. Visualizing gene expression on tissue HE images
2. Normalize data 
3. PCA

1. Visualizing gene expression on tissue HE images

The spaceST package provides a multfunctional visualization tool called spatial.haetmap() to visualize various types of expression patterns. This function can visualize results from any of the following slots in the spaceST object: expr, norm.data, reducedDims and lda.results.

In the example below we show how to visualize the expression of a specific gene.

spatial.heatmap(spST, value = "Nrgn", invert.heatmap = T, size = 3)

Or if you want to overlay the expression on top of the HE images.

HE.list <- list(system.file("extdata/HE_rep1.jpg", package = "spaceST"),
                system.file("extdata/HE_rep2.jpg", package = "spaceST"))
spatial.heatmap(spST, value = "Nrgn", invert.heatmap = T, HE.list = HE.list, alpha = 0.6, size = 3)

There are a few things that you should keep in mind when using HE images for visualization. First of all, the original images are extremely large and can cause RStudio to hang up if you want to visualize them quickly. We therefore recommend you to downscale the images (2000px*2000px is usually ok). Second, you will have to crop the images to fit the frame of the ST array, meaning that the corners of the image should be centered in the middle of the corner spots (see image 1 above). If you don't crop the image like this, you will have to adjust the x- and y-limits in the function call. Alternativley, you could use absolute pixel coordinates in the headers instead of array coordinates, in which case you can avoid cropping.

2. Normalize data

When you have created a spaceST object you can normalize the data using either scran or cp10k (counts per 10 thousand). If you chose to normalize with scran, the data will automatically be clustered prior to normalization if there are more than 500 spots in the datasets. Alternatively, you can provide a vector of clusters to use with scran. For more details about normalization with scran, we recommend you to have a look at the scran package documentation.

# Normalize using scran
spST <- NormalizespaceST(spST, method = "scran")
load("~/breast_cancer/spaceST/spST")

Now we can check the transcript per feature distributions and visualize the normalized expression data on the tissue.

# Or if you want to split up relicates in separate plots
plot_QC_spaceST(spST, separate = T, bins = 30, alpha = 0.7, col2 = c("dark green", "purple"))
spatial.heatmap(spST, value = "Nrgn", type = "norm.data", invert.heatmap = T, HE.list = HE.list, alpha = 0.6, size = 3)

3. PCA

A Principal Component analysis can be run on the expression data using the most variable genes. You can either select two components to compare in a scatter plot, or visualize each Principal Component separately on the tissue.

spST <- spPCA(spST)
plotPCA(spST, components = c(1, 2))
spatial.heatmap(spST, value = "PC1", type = "pca", invert.heatmap = T, HE.list = HE.list, alpha = 0.6, size = 3)

Part III: factor analysis

1. Compute topics using cellTree package
2. Clustering with topics

1. Compute factors using cellTree package

spaceST includes a helper function which lets you run the lda.compute() function from the cellTree package directly on a spaceST object.

Running this function with default settings tests the 5000 most highly expressed genes and tries to deconvolve the ST data into a number of topics. The number of topics (between 2 to 15) which best explains the data will be returned.

The factor analysis should take some time to compute. Grab a coffee!

spST <- topic_compute(spST, force.recalc = T)

Topics can be visualized on the tissue using the spatial.heatmap() function.

spatial.heatmap(spST, value = "1", type = "topics", invert.heatmap = T, HE.list = HE.list, size = 3, alpha = 0.6)
spatial.heatmap(spST, value = "2", type = "topics", invert.heatmap = T, HE.list = HE.list, size = 3, alpha = 0.6)
spatial.heatmap(spST, value = "3", type = "topics", invert.heatmap = T, HE.list = HE.list, size = 3, alpha = 0.6)

We can also extract the driving genes from each topic.

top_features <- ExtractTopFeaturesST(spST)
knitr::kable(top_features[1:10, ])

2. Clustering with topics

The topic matrix is constructed with features as rows and topics as columns. Each feature is assigned with a proportion for each topic specifying how much the feature is "driven" by that topic. We can cluster the features based on these topic proportion using the topic_clusters() function. Clusters are automatically computed from the topics matrix (after running the LDA) and can be accessed from the spaceST object using the topic.clusters() function.

You can also change rerun the clustering with different settings using the clusterST() function on the spaceST object.

Below is an example of how to visualize the clusters on the tissue.

pal <- palette.select(palette = "spectral")
cols <- rgb(pal(seq(0, 1, length.out = 5)), maxColorValue = 255)

# You can run the clustering algorithm with a predefined minimum cluster size (default = 30)
spST <- clusterST(spST, minClusterSize = 5)
spatial.clusters(spST, HE.list = HE.list, size = 3, alpha = 0.6, cols = cols)

part IV: t-SNE

1. Run t-SNE
2. Project t-SNE results on tissue

1. t-SNE

You can run a t-SNE algorithm (Rtsne package) to structure the spots in high dimensional space using the RunTSNEspaceST() function. By default, the function will use LDA results as input to the t-SNE algorithm, but you can also use data provided from other slots in the spaceST object.

spST <- RunTSNEspaceST(object = spST, use.dims = "topics", dims = 3)

# Visualize t-SNE results
plotTSNE(spST, group_by = "clusters", cols = cols)

2. Project t-SNE results on tissue

Another way of vizualising the t-SNE results is to color code each spot by the relative composition of t-SNE scores for each spot. The color code is computed for each spot by scaling and the t-SNE values to a channel in rgb space. This means that only three dimensions can be used.

spatial.heatmap(spST, value = spST@tsne, type = "", invert.heatmap = T, HE.list = HE.list, size = 3, alpha = 0.6)

part V: Other visualization methods

1. Smooth visualization
2. Tesselation

1. Smooth visualization

smooth.viz(object = spST, value = "PC3", type = "pca", HE.list = HE.list, overlay.spots = T, set.max.alpha = 0.3, alpha = 0.3, palette = "RdBu")

2. Tesselation

tessViz(object = spST, polygon.mask = F, fill.empty = T, select.columns = c(2, 3), datatype = "reducedDims", scale.by.column = T)

part VI: DE analysis

1. DE analysis
2. Volcano plot


ludvigla/spaceST documentation built on May 29, 2019, 3:43 a.m.