knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
## For links
library("BiocStyle")

## Track time spent on making the vignette
startTime <- Sys.time()

## Bib setup
library("RefManageR")

## Write bibliography information
bib <- c(
    R = citation(),
    BiocStyle = citation("BiocStyle")[1],
    MatrixGenerics = citation("MatrixGenerics")[1],
    knitr = citation("knitr")[3],
    RColorBrewer = citation("RColorBrewer")[1],
    RefManageR = citation("RefManageR")[1],
    rmarkdown = citation("rmarkdown")[1],
    sessioninfo = citation("sessioninfo")[1],
    SpatialExperiment = citation("SpatialExperiment")[1],
    spatialLIBD = citation("spatialLIBD")[1],
    HumanPilot = citation("spatialLIBD")[2],
    spatialDLPFC = citation("spatialLIBD")[3],
    tran2021 = RefManageR::BibEntry(
        bibtype = "Article",
        key = "tran2021",
        author = "Tran, Matthew N. and Maynard, Kristen R. and Spangler, Abby and Huuki, Louise A. and Montgomery, Kelsey D. and Sadashivaiah, Vijay and Tippani, Madhavi and Barry, Brianna K. and Hancock, Dana B. and Hicks, Stephanie C. and Kleinman, Joel E. and Hyde, Thomas M. and Collado-Torres, Leonardo and Jaffe, Andrew E. and Martinowich, Keri",
        title = "Single-nucleus transcriptome analysis reveals cell-type-specific molecular signatures across reward circuitry in the human brain",
        year = 2021, doi = "10.1016/j.neuron.2021.09.001",
        journal = "Neuron"
    )
)

One of the goals of spatialLIBD is to provide options for visualizing Visium data by 10x Genomics. In particular, vis_gene() and vis_clus() allow plotting of individual continuous or discrete quantities belonging to each Visium spot, in a spatially accurate manner and optionally atop histology images.

This vignette explores a more advanced capability of vis_gene(): summarizing several continuous variables into a single metric and plotting it. We'll start with a basic one-gene use case for vis_gene() before moving to more advanced cases.

First, let's load some example data to work with. These data are a subset of the Visium data from a recent publication on the dorsolateral prefrontal cortex (DLPFC) r Citep(bib[['spatialDLPFC']]).

library("spatialLIBD")
spe <- fetch_data(type = "spatialDLPFC_Visium_example_subset")
spe

Next, let's define several genes known to be markers for white matter r Citep(bib[['tran2021']]).

white_matter_genes <- c("GFAP", "AQP4", "MBP", "PLP1")
## match() keeps the order of the symbols above, so the first element is GFAP
white_matter_genes <- rowData(spe)$gene_search[
    match(white_matter_genes, rowData(spe)$gene_name)
]

## Our list of white matter genes
white_matter_genes
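
When translating gene symbols into identifiers, it helps to remember that %in% keeps the row order of the lookup table, while match() keeps the order of the query. The sketch below illustrates the difference with a made-up table; gene_table and its "ID_n" identifiers are hypothetical placeholders, not real rowData(spe) contents.

```r
## Hypothetical lookup table; the identifiers are placeholders for illustration
gene_table <- data.frame(
    gene_name = c("PLP1", "GFAP", "MBP"),
    gene_search = c("PLP1; ID_1", "GFAP; ID_2", "MBP; ID_3")
)
query <- c("GFAP", "MBP", "PLP1")

## %in% returns the hits in the row order of the table
by_in <- gene_table$gene_search[gene_table$gene_name %in% query]

## match() returns the hits in the order of the query
by_match <- gene_table$gene_search[match(query, gene_table$gene_name)]

by_in
by_match
```

This matters whenever later code indexes the result by position, as with white_matter_genes[1] in this vignette.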

Plotting One Gene

A typical use of vis_gene() involves plotting the spatial distribution of a single gene or continuous variable of interest. For example, let's plot just the expression of GFAP.

vis_gene(
    spe,
    geneid = white_matter_genes[1],
    point_size = 1.5
)

We can see a small V-shaped region with higher expression of this gene, which seems to mark the location of layer 1. The bottom right corner seems to mark the location of white matter.

plot(imgRaster(spe))

This gene is known to be highly expressed in both layer 1 and white matter in the dorsolateral prefrontal cortex, as can be seen below r Citep(bib[['HumanPilot']]). It is the 386th highest-ranked white matter marker gene based on the enrichment test.

modeling_results <- fetch_data(type = "modeling_results")
sce_layer <- fetch_data(type = "sce_layer")
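## Extract the top 400 genes per test so that GFAP (ranked 386th) is included
sig_genes <- sig_genes_extract_all(
    n = 400,
    modeling_results = modeling_results,
    sce_layer = sce_layer
)
## Look up GFAP among the white matter (WM) results
i_gfap <- subset(sig_genes, gene == "GFAP" & test == "WM")$top
i_gfap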
set.seed(20200206)
layer_boxplot(
    i = i_gfap,
    sig_genes = sig_genes,
    sce_layer = sce_layer
)

Plotting Multiple Genes

As of version 1.15.2, the geneid parameter of vis_gene() also accepts a vector of genes or continuous variables from colData(spe). The values are summarized into a single number per spot, which is then displayed exactly as a single geneid input would be. spatialLIBD provides three methods for combining multiple continuous variables, selected through the multi_gene_method parameter of vis_gene().

Averaging Z-scores

The default is multi_gene_method = "z_score". Each continuous variable (genes, spot-level covariates, or a mix of both) is converted to a Z-score by centering and scaling across spots. A Z-score of 1 means that the spot's value is one standard deviation above the mean across all spots for that variable. Then, for each spot, the Z-scores are averaged across variables. Compared to averaging raw expression across genes, the "z_score" method is insensitive to absolute expression levels, so highly expressed genes don't dominate the plot. Instead, it weights each gene equally and emphasizes how each gene varies spatially.

Let's plot all four white matter genes using this method.

vis_gene(
    spe,
    geneid = white_matter_genes,
    multi_gene_method = "z_score",
    point_size = 1.5
)

Now the bottom right corner, where the white matter is located, stands out more, though the mixed layer 1 and white matter signal from GFAP (the V shape) is still noticeable.
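
To make the Z-score averaging concrete, here is a minimal base R sketch on made-up numbers. The matrix feature_mat and its column names are hypothetical, and the internals of vis_gene() may differ in details such as how missing values are handled.

```r
## Toy data: 5 spots (rows) x 3 continuous features (columns).
## gene_b is expressed 100x higher than gene_a but has the same spatial pattern.
feature_mat <- cbind(
    gene_a = c(0, 1, 2, 3, 4),
    gene_b = c(100, 200, 300, 400, 500),
    covariate = c(5, 4, 3, 2, 1)
)

## Center and scale each column so every feature has mean 0 and sd 1
z_mat <- scale(feature_mat, center = TRUE, scale = TRUE)

## Average the Z-scores across features to get one value per spot
z_summary <- rowMeans(z_mat)
z_summary
```

After scaling, gene_a and gene_b contribute identical Z-scores despite their very different magnitudes, which is the sense in which highly expressed genes do not dominate.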

Summarizing with PCA

Another option is multi_gene_method = "pca". A matrix is formed with spots as rows and genes or other continuous features as columns. PCA is performed on this matrix, and the first principal component (PC) is plotted spatially, the idea being that the first PC captures the dominant spatial signature of the feature set. Because the sign of a PC is arbitrary, the direction of the first PC is reversed if the majority of its coefficients (from the "rotation matrix") across features are negative. When the features are genes whose expression is highly correlated (like our white matter example), this reversal encourages higher values in the plot to represent areas of higher expression. In our case, it yields the intuitive result that "expression" is higher in white matter for white matter genes, which would not otherwise be guaranteed.

vis_gene(
    spe,
    geneid = white_matter_genes,
    multi_gene_method = "pca",
    point_size = 1.5
)
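
The PCA summary and its sign reversal can be sketched in base R with prcomp() on simulated data. Everything here is illustrative: the simulated signal, the feature names, and the choice to scale features before PCA are assumptions, not a description of the exact code inside vis_gene().

```r
set.seed(20200206)

## Simulate 50 spots where three features follow one shared signal
n_spots <- 50
signal <- rnorm(n_spots)
feature_mat <- cbind(
    gene_a = signal + rnorm(n_spots, sd = 0.1),
    gene_b = 2 * signal + rnorm(n_spots, sd = 0.1),
    gene_c = 0.5 * signal + rnorm(n_spots, sd = 0.1)
)

## PCA with spots as rows and features as columns
pc <- prcomp(feature_mat, center = TRUE, scale. = TRUE)
pc1 <- pc$x[, "PC1"]

## Reverse PC1 if most of its coefficients in the rotation matrix are negative
if (mean(pc$rotation[, "PC1"] < 0) > 0.5) {
    pc1 <- -pc1
}

## After the reversal, higher PC1 values track higher values of the features
cor(pc1, signal)
```

Without the reversal step, rerunning on different data could flip the sign of the first PC and invert the color scale of the plot.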

Plotting Sparsity of Expression

The final option is multi_gene_method = "sparsity". For each spot, the proportion of features with positive (nonzero) expression is plotted. This method is typically only meaningful when the features are raw gene counts that are expected to be sparse (have zero counts) in certain regions of the tissue and not others. It also performs better with a larger number of genes; with our example of four white matter genes, the proportion can only take the values 0, 0.25, 0.5, 0.75, and 1, which is not visually informative.

The white matter example is thus a poor fit for this method, both because these genes are not sparse and because there are so few of them, as you can see below.

vis_gene(
    spe,
    geneid = white_matter_genes,
    multi_gene_method = "sparsity",
    point_size = 1.5
)
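
The proportion described above can be sketched in one line of base R. The toy count matrix below is made up for illustration and its names are hypothetical. With four features, the result can only fall on multiples of 0.25.

```r
## Toy raw counts: 4 spots (rows) x 4 genes (columns)
count_mat <- cbind(
    gene_a = c(0, 3, 0, 1),
    gene_b = c(0, 0, 2, 5),
    gene_c = c(0, 1, 0, 2),
    gene_d = c(0, 0, 0, 7)
)

## Proportion of genes with nonzero counts in each spot
prop_positive <- rowMeans(count_mat > 0)
prop_positive
```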

With more marker genes

Below we use multi_gene_method = "z_score" to plot the top 25 and top 50 white matter marker genes identified by the enrichment test in a previous dataset r Citep(bib[['HumanPilot']]).

vis_gene(
    spe,
    geneid = subset(sig_genes, test == "WM")$ensembl[seq_len(25)],
    multi_gene_method = "z_score",
    point_size = 1.5
)

vis_gene(
    spe,
    geneid = subset(sig_genes, test == "WM")$ensembl[seq_len(50)],
    multi_gene_method = "z_score",
    point_size = 1.5
)

We can repeat this process for multi_gene_method = "pca".

vis_gene(
    spe,
    geneid = subset(sig_genes, test == "WM")$ensembl[seq_len(25)],
    multi_gene_method = "pca",
    point_size = 1.5
)

vis_gene(
    spe,
    geneid = subset(sig_genes, test == "WM")$ensembl[seq_len(50)],
    multi_gene_method = "pca",
    point_size = 1.5
)

And finally, let's look at the results of multi_gene_method = "sparsity".

vis_gene(
    spe,
    geneid = subset(sig_genes, test == "WM")$ensembl[seq_len(25)],
    multi_gene_method = "sparsity",
    point_size = 1.5
)

vis_gene(
    spe,
    geneid = subset(sig_genes, test == "WM")$ensembl[seq_len(50)],
    multi_gene_method = "sparsity",
    point_size = 1.5
)

In this case, it seems that for both the top 25 and top 50 marker genes, z_score and pca provide cleaner visualizations than sparsity. Give them a try on your own datasets!

Visualizing non-gene continuous variables

So far, we have only visualized multiple genes. These methods also apply to continuous variables stored in colData(spe), as shown below.

vis_gene(
    spe,
    geneid = c("sum_gene", "sum_umi"),
    multi_gene_method = "z_score",
    point_size = 1.5
)

We can also combine continuous variables from colData(spe) with genes. For example, we can combine the expression of GFAP, a known astrocyte marker gene, with the spot deconvolution results for astrocytes computed using Tangram r Citep(bib[['spatialDLPFC']]).

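## Astrocyte deconvolution results on their own
vis_gene(
    spe,
    geneid = c("broad_tangram_astro"),
    point_size = 1.5
)
## Astrocyte deconvolution results combined with GFAP expression
vis_gene(
    spe,
    geneid = c("broad_tangram_astro", white_matter_genes[1]),
    multi_gene_method = "pca",
    point_size = 1.5
)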

These tools enable you to further explore your data in new ways. Have fun using them!

Reproducibility

Code for creating the vignette

## Create the vignette
library("rmarkdown")
system.time(render("multi_gene_plots.Rmd"))

## Extract the R code
library("knitr")
knit("multi_gene_plots.Rmd", tangle = TRUE)

Date the vignette was generated.

## Date the vignette was generated
Sys.time()

Wall-clock time spent generating the vignette.

## Processing time in seconds
totalTime <- diff(c(startTime, Sys.time()))
round(totalTime, digits = 3)

R session information.

## Session info
library("sessioninfo")
options(width = 120)
session_info()

Bibliography

This vignette was generated using r Biocpkg('BiocStyle') r Citep(bib[['BiocStyle']]), r CRANpkg('knitr') r Citep(bib[['knitr']]) and r CRANpkg('rmarkdown') r Citep(bib[['rmarkdown']]) running behind the scenes.

Citations made with r CRANpkg('RefManageR') r Citep(bib[['RefManageR']]).

## Print bibliography
PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html"))


LieberInstitute/spatialLIBD documentation built on Nov. 4, 2024, 11:57 a.m.