knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
## For links
library("BiocStyle")

## Track time spent on making the vignette
startTime <- Sys.time()

## Bib setup
library("RefManageR")

## Write bibliography information
bib <- c(
    R = citation(),
    BiocStyle = citation("BiocStyle")[1],
    MatrixGenerics = citation("MatrixGenerics")[1],
    knitr = citation("knitr")[3],
    RColorBrewer = citation("RColorBrewer")[1],
    RefManageR = citation("RefManageR")[1],
    rmarkdown = citation("rmarkdown")[1],
    sessioninfo = citation("sessioninfo")[1],
    SpatialExperiment = citation("SpatialExperiment")[1],
    spatialLIBD = citation("spatialLIBD")[1],
    HumanPilot = citation("spatialLIBD")[2],
    spatialDLPFC = citation("spatialLIBD")[3],
    tran2021 = RefManageR::BibEntry(
        bibtype = "Article",
        key = "tran2021",
        author = "Tran, Matthew N. and Maynard, Kristen R. and Spangler, Abby and Huuki, Louise A. and Montgomery, Kelsey D. and Sadashivaiah, Vijay and Tippani, Madhavi and Barry, Brianna K. and Hancock, Dana B. and Hicks, Stephanie C. and Kleinman, Joel E. and Hyde, Thomas M. and Collado-Torres, Leonardo and Jaffe, Andrew E. and Martinowich, Keri",
        title = "Single-nucleus transcriptome analysis reveals cell-type-specific molecular signatures across reward circuitry in the human brain",
        year = 2021,
        doi = "10.1016/j.neuron.2021.09.001",
        journal = "Neuron"
    )
)
One of the goals of spatialLIBD is to provide options for visualizing Visium data by 10x Genomics. In particular, vis_gene() and vis_clus() allow plotting of individual continuous or discrete quantities belonging to each Visium spot, in a spatially accurate manner and optionally atop histology images.
This vignette explores a more complex capability of vis_gene(): to visualize a summary metric of several continuous variables simultaneously. We'll start with a basic one-gene use case for vis_gene() before moving to more advanced cases.
First, let's load some example data to work with. This data is a subset from a recent publication with Visium data from the dorsolateral prefrontal cortex (DLPFC) r Citep(bib[['spatialDLPFC']]).
library("spatialLIBD")
spe <- fetch_data(type = "spatialDLPFC_Visium_example_subset")
spe
Next, let's define several genes known to be markers for white matter r Citep(bib[['tran2021']]).
white_matter_genes <- c("GFAP", "AQP4", "MBP", "PLP1")
white_matter_genes <- rowData(spe)$gene_search[
    rowData(spe)$gene_name %in% white_matter_genes
]

## Our list of white matter genes
white_matter_genes
A typical use of vis_gene() involves plotting the spatial distribution of a single gene or continuous variable of interest. For example, let's plot just the expression of GFAP.
vis_gene( spe, geneid = white_matter_genes[1], point_size = 1.5 )
We can see a small V-shaped section with higher expression of this gene, which seems to mark the location of layer 1. The bottom right corner seems to mark the location of white matter.
plot(imgRaster(spe))
This particular gene is known to have high expression in both layer 1 and white matter in the dorsolateral prefrontal cortex, as can be seen below r Citep(bib[['HumanPilot']]). It's the 386th highest ranked white matter marker gene based on the enrichment test.
modeling_results <- fetch_data(type = "modeling_results")
sce_layer <- fetch_data(type = "sce_layer")
sig_genes <- sig_genes_extract_all(
    n = 400,
    modeling_results = modeling_results,
    sce_layer = sce_layer
)
i_gfap <- subset(sig_genes, gene == "GFAP" & test == "WM")$top
i_gfap
set.seed(20200206)
layer_boxplot(
    i = i_gfap,
    sig_genes = sig_genes,
    sce_layer = sce_layer
)
As of version 1.15.2, the geneid parameter to vis_gene() may also take a vector of genes or continuous variables in colData(spe). In this way, the expression of multiple continuous variables can be summarized into a single value for each spot, displayed just as a single input for geneid would be.
spatialLIBD provides three methods for merging the information from multiple continuous variables, which may be specified through the multi_gene_method parameter to vis_gene().
The default is multi_gene_method = "z_score". Essentially, each continuous variable (which may be a mix of genes and spot-level covariates) is normalized to a Z-score by centering and scaling across spots. If a particular spot has a Z-score of 1 for a particular continuous variable, that spot's value is one standard deviation above the mean across all spots for that variable. Next, for each spot, Z-scores are averaged across continuous variables. Compared to simply averaging raw gene expression across genes, the "z_score" method is insensitive to absolute expression levels (highly expressed genes don't dominate plots), and instead focuses on how each gene varies spatially, weighting each gene equally.
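The Z-score summary described above can be sketched in base R on a toy matrix. This is only an illustration of the idea, not the internal code of vis_gene(); the matrix expr, its column names, and its values are made up, and the use of scale() and rowMeans() is an assumption about how centering, scaling, and averaging could be done.

```r
## Toy matrix: 6 spots (rows) x 3 continuous features (columns).
## All names and values here are hypothetical, for illustration only.
expr <- cbind(
    gene_low = c(0, 1, 0, 2, 1, 5),
    gene_high = c(100, 120, 90, 150, 110, 400),
    covariate = c(0.1, 0.2, 0.1, 0.3, 0.2, 0.9)
)

## Center and scale each feature (column) across spots to get Z-scores
z <- scale(expr, center = TRUE, scale = TRUE)

## Average the Z-scores across features: one summary value per spot
z_summary <- rowMeans(z)

## For contrast, a raw average is driven almost entirely by gene_high
raw_summary <- rowMeans(expr)

z_summary
raw_summary
```

Because every column is rescaled to unit variance before averaging, the lowly expressed gene and the covariate contribute as much to z_summary as the highly expressed gene does.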
Let's plot all four white matter genes using this method.
vis_gene( spe, geneid = white_matter_genes, multi_gene_method = "z_score", point_size = 1.5 )
Now the bottom right corner where the white matter is located starts to pop up more, though the mixed layer 1 and white matter signal provided by GFAP is still noticeable (the V shape).
Another option is multi_gene_method = "pca". A matrix is formed, where genes or continuous features are columns, and spots are rows. PCA is performed, and the first principal component is plotted spatially. The idea is that the first PC captures the dominant spatial signature of the feature set. Before plotting, its direction is reversed if the majority of coefficients (from the "rotation matrix") across features are negative. When the features are genes whose expression is highly correlated (like our white-matter-gene example!), this conditional reversal encourages higher values in the plot to represent areas of higher expression of the features. For our case, this leads to the intuitive result that "expression" is higher in white matter for white-matter genes, which is not otherwise guaranteed (the "sign" of PCs is arbitrary)!
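The sign-reversal idea can be sketched with stats::prcomp() on simulated data. This is a minimal sketch under stated assumptions, not the implementation used by vis_gene(): the simulated features, the choice to center and scale before PCA, and the variable names are all hypothetical.

```r
## Simulate 50 spots with 3 positively correlated features (hypothetical data)
set.seed(1)
n_spots <- 50
signal <- rnorm(n_spots)
feats <- cbind(
    f1 = signal + rnorm(n_spots, sd = 0.2),
    f2 = 2 * signal + rnorm(n_spots, sd = 0.2),
    f3 = 0.5 * signal + rnorm(n_spots, sd = 0.2)
)

## PCA with spots as rows and features as columns.
## Centering and scaling here is an assumption made for this sketch.
pca <- prcomp(feats, center = TRUE, scale. = TRUE)
pc1 <- pca$x[, 1]
loadings <- pca$rotation[, 1]

## Reverse the direction of PC1 if most coefficients are negative,
## so that higher PC1 values track higher feature values
if (mean(loadings < 0) > 0.5) {
    pc1 <- -pc1
    loadings <- -loadings
}

loadings
```

After the conditional flip, most coefficients are non-negative, so pc1 increases with the shared signal rather than decreasing with it.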
vis_gene( spe, geneid = white_matter_genes, multi_gene_method = "pca", point_size = 1.5 )
The final option is multi_gene_method = "sparsity". For each spot, the proportion of features with positive expression is plotted. This method is typically only meaningful when features are raw gene counts that are expected to be quite sparse (have zero counts) in certain regions of the tissue and not others. It also performs better with a larger number of genes; with our example of four white-matter genes, the proportion may only hold values of 0, 0.25, 0.5, 0.75, and 1, which is not visually informative.
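To make the limited set of values concrete, here is a small base R sketch of the proportion computation on a toy count matrix with four genes. The matrix counts and its values are hypothetical, and this is an illustration of the idea rather than the code inside vis_gene().

```r
## Toy raw counts: 4 spots (rows) x 4 genes (columns), hypothetical values
counts <- cbind(
    g1 = c(0, 3, 0, 1),
    g2 = c(0, 0, 2, 4),
    g3 = c(0, 1, 0, 7),
    g4 = c(0, 0, 0, 2)
)

## Proportion of genes with positive counts in each spot
prop_expressed <- rowMeans(counts > 0)
prop_expressed
```

With four genes, every spot necessarily falls on one of five levels (multiples of 0.25), which is why the resulting plot has so little resolution.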
The white-matter example is thus poor due to lack of sparsity and low number of genes as you can see below.
vis_gene( spe, geneid = white_matter_genes, multi_gene_method = "sparsity", point_size = 1.5 )
Below we can plot via multi_gene_method = "z_score" the top 25 or top 50 white matter marker genes identified via the enrichment test in a previous dataset r Citep(bib[['HumanPilot']]).
vis_gene(
    spe,
    geneid = subset(sig_genes, test == "WM")$ensembl[seq_len(25)],
    multi_gene_method = "z_score",
    point_size = 1.5
)
vis_gene(
    spe,
    geneid = subset(sig_genes, test == "WM")$ensembl[seq_len(50)],
    multi_gene_method = "z_score",
    point_size = 1.5
)
We can repeat this process for multi_gene_method = "pca".
vis_gene(
    spe,
    geneid = subset(sig_genes, test == "WM")$ensembl[seq_len(25)],
    multi_gene_method = "pca",
    point_size = 1.5
)
vis_gene(
    spe,
    geneid = subset(sig_genes, test == "WM")$ensembl[seq_len(50)],
    multi_gene_method = "pca",
    point_size = 1.5
)
And finally, let's look at the results of multi_gene_method = "sparsity".
vis_gene(
    spe,
    geneid = subset(sig_genes, test == "WM")$ensembl[seq_len(25)],
    multi_gene_method = "sparsity",
    point_size = 1.5
)
vis_gene(
    spe,
    geneid = subset(sig_genes, test == "WM")$ensembl[seq_len(50)],
    multi_gene_method = "sparsity",
    point_size = 1.5
)
In this case, it seems that for both the top 25 and top 50 marker genes, z_score and pca provided cleaner visualizations than sparsity. Give them a try on your own datasets!
So far, we have only visualized multiple genes. But these methods can also be applied to several continuous variables stored in colData(spe), as shown below.
vis_gene( spe, geneid = c("sum_gene", "sum_umi"), multi_gene_method = "z_score", point_size = 1.5 )
We can also combine continuous variables from colData(spe) with actual genes. For example, we can combine the expression of GFAP, which is a known astrocyte marker gene, with the spot deconvolution results for astrocytes computed using Tangram r Citep(bib[['spatialDLPFC']]).
vis_gene(
    spe,
    geneid = c("broad_tangram_astro"),
    point_size = 1.5
)
vis_gene(
    spe,
    geneid = c("broad_tangram_astro", white_matter_genes[1]),
    multi_gene_method = "pca",
    point_size = 1.5
)
These tools enable you to further explore your data in new ways. Have fun using them!
Code for creating the vignette.
## Create the vignette
library("rmarkdown")
system.time(render("multi_gene_plots.Rmd"))

## Extract the R code
library("knitr")
knit("multi_gene_plots.Rmd", tangle = TRUE)
Date the vignette was generated.
## Date the vignette was generated
Sys.time()
Wallclock time spent generating the vignette.
## Processing time in seconds
totalTime <- diff(c(startTime, Sys.time()))
round(totalTime, digits = 3)
R session information.
## Session info
library("sessioninfo")
options(width = 120)
session_info()
This vignette was generated using r Biocpkg('BiocStyle') r Citep(bib[['BiocStyle']]), r CRANpkg('knitr') r Citep(bib[['knitr']]) and r CRANpkg('rmarkdown') r Citep(bib[['rmarkdown']]) running behind the scenes.
Citations made with r CRANpkg('RefManageR') r Citep(bib[['RefManageR']]).
## Print bibliography
PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html"))