_spatialHeatmap_: Visualizing Spatial Assays in Anatomical Images and Network Graphs
In spatialHeatmap: spatialHeatmap

```{css, echo=FALSE} pre code { white-space: pre !important; overflow-x: scroll !important; word-break: keep-all !important; word-wrap: initial !important; }

```r
## ThG: chunk added to enable global knitr options. The below turns on
## caching for faster vignette re-build during text editing.
#knitr::opts_chunk$set(cache=TRUE)

library(knitr); opts_chunk$set(message=FALSE, warning=FALSE)

Introduction

Motivation

The spatialHeatmap package provides functionalities for visualizing cell-, tissue- and organ-specific data of biological assays by coloring the corresponding spatial features defined in anatomical images according to a numeric color key. The color scheme used to represent the assay values can be customized by the user. This core functionality of the package is called a spatial heatmap (SHM) plot. It is enhanced with visualization tools for groups of measured items (e.g. gene modules) sharing related abundance profiles, including matrix heatmaps combined with hierarchical clustering dendrograms and network representations. The functionalities of spatialHeatmap can be used either in a command-driven mode from within R or a graphical user interface (GUI) provided by a Shiny App that is also part of this package. While the R-based mode provides flexibility to customize and automate analysis routines, the Shiny App includes a variety of convenience features that will appeal to experimentalists and other users less familiar with R. Moreover, the Shiny App can be used on both local computers as well as centralized server-based deployments (e.g. cloud-based or custom servers) that can be accessed remotely as a public web service for using spatialHeatmap's functionalities with community and/or private data. The functionalities of the spatialHeatmap package are illustrated in Figure \@ref(fig:illus).

include_graphics('img/spatialHeatmap_Design.jpg')

As anatomical images the package supports both tissue maps from public repositories and custom images provided by the user. In general any type of image can be used as long as it can be provided in SVG (Scalable Vector Graphics) format, where the corresponding spatial features have been defined (see aSVG below). The numeric values plotted onto an SHM are usually quantitative measurements from a wide range of profiling technologies, such as microarrays, next generation sequencing (e.g. RNA-Seq and scRNA-Seq), proteomics, metabolomics, or many other small- or large-scale experiments. For convenience, several preprocessing and normalization methods for the most common use cases are included that support raw and/or preprocessed data. Currently, the main application domains of the spatialHeatmap package are numeric data sets and spatially mapped images from biological, agricultural and biomedical areas. Moreover, the package has been designed to also work with many other spatial data types, such a population data plotted onto geographic maps. This high level of flexibility is one of the unique features of spatialHeatmap. Related software tools for biological applications in this field are largely based on pure web applications [@Lekschas2015-gd; @Papatheodorou2018-jy; @Winter2007-bq; @Waese2017-fx] or local tools [@Maag2018-gi; @Muschelli2014-av] that typically lack customization functionalities. These restrictions limit users to utilizing pre-existing expression data and/or fixed sets of anatomical image collections. To close this gap for biological use cases, we have developed spatialHeatmap as a generic R/Bioconductor package for plotting quantitative values onto any type of spatially mapped images in a programmable environment and/or in an intuitive to use GUI application.

Design {#design}

The core feature of spatialHeatmap is to map assay values (e.g. gene expression data) of one or many items (e.g. genes) measured under different conditions in form of numerically graded colors onto the corresponding cell types or tissues represented in a chosen SVG image. In the gene profiling field, this feature supports comparisons of the expression values among multiple genes by plotting their SHMs next to each other. Similarly, one can display the expression values of a single or multiple genes across multiple conditions in the same plot (Figure \@ref(fig:mul)). This level of flexibility is very efficient for visualizing complicated expression patterns across genes, cell types and conditions. In case of more complex anatomical images with overlapping multiple layer tissues, it is important to visually expose the tissue layer of interest in the plots. To address this, several default and customizable layer viewing options are provided. They allow to hide features in the top layers by making them transparent in order to expose features below them. This transparency viewing feature is highlighted below in the mouse example (Figure \@ref(fig:musshm)). Moreover, one can plot multiple distinct aSVGs in a single SHM plot as shown in Figure \@ref(fig:arabshm). This is particularly useful for displaying abundance trends across multiple development stages, where each is represented by its own aSVG image. In addition to static SHM representations, one can visualize them in form of dynamic animations embedded in interactive HTML files or generate videos for them.

To maximize reusability and extensibility, the package organizes large-scale omics assay data along with the associated experimental design information in a SummarizedExperiment object (Figure \@ref(fig:illus)A). The latter is one of the core S4 classes within the Bioconductor ecosystem that has been widely adapted by many other software packages dealing with gene-, protein- and metabolite-level profiling data [@SummarizedExperiment]. In case of gene expression data, the assays slot of the SummarizedExperiment container is populated with a gene expression matrix, where the rows and columns represent the genes and tissue/conditions, respectively, while the colData slot contains sample data including replicate information. The tissues and/or cell type information in the object maps via colData to the corresponding features in the SVG images using unique identifiers for the spatial features (e.g. tissues or cell types). This allows to color the features of interest in an SVG image according to the numeric data stored in a SummarizedExperiment object. For simplicity the numeric data can also be provided as numeric vectors or data.frames. This can be useful for testing purposes and/or the usage of simple data sets that may not require the more advanced features of the SummarizedExperiment class, such as measurements with only one or a few data points. The details about how to access the SVG images and properly format the associated expression data are provided in the Supplementary Section of this vignette.

Image Format: SVG {#term}

SHMs are images where colors encode numeric values in features of any shape. For plotting SHMs, Scalable Vector Graphics (SVG) has been chosen as image format since it is a flexible and widely adapted vector graphics format that provides many advantages for computationally embedding numerical and other information in images. SVG is based on XML formatted text describing all components present in images, including lines, shapes and colors. In case of biological images suitable for SHMs, the shapes often represent anatomical or cell structures. To assign colors to specific features in SHMs, annotated SVG (aSVG) files are used where the shapes of interest are labeled according to certain conventions so that they can be addressed and colored programmatically. SVGs and aSVGs of anatomical structures can be downloaded from many sources including the repositories described below. Alternatively, users can generate them themselves with vector graphics software such as Inkscape{target="_blank"}. Typically, in aSVGs one or more shapes of a feature of interest, such as the cell shapes of an organ, are grouped together by a common feature identifier. Via these group identifiers one or many feature types can be colored simultaneously in an aSVG according to biological experiments assaying the corresponding feature types with the required spatial resolution. Correct assignment of image features and assay results is assured by using for both the same feature identifiers. The color gradient used to visually represent the numeric assay values is controlled by a color gradient parameter. To visually interpret the meaning of the colors, the corresponding color key is included in the SHM plots. Additional details for properly formatting and annotating both aSVG images and assay data are provided in the Supplementary Section section of this vignette.

Data Repositories {#data_repo}

If not generated by the user, SHMs can be generated with data downloaded from various public repositories. This includes gene, protein and metabolic profiling data from databases, such as GEO{target="_blank"}, BAR{target="_blank"} and Expression Atlas{target="_blank"} from EMBL-EBI [@Papatheodorou2018-jy]. A particularly useful resource, when working with spatialHeatmap, is the EBI Expression Atlas. This online service contains both assay and anatomical images. Its assay data include mRNA and protein profiling experiments for different species, tissues and conditions. The corresponding anatomical image collections are also provided for a wide range of species including animals and plants. In spatialHeatmap several import functions are provided to work with the expression and aSVG repository from the Expression Atlas directly. The aSVG images developed by the spatialHeatmap project are available in its own repository called spatialHeatmap aSVG Repository{target="_blank"}, where users can contribute their aSVG images that are formatted according to our guidlines.

Tutorial Overview {#sample_data}

The following sections of this vignette showcase the most important functionalities of the spatialHeatmap package using as initial example a simple to understand toy data set, and then more complex mRNA profiling data from the Expression Atlas and GEO databases. First, SHM plots are generated for both the toy and mRNA expression data. The latter include gene expression data sets from RNA-Seq and microarray experiments of Human Brain, Mouse Organs, Chicken Organs, and Arabidopsis Shoots. The first three are RNA-Seq data from the Expression Atlas{target="_blank"}, while the last one is a microarray data set from GEO{target="_blank"}. Second, gene context analysis tools are introduced, which facilitate the visualization of gene modules sharing similar expression patterns. This includes the visualization of hierarchical clustering results with traditional matrix heatmaps (Matrix Heatmap) as well co-expression network plots (Network). Third, an overview of the corresponding Shiny App is presented that provides access to the same functionalities as the R functions, but executes them in an interactive GUI environment [@shiny; @shinydashboard]. Fourth, more advanced features for plotting customized SHMs are covered using the Human Brain data set as an example.

Getting Started

Installation

The spatialHeatmap package should be installed from an R (version $\ge$ 3.6) session with the BiocManager::install command.

```{ eval=FALSE, echo=TRUE, warnings=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("spatialHeatmap")

## Packages and Documentation

Next, the packages required for running the sample code in this vignette need to be loaded.  

```r
library(spatialHeatmap); library(SummarizedExperiment); library(ExpressionAtlas); library(GEOquery)

The following lists the vignette(s) of this package in an HTML browser. Clicking the corresponding name will open this vignette.

browseVignettes('spatialHeatmap')

Spatial Heatmaps (SHMs) {#shm}

Toy Example {#toy}

SHMs are plotted with the spatial_hm function. To provide a quick and intuitive overview how these plots are generated, the following uses a generalized toy example where a small vector of random numeric values is generated that are used to color features in an aSVG image. The image chosen for this example is an aSVG depicting the human brain. The corresponding image file 'homo_sapiens.brain.svg' is included in this package for testing purposes. The path to this image on a user\'s system, where spatialHeatmap is installed, can be obtained with the system.file function.

aSVG Image

The following commands obtain the directory of the aSVG collection and the full path to the chosen target aSVG image on a user's system, respectively.

svg.dir <- system.file("extdata/shinyApp/example", package="spatialHeatmap")
svg.hum <- system.file("extdata/shinyApp/example", 'homo_sapiens.brain.svg', package="spatialHeatmap")

To identify feature labels of interest in annotated aSVG images, the return_feature function can be used. The following searches the aSVG images stored in dir for the query terms 'lobe' and 'homo sapiens' under the feature and species fields, respectively. The identified matches are returned as a data.frame.

feature.df <- return_feature(feature=c('lobe'), species=c('homo sapiens'), remote=FALSE, dir=svg.dir)
feature.df
fnames <- feature.df[, 1]

Numeric Data

The following example generates a small numeric toy vector, where the data slot contains four numbers and its name slot is populated with the three feature names obtained from the above aSVG image. In addition, a non-matching entry (here 'notMapped') is included for demonstration purposes. Note, the numbers are mapped to features via matching names among the numeric vector and the aSVG, respectively. Accordingly, only numbers and features with matching name counterparts can be colored in the aSVG image. Entries without name matches are indicated by a message printed to the R console, here "notMapped". This behavior can be turned off with verbose=FALSE in the corresponding function call. In addition, a summary of the numeric assay to feature mappings is stored in the result data.frame returned by the spatial_hm function (see below).

my_vec <- sample(1:100, length(unique(fnames))+1)
names(my_vec) <- c(unique(fnames), 'notMapped')
my_vec

Plot SHM

Next, the SHM is plotted with the spatial_hm function (Figure \@ref(fig:toyshm)). Internally, the numbers in my_vec are translated into colors based on the color key assigned to the col.com argument, and then painted onto the corresponding features in the aSVG, where the path to the image file is defined by svg.path=svg.hum. The remaining arguments used here include: ID for defining the title of the plot; ncol for setting the column-wise layout of the plot excluding the feature legend plot on the right; and height for defining the height of the SHM relative to its width. In the given example (Figure \@ref(fig:toyshm)) only three features in my_vec ('occipital lobe', 'parietal lobe', and 'temporal lobe') have matching entries in the corresponding aSVG.

shm.df <- spatial_hm(svg.path=svg.hum, data=my_vec, ID='toy', ncol=1, height=0.9, width=0.8, sub.title.size=20, legend.nrow=2)

The named numeric values in my_vec, that have name matches with the features in the chosen aSVG, are stored in the mapped_feature slot.

# The SHM and mapped features are stored in a list
names(shm.df)
# Mapped features
shm.df[['mapped_feature']]

Human Brain {#hum}

This subsection introduces how to find cell- and tissue-specific assay data in the Expression Atlas database. After choosing a gene expression experiment, the data is downloaded directly into a user\'s R session. Subsequently, the expression values for selected genes can be plotted onto a chosen aSVG image with or without prior preprocessing steps (e.g. normalization). For querying and downloading expression data from the Expression Atlas database, functions from the ExpressionAtlas package are used [@ebi].

Gene Expression Data

The following example searches the Expression Atlas for expression data derived from specific tissues and species of interest, here 'cerebellum' and 'Homo sapiens', respectively.

all.hum <- searchAtlasExperiments(properties="cerebellum", species="Homo sapiens")

The search result is stored in a DFrame containing r nrow(all.hum) accessions matching the above query. For the following sample code, the accession 'E-GEOD-67196{target="blank"}' from Prudencio _et al. [-@Prudencio2015-wd] has been chosen, which corresponds to an RNA-Seq profiling experiment of 'cerebellum' and 'frontal cortex' brain tissue from patients with amyotrophic lateral sclerosis (ALS). Details about the corresponding record can be returned as follows.

all.hum[2, ]

The getAtlasData function allows to download the chosen RNA-Seq experiment from the Expression Atlas and import it into a RangedSummarizedExperiment object of a user\'s R session.

rse.hum <- getAtlasData('E-GEOD-67196')[[1]][[1]]

The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.hum. The following returns only its first five rows and columns.

colData(rse.hum)[1:5, 1:5]

aSVG Image

The following example shows how to download from the above described SVG repositories an aSVG image that matches the tissues and species assayed in the gene expression data set downloaded in the previous subsection. The return_feature function queries the repository for feature- and species-related keywords, here c('frontal cortex', 'cerebellum') and c('homo sapiens', 'brain'), respectively. To return matching aSVGs, the argument keywords.any is set to TRUE by default. When return.all=FALSE, only aSVGs matching the query keywords are returned and saved under dir. Otherwise, all aSVGs are returned regardless of the keywords. To avoid overwriting of existing SVG files, it is recommended to start with an empty target directory, here tmp.dir. To search a local directory for matching aSVG images, the argument setting remote=FALSE needs to be used, while specifying the path of the corresponding directory under dir. All or only matching features are returned if match.only is set to FALSE or TRUE, respectively.

tmp.dir <- paste0(normalizePath(tempdir(check=TRUE), winslash="/", mustWork=FALSE), '/shm')  # Create empty directory
feature.df <- return_feature(feature=c('frontal cortex', 'cerebellum'), species=c('homo sapiens', 'brain'), keywords.any=TRUE, return.all=FALSE, dir=tmp.dir, remote=TRUE, match.only=TRUE, desc=FALSE) # Query aSVGs
feature.df[1:8, ] # Return first 8 rows for checking
unique(feature.df$SVG) # Return all matching aSVGs

To build this vignettes according to the R/Bioconductor package requirements, the following code section uses the aSVG file instance included in the spatialHeatmap package rather than the downloaded instance from the previous example.

feature.df <- return_feature(feature=c('frontal cortex', 'cerebellum'), species=c('homo sapiens', 'brain'), keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=FALSE)

Note, the target tissues frontal cortex and cerebellum are included in both the experimental design slot of the downloaded expression data as well as the annotations of the aSVG. This way these features can be colored in the downstream SHM plots. If necessary users can also change from within R the feature identifiers and names in an aSVG. Details on this utility are provided in the Supplementary Section.

feature.df

Since the Expression Atlas supports the cross-species anatomy ontology{target="_blank"}, the corresponding UBERON identifiers are included in the id column of the data.frame returned by the above function call of return_feature [@Mungall2012-ma]. This ontology is also supported by the rols Bioconductor package [@rols].

Experimental Design

For organizing experimental designs and downstream plotting purposes, it can be desirable to customize the text in certain columns of colData. This way one can use the source data for displaying 'pretty' sample names in columns and legends of all downstream tables and plots, respectively, in a consistent and automated manner. To achieve this, the following example imports a 'targets' file that can be generated and edited by the user in a text or spreadsheet program. In the following example the target file content is used to replace the text in the colData slot of the RangedSummarizedExperiment object with a version containing shorter sample names that are more suitable for plotting purposes.

The following imports a custom target file containing simplified sample labels and experimental design information.

hum.tar <- system.file('extdata/shinyApp/example/target_human.txt', package='spatialHeatmap')
target.hum <- read.table(hum.tar, header=TRUE, row.names=1, sep='\t')

Load custom target data into colData slot.

colData(rse.hum) <- DataFrame(target.hum)

A slice of the simplified colData object is shown below, where the disease column contains now shorter labels than in the original data set. Additional details for generating and using target files in spatialHeatmap are provided in the Supplementary Section of this vignette.

colData(rse.hum)[c(1:3, 41:42), 4:5]

Preprocess Assay Data

The actual gene expression data of the downloaded RNA-Seq experiment is stored in the assay slot of rse.hum. Since it contains raw count data, it can be desirable to apply basic preprocessing routines prior to plotting spatial heatmaps. The following shows how to normalize the count data, aggregate replicates and then remove genes with unreliable expression responses. These preprocessing steps are optional and can be skipped if needed. For this, the expression data can be provided to the spatial_hm function directly, where it is important to assign to the sam.factor and con.factor arguments the corresponding sample and condition column names (Table \@ref(tab:arg)).

For normalizing raw count data from RNA-Seq experiments, the norm_data function can be used. It supports the following pre-existing functions from widely used packages for analyzing count data in the next generation sequencing (NGS) field: calcNormFactors (CNF) from edgeR [@edgeR1]; as well as estimateSizeFactors (ESF), varianceStabilizingTransformation (VST), and rlog from DESeq2 [@DESeq2]. The argument norm.fun specifies one of the four internal normalizing methods: CNF, ESF, VST, and rlog. If norm.fun='none', no normalization is applied. The arguments for each normalizing function are provided via a parameter.list, which is a list with named slots. For example, norm.fun='ESF' and parameter.list=list(type='ratio') is equivalent to estimateSizeFactors(object, type='ratio'). If paramter.list=NULL, the default arguments are used by the normalizing function assigned to norm.fun. For additional details, users want to consult the help file of the norm_data function by typing ?norm_data in the R console.

The following example uses the ESF normalization option. This method has been chosen mainly due to its good time performance.

se.nor.hum <- norm_data(data=rse.hum, norm.fun='ESF', data.trans='log2')

Replicates are aggregated with the aggr_rep function, where the summary statistics can be chosen under the aggr argument (e.g. aggr='mean'). The columns specifying replicates can be assigned to the sam.factor and con.factor arguments corresponding to samples and conditions, respectively. For tracking, the corresponding sample/condition labels are used as column titles in the aggregated assay instance, where they are concatenated with a double underscore as separator. In addition, the corresponding rows in the colData slot are collapsed accordingly.

se.aggr.hum <- aggr_rep(data=se.nor.hum, sam.factor='organism_part', con.factor='disease', aggr='mean')
assay(se.aggr.hum)[1:3, ]

To remove unreliable expression measures, filtering can be applied. The following example retains genes with expression values larger than 5 (log2 space) in at least 1% of all samples (pOA=c(0.01, 5)), and a coefficient of variance (CV) between 0.30 and 100 (CV=c(0.30, 100)).

se.fil.hum <- filter_data(data=se.aggr.hum, sam.factor='organism_part', con.factor='disease', pOA=c(0.01, 5), CV=c(0.3, 100), dir=NULL)

To inspect the results, the following returns three selected rows of the fully preprocessed data matrix (Table \@ref(tab:humtab)).

assay(se.fil.hum)[c(5, 733:734), ]

kable(assay(se.fil.hum)[c(5, 733:734), ], caption='Slice of fully preprocessed expression matrix.')

SHM: Single Gene

The preprocessed expression values for any gene in the assay slot of se.fil.hum can be plotted as an SHM. The following uses gene ENSG00000268433 as an example. The chosen aSVG is a depiction of the human brain where the assayed featured are colored by the corresponding expression values in se.fil.hum.

shm.df <- spatial_hm(svg.path=svg.hum, data=se.fil.hum, ID=c('ENSG00000268433'), height=0.7, legend.r=1.5, legend.key.size=0.02, legend.text.size=12, legend.nrow=2)

The plotting instructions of the SHM along with the corresponding mapped features are stored as a list, here named shm.df. Its components can be accessed as follows.

names(shm.df)
# Mapped features
shm.df[['mapped_feature']]

In the above example, the normalized expression values of gene ENSG00000268433 are colored in the frontal cortex and cerebellum, where the different conditions, here normal and ALS, are given in separate SHMs plotted next to each other. The color and feature mappings are defined by the corresponding color key and legend plot on the left and right, respectively.

SHM: Multiple Genes

SHMs for multiple genes can be plotted by providing the corresponding gene IDs under the ID argument as a character vector. The spatial_hm function will then sequentially arrange the SHMs for each gene in a single composite plot. To facilitate comparisons among expression values across genes and/or conditions, the lay.shm parameter can be assigned 'gene' or 'con', respectively. For instance, in Figure \@ref(fig:mul) the SHMs of the genes ENSG00000268433 and ENSG00000006047 are organized by condition in a horizontal view. This functionality is particularly useful when comparing gene families. Users can also customize the order of the SHM subplots, by assigning lay.shm='none'. With this setting the SHM subplots are organized according to the gene and condition ordering under ID and data, respectively.

spatial_hm(svg.path=svg.hum, data=se.fil.hum, ID=c('ENSG00000268433', 'ENSG00000006047'), lay.shm='con', width=0.8, height=1, legend.r=1.5, legend.nrow=2)

SHM: HTML and Video

SHMs can be saved to interactive HTML files as well as video files. To trigger this export behavior, the argument out.dir needs to be assinged a directory path where the HTML and video files will be stored. Each HTML file contains an interactive SHM with zoom in and out functionality. Hovering over graphics features will display data, gene, condition and other information. The video will play the SHM subplots in the order specified under the lay.shm argument.

The following example saves the interactive HTML and video files under
a directory named tmp.dir.

tmp.dir <- paste0(normalizePath(tempdir(check=TRUE), winslash="/"), '/shm')
spatial_hm(svg.path=svg.hum, data=se.fil.hum, ID=c('ENSG00000268433', 'ENSG00000006047'), lay.shm='con', width=0.8, height=1, legend.r=1.5, legend.nrow=2, out.dir=tmp.dir)

SHM: Customization

To provide a high level of flexibility, the spatial_hm contains many arguments. An overview of important arguments and their utility is provided in Table \@ref(tab:arg).

arg.df <- read.table('file/spatial_hm_arg.txt', header=TRUE, row.names=1, sep='\t')
kable((arg.df), escape=TRUE, caption="List of important argumnets of \'spatial_hm\'.")

Mouse Organs {#mus}

This section generates an SHM plot for mouse data from the Expression Atlas. The code components are very similar to the previous Human Brain example. For brevity, the corresponding text explaining the code has been reduced to a minimum.

Gene Expression Data

The chosen mouse RNA-Seq data compares tissue level gene expression across mammalian species [@Merkin2012-ak]. The following searches the Expression Atlas for expression data from 'heart' and 'Mus musculus'.

all.mus <- searchAtlasExperiments(properties="heart", species="Mus musculus")

Among the many matching entries, accession 'E-MTAB-2801' will be downloaded.

all.mus[7, ]
rse.mus <- getAtlasData('E-MTAB-2801')[[1]][[1]]

The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.mus. The following returns only its first three rows.

colData(rse.mus)[1:3, ]

aSVG Image

tmp.dir <- paste0(normalizePath(tempdir(check=TRUE), winslash="/", mustWork=FALSE), '/shm')
feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('Mus musculus'), keywords.any=TRUE, return.all=FALSE, dir=tmp.dir, remote=TRUE, match.only=FALSE)

feature.df <- return_feature(feature=c('heart', 'kidney'), species=NULL, keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=FALSE, match.only=FALSE)

Return the names of the matching aSVG files.

unique(feature.df$SVG)

The following first selects mus_musculus.male.svg as target aSVG, then returns the first three rows of the resulting feature.df, and finally prints the unique set of all aSVG features.

feature.df <- subset(feature.df, SVG=='mus_musculus.male.svg')
feature.df[1:3, ]
unique(feature.df[, 1])

Obtain path of target aSVG on user system.

svg.mus <- system.file("extdata/shinyApp/example", "mus_musculus.male.svg", package="spatialHeatmap")

Experimental Design

The following imports a sample target file that is included in this package. To inspect its content, the first three rows of the target file are printed to the screen.

mus.tar <- system.file('extdata/shinyApp/example/target_mouse.txt', package='spatialHeatmap')
target.mus <- read.table(mus.tar, header=TRUE, row.names=1, sep='\t')
target.mus[1:3, ]
unique(target.mus[, 3])

Load custom target data into colData slot.

colData(rse.mus) <- DataFrame(target.mus)

Preprocess Assay Data

The raw RNA-Seq count are preprocessed with the following steps: (1) normalization, (2) aggregation of replicates, and (3) filtering of reliable expression data. The details of these steps are explained in the sub-section above using data from human.

se.nor.mus <- norm_data(data=rse.mus, norm.fun='ESF', data.trans='log2') # Normalization
se.aggr.mus <- aggr_rep(data=se.nor.mus, sam.factor='organism_part', con.factor='strain', aggr='mean') # Aggregation of replicates
se.fil.mus <- filter_data(data=se.aggr.mus, sam.factor='organism_part', con.factor='strain', pOA=c(0.01, 5), CV=c(0.6, 100), dir=NULL) # Filtering of genes with low counts and variance

SHM: Transparency

The pre-processed expression data for gene 'ENSMUSG00000000263' is plotted in form of an SHM. In this case the plot includes expression data for 8 tissues across 3 mouse strains.

spatial_hm(svg.path=svg.mus, data=se.fil.mus, ID=c('ENSMUSG00000000263'), height=0.7, legend.width=0.7, legend.text.size=10, sub.title.size=9, ncol=3, tis.trans=c('skeletal muscle'), legend.nrow=4, line.size=0.2, line.color='grey70')

The SHM plots in Figures \@ref(fig:musshm) and below demonstrate the usage of the transparency feature via the tis.trans parameter. The corresponding mouse organ aSVG image includes overlapping tissue layers. In this case the skelectal muscle layer partially overlaps with lung and heart tissues. To view lung and heart in Figure \@ref(fig:musshm), the skelectal muscle tissue is set transparent with tis.trans=c('skeletal muscle'). To view in the same aSVG the skeletal muscle tissue instead, tis.trans is assigned NULL as shown below.

To fine control the visual effects in feature rich aSVGs, the line.size and line.color parameters are useful. This way one can adjust the thickness and color of complex structures.

spatial_hm(svg.path=svg.mus, data=se.fil.mus, ID=c('ENSMUSG00000000263'), height=0.6, legend.text.size=10, sub.title.size=9, ncol=3, tis.trans=NULL, legend.ncol=2, line.size=0.2, line.color='grey70')

Chicken Organs {#chk}

This section generates an SHM plot for chicken data from the Expression Atlas. The code components are very similar to the Human Brain example. For brevity, the corresponding text explaining the code has been reduced to a minimum.

Gene Expression Data

The chosen chicken RNA-Seq experiment compares the developmental changes across nine time points of seven organs [@Cardoso-Moreira2019-yq].

The following searches the Expression Atlas for expression data from 'heart' and 'gallus'.

all.chk <- searchAtlasExperiments(properties="heart", species="gallus")

Among the matching entries, accession 'E-MTAB-6769' will be downloaded.

all.chk[3, ]
rse.chk <- getAtlasData('E-MTAB-6769')[[1]][[1]]

The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.chk. The following returns only its first three rows.

colData(rse.chk)[1:3, ]

aSVG Image

The following example shows how to download from the above introduced SVG repositories an aSVG image that matches the tissues and species assayed in the gene expression data set downloaded in the previous subsection. As before the image is saved to a directory named tmp.dir.

tmp.dir <- paste0(normalizePath(tempdir(check=TRUE), winslash="/", mustWork=FALSE), '/shm')
# Query aSVGs.
feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('gallus'), keywords.any=TRUE, return.all=FALSE, dir=tmp.dir, remote=TRUE, match.only=FALSE)

feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('gallus'), keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=FALSE, match.only=FALSE)
feature.df

Obtain path of target aSVG on user system.

svg.chk <- system.file("extdata/shinyApp/example", "gallus_gallus.svg", package="spatialHeatmap")

Experimental Design

The following imports a sample target file that is included in this package. To inspect its content, the first three rows of the target file are printed to the screen.

chk.tar <- system.file('extdata/shinyApp/example/target_chicken.txt', package='spatialHeatmap')
target.chk <- read.table(chk.tar, header=TRUE, row.names=1, sep='\t')
target.chk[1:3, ]

Load custom target data into colData slot.

colData(rse.chk) <- DataFrame(target.chk)

Return samples used for plotting SHMs.

unique(colData(rse.chk)[, 'organism_part'])

Return conditions considered for plotting downstream SHM.

unique(colData(rse.chk)[, 'age'])

Preprocess Assay Data

se.nor.chk <- norm_data(data=rse.chk, norm.fun='ESF', data.trans='log2') # Normalization
se.aggr.chk <- aggr_rep(data=se.nor.chk, sam.factor='organism_part', con.factor='age', aggr='mean') # Replicate agggregation using mean 
se.fil.chk <- filter_data(data=se.aggr.chk, sam.factor='organism_part', con.factor='age', pOA=c(0.01, 5), CV=c(0.6, 100), dir=NULL) # Filtering of genes with low counts and varince

SHM: Time Series

The expression profile for gene ENSGALG00000006346 is plotted across nine time points in four organs in form of a composite SHM with 9 panels. Their layout in three columns is controlled with the argument setting ncol=3.

spatial_hm(svg.path=svg.chk, data=se.fil.chk, ID='ENSGALG00000006346', width=0.9, legend.width=0.9, legend.r=1.5, sub.title.size=9, ncol=3, legend.nrow=2, label=TRUE)

Arabidopsis Shoot {#shoot}

This section generates an SHM for Arabidopsis thaliana tissues with gene expression data from the Affymetrix microarray technology. The chosen experiment used ribosome-associated mRNAs from several cell populations of shoots and roots that were exposed to hypoxia stress [@Mustroph2009-nu]. In this case the expression data will be downloaded from GEO{target="_blank"} with utilites from the GEOquery package [@geo]. The data preprocessing routines are specific to the Affymetrix technology. The remaining code components for generating SHMs are very similar to the previous examples. For brevity, the text in this section explains mainly the steps that are specific to this data set.

Gene Expression Data

The GSE14502 data set will be downloaded with the getGEO function from the GEOquery package. Intermediately, the expression data is stored in an ExpressionSet container [@biobase], and then converted to a SummarizedExperiment object.

gset <- getGEO("GSE14502", GSEMatrix=TRUE, getGPL=TRUE)[[1]]
se.sh <- as(gset, "SummarizedExperiment")

The gene symbol identifiers are extracted from the rowData component to be used as row names. Similarly, one can work with AGI identifiers by providing below AGI under Gene.Symbol.

rownames(se.sh) <- make.names(rowData(se.sh)[, 'Gene.Symbol'])

The following returns a slice of the experimental design stored in the colData slot. Both the samples and conditions are contained in the title column. The samples include promoters (pGL2, pCO2, pSCR, pWOL, p35S), tissues and organs (root atrichoblast epidermis, root cortex meristematic zone, root endodermis, root vasculature, root_total and shoot_total); and the conditions are control and hypoxia.

colData(se.sh)[60:63, 1:4]

aSVG Image

In this example, the aSVG image has been generated in Inkscape from the corresponding figure in @Mustroph2009-nu. The resulting custom figure has been included as a sample aSVG file in the spatialHeatmap package. Detailed instructions for generating custom aSVG images in Inkscape are provided in the SVG tutorial{target="_blank"}.

The annotations in the corresponding aSVG file located under svg.dir can be queried with the return_features function.

feature.df <- return_feature(feature=c('pGL2', 'pRBCS'), species=c('shoot'), keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=FALSE, match.only=FALSE)

The unique set of the matching aSVG files can be returned as follows.

unique(feature.df$SVG)

The aSVG file arabidopsis_thaliana.shoot_shm.svg is chosen to generate the SHM in this section.

feature.df <- subset(feature.df, SVG=='arabidopsis_thaliana.shoot_shm.svg')
feature.df[1:3, ]

Obtain full path of target aSVG on user system.

svg.sh <- system.file("extdata/shinyApp/example", "arabidopsis_thaliana.shoot_shm.svg", package="spatialHeatmap")

Experimental Design

The following imports a sample target file that is included in this package. To inspect its content, four selected rows of this target file are printed to the screen.

sh.tar <- system.file('extdata/shinyApp/example/target_arab.txt', package='spatialHeatmap')
target.sh <- read.table(sh.tar, header=TRUE, row.names=1, sep='\t')
target.sh[60:63, ]

Return all samples present in target file.

unique(target.sh[, 'sample'])

Return all conditions present in target file.

unique(target.sh[, 'condition'])

Load custom target data into colData slot.

colData(se.sh) <- DataFrame(target.sh)

Preprocess Assay Data

The downloaded GSE14502 data set has already been normalized with the RMA algorithm [@affy]. Thus, the pre-processing steps can be restricted to the aggregation of replicates and filtering of reliably expressed genes. For the latter, the following code will retain genes with expression values larger than 6 (log2 space) in at least 3% of all samples (pOA=c(0.03, 6)), and with a coefficient of variance (CV) between 0.30 and 100 (CV=c(0.30, 100)).

se.aggr.sh <- aggr_rep(data=se.sh, sam.factor='sample', con.factor='condition', aggr='mean') # Replicate agggregation using mean
se.fil.arab <- filter_data(data=se.aggr.sh, sam.factor='sample', con.factor='condition', pOA=c(0.03, 6), CV=c(0.30, 100), dir=NULL) # Filtering of genes with low intensities and variance

SHM: Microarray

The expression profile for the HRE2 gene is plotted for the control and the hypoxia treatment across six cell types (Figure \@ref(fig:shshm)).

spatial_hm(svg.path=svg.sh, data=se.fil.arab, ID=c("HRE2"), height=0.7, legend.nrow=3, legend.text.size=11)

SHM: Development {#mul_svg}

The spatial_hm function can arrange multiple aSVGs in a single SHM plot, such as aSVGs from different development stages. To organize the subplots, the names of the aSVG files are expected to include the following suffixes: *_shm1.svg, *_shm2.svg, etc. The paths to the aSVG files are provided under the svg.path argument. By default, every aSVG image will have a legend plot on the right. The legend argument provides fine control over which legend plots to display.

As a simple toy example, the following stores random numbers in a data.frame.

df.random <- data.frame(matrix(sample(x=1:100, size=50, replace=TRUE), nrow=10))
colnames(df.random) <- c('shoot_totalA__condition1', 'shoot_totalA__condition2', 'shoot_totalB__condition1', 'shoot_totalB__condition2', 'notMapped') # Assign column names
rownames(df.random) <- paste0('gene', 1:10) # Assign row names 
df.random[1:3, ]

Next the paths to the aSVG files are obtained, here for younger and older plants using *_shm1 and *_shm1, respectively, which are made from @Mustroph2009-nu.

svg.sh1 <- system.file("extdata/shinyApp/example", "arabidopsis_thaliana.organ_shm1.svg", package="spatialHeatmap")
svg.sh2 <- system.file("extdata/shinyApp/example", "arabidopsis_thaliana.organ_shm2.svg", package="spatialHeatmap")

The following generates the corresponding SHMs plot for gene1. The orginal image dimensions can be preserved by assigning TRUE to the preserve.scale argument.

spatial_hm(svg.path=c(svg.sh1, svg.sh2), data=df.random, ID=c('gene1'), width=0.7, legend.r=0.9, legend.width=1, preserve.scale=TRUE)

Matrix Heatmaps {#mhm}

SHMs are suitable for comparing assay profiles among small number of items (e.g. few genes or proteins) across cell types and conditions. To also support analysis routines of larger number of items, spatialHeatmap integrates functionalities for identifying groups of items with similar and/or dissimilar assay profiles, and subsequently analyzing the results with data mining methods methods that scale to larger numbers of items than SHMs, such as hierarchical clustering and network analysis methods. The following introduces both approaches using as sample data the gene expression data from Arabidopsis shoots from the previous section.

To identify similar items, the submatrix function can be used. The latter identifies items sharing similar profiles with one or more query items of interest. The given example uses correlation coefficients as similarity metric. Three filtering parameters are provided to control the similarity and number of items in the result. First, the p argument allows to restrict the number of similar items to return based on a percentage cutoff relative to the number of items in the assay data set (e.g. 1% of the top most similar items). Second, a fixed number n of the most similar items can be returned. Third, all items above a user definable correlation coefficient cutoff value can be returned, such as a Pearson correlation coefficient (PCC) of at least 0.6. If several query items are provided then the function returns the similar genes for each query, while assuring uniqueness among items in the result.

The following example uses as query the two genes: 'RCA' and 'HRE2'. The ann argument corresponds to the column name in the rowData slot containing gene annotation information. The latter is only relevant if users want to see these annotations when mousing over a node in the interactive network plot generated in the next section.

sub.mat <- submatrix(data=se.fil.arab, ann='Target.Description', ID=c('RCA', 'HRE2'), p=0.1)

The result from the previous step is the assay matrix subsetted to the genes sharing similar assay profiles with the query genes 'RCA' and 'HRE2'.

sub.mat[c('RCA', 'HRE2'), c(1:3, 37)] # Subsetted assay matrix

Subsequently, hierarchical clustering is applied to the subsetted assay matrix containing only the genes that share profile similarities with the query genes 'RCA' and 'HRE2'. The clustering result is displayed as a matrix heatmap where the rows and columns are sorted by the corresponding hierarchical clustering dendrograms (Figure \@ref(fig:static)). The position of the query genes ('RCA' and 'HRE2') is indicated in the heatmap by black lines. Setting static=FALSE will launch the interactive mode, where users can zoom into the heatmap by selecting subsections in the image or zoom out by double clicking.

matrix_hm(ID=c('RCA', 'HRE2'), data=sub.mat, angleCol=80, angleRow=35, cexRow=0.8, cexCol=0.8, margin=c(10, 6), static=TRUE, arg.lis1=list(offsetRow=0.01, offsetCol=0.01))

Network Graphs {#net}

Module Identification

Network analysis is performed with the WGCNA algorithm [@Langfelder2008-sg; @Ravasz2002-db] using as input the subsetted assay matrix generated in the previous section. The objective is to identify network modules that can be visualized in the following step in form of network graphs. Applied to the gene expression sample data used here, these network modules represent groups of genes sharing highly similar expression profiles. Internally, the network module identification includes five major steps. First, a correlation matrix is computed using distance or correlation-based similarity metrics. Second, the obtained matrix is transformed into an adjacency matrix defining the connections among items. Third, the adjacency matrix is used to calculate a topological overlap matrix (TOM) where shared neighborhood information among items is used to preserve robust connections, while removing spurious connections. Fourth, the distance transformed TOM is used for hierarchical clustering. To maximize time performance, the hierarchical clustering is performed with the flashClust package [@Langfelder2012-nv]. Fifth, network modules are identified with the dynamicTreeCut package [@dynamicTreeCut]. Its ds (deepSplit) argument can be assigned integer values from 0 to 3, where higher values increase the stringency of the module identification process. To simplify the network module idenification process with WGCNA, the individual steps can be executed with a single function called adj_mod. The result is a list containing the adjacency matrix and the final module assignments stored in a data.frame. Since the interactive network feature used in the visualization step below performs best on smaller modules, only modules are returned that were obtained with stringent ds settings (here ds=2 and ds=3).

adj.mod <- adj_mod(data=sub.mat)

A slice of the adjacency matrix is shown below.

adj.mod[['adj']][1:3, 1:3]

The module assignments are stored in a data frame. Its columns contain the results for the ds=2 and ds=3 settings. Integer values $>0$ are the module labels, while $0$ indicates unassigned items. The following returns the first three rows of the module assignment result.

adj.mod[['mod']][1:3, ]

Module Visualization

Network modules can be visualized with the network function. To plot a module containing an item (gene) of interest, its ID (e.g. 'HRE2') needs to be provided under the corresponding argument. Typically, this could be one of the items chosen for the above SHM plots. There are two modes to visualize the selected module: static or interactive. Figure \@ref(fig:inter) was generated with static=TRUE. Setting static=FALSE will generate the interactive version. In the network plot shown below the nodes and edges represent genes and their interactions, respectively. The thickness of the edges denotes the adjacency levels, while the size of the nodes indicates the degree of connectivity of each item in the network. The adjacency and connectivity levels are also indicated by colors. For example, in Figure \@ref(fig:inter) connectivity increases from "turquoise" to "violet", while the adjacency increases from "yellow" to "blue". If a gene of interest has been assigned under ID, then its ID is labeled in the plot with the suffix tag: '*_target'.

network(ID="HRE2", data=sub.mat, adj.mod=adj.mod, adj.min=0.90, vertex.label.cex=1.2, vertex.cex=2, static=TRUE)

Setting static=FALSE launches the interactive network. In this mode there is an interactive color bar that denotes the gene connectivity. To modify it, the color lables need to be provided in a comma separated format, e.g.: yellow, orange, red. The latter would indicate that the gene connectivity increases from yellow to red.

If the subsetted expression matrix contains gene/protein annotation information in the last column, then it will be shown when moving the cursor over a network node.

network(ID="HRE2", data=sub.mat,  adj.mod=adj.mod, static=FALSE)

Shiny App {#shiny}

In additon to running spatialHeatmap from R, the package includes a Shiny App that provides access to the same functionalities from an intuitive-to-use web browser interface. Apart from being very user-friendly, this App conveniently organizes the results of the entire visualization workflow in a single browser window with options to adjust the parameters of the individual components interactively. For instance, genes can be selected and replotted in the SHM simply by clicking the corresponding rows in the expression table included in the same window. This representation is very efficient in guiding the interpretation of the results in a visual and user-friendly manner. For testing purposes, the spatialHeatmap Shiny App also includes ready-to-use sample expression data and aSVG images along with embedded user instructions.

Local System

The Shiny App of spatialHeatmap can be launched from an R session with the following function call.

shiny_all()

The dashboard panels of the Shiny App are organized as follows:

Left Side Panel: menu for instructions, and selecting input files and parameters
Data Matrix: scrollable assay matrix uploaded by users
Spatial Heatmap: spatially colored images based on aSVG images selected and/or uploaded by user
Matrix Heatmap: hierarchical clustering results of user selected items along with nearest neighbors showing most similar abundance profiles
Network Graph: interactive network graph for selected module
Parameter Control Menus: postioned on top of each visualization panel

A screenshot is shown below depicting an SHM plot generated with the Shiny App of spatialHeatmap (Figure \@ref(fig:shiny)).

include_graphics('img/shiny.png')

After launching, the Shiny App displays by default one of the included data sets.

The assay data and aSVG images are uploaded to the Shiny App as tabular files (e.g. in CSV or TSV format) and SVG files, respectively. To also allow users to upload gene expression data stored in SummarizedExperiment objects, one can export it from R to a tabular file with the filter_data function, where the user specifies the directory path under the dir argument. This will create in the target directory a tablular file named customData.txt in TSV format. The column names in this file preserve the experimental design information from the colData slot by concatenating the corresponding sample and condition information separated by double underscores. An example of this format is shown in Table \@ref(tab:humtab).

se.fil.arab <- filter_data(data=se.aggr.sh, ann="Target.Description", sam.factor='sample', con.factor='condition', pOA=c(0.03, 6), CV=c(0.30, 100), dir='./')

To interactively access gene-, transcript- or protein-level annotations in the plots and tables of the Shiny App, such as viewing functional descriptions by moving the cursor over network nodes, the corresponding annotation column needs to be present in the rowData slot and its column name assigned to the ann argument. In the exported tabular file, the extra annotation column is appended to the expression matrix.

Server Deployment

As most Shiny Apps, spatialHeatmap can be deployed as a centralized web service. A major advantage of a web server deployment is that the functionalities can be accessed remotely by anyone on the internet without the need to use R on the user system. For deployment one can use custom web servers or cloud services, such as AWS, GCP or shinysapps.io{target="_blank"}. An example web instance for testing spatialHeatmap online is available here{target="_blank"}.

Custom Shiny App

The spatialHeatmap package also allows users to create customized Shiny App instances using the custom_shiny function. This function provides options to include custom example data and aSVGs, and define default values within most visualization panels (e.g. color schemes, image dimensions). For details users want to consult the help file of the custom_shiny function. To maximize flexibilty, the default parameters are stored in a yaml file on the Shiny App. This makes it easy to refine and optimize default parameters simply by changing this yaml file.

Supplementary Section {#sup}

To generate SHMs with custom data, proper formatting of the numeric (assay) data and aSVG images is required. A tabular target file describing the numeric data can also be provided but is optional. This section provides additional details on these three input types.

Numeric Data {#data_form}

The numceric data used to color the features in aSVG images can be provided as three different object types including vector, data.frame and SummerizedExperiment. When working with complex omics-based assay data then the latter provides the most flexibility, and thus should be the preferred container class for managing numeric data in spatialHeatmap. Both data.frame and SummarizedExperiment can hold data from many measured items, such as many genes or proteins. In contrast to this, the vector class is only suitable for data from single items. Due to its simplicity this less complex container is often useful for testing or when dealing with simple data sets.

Object Types

`vector`

When using numeric vectors as input to spatial_hm, then their name slot needs to be populated with strings matching the feature names in the corresponding aSVG. To also specify conditions, their labels need to be appended to the feature names with double underscores as separator, i.e. 'feature__condition'.

The following example replots the toy example for two spatial features ('occipital lobe' and 'parietal lobe') and two conditions ('1' and '2').

vec <- sample(x=1:100, size=5) # Random numeric values
names(vec) <- c('occipital lobe__condition1', 'occipital lobe__condition2', 'parietal lobe__condition1', 'parietal lobe__condition2', 'notMapped') # Assign unique names to random values
vec

With this configuration the resulting plot contains two spatial heatmap plots for the human brain, one for 'condition 1' and another one for 'contition 2'. To keep the build time and storage size of this package to a minimum, the spatial_hm function call in the code block below is not evaluated. Thus, the corresponding SHM is not shown in this vignette.

spatial_hm(svg.path=svg.hum, data=vec, ID='toy', ncol=1, legend.r=1.2, sub.title.size=14)

`data.frame`

Compared to the above vector input, data.frames are structured here like row-wise appended vectors, where the name slot information in the vectors is stored in the column names. Each row also contains a name that corresponds to the corresponding item name, such as a gene ID. The naming of spatial features and conditions in the column names follows the same conventions as the naming used for the name slots in the above vector example.

The following illustrates this with an example where a numeric data.frame with random numbers is generated containing 20 rows and 5 columns. To avoid name clashes, the values in the rows and columns should be unique.

df.test <- data.frame(matrix(sample(x=1:1000, size=100), nrow=20)) # Create numeric data.frame
colnames(df.test) <- names(vec) # Assign column names
rownames(df.test) <- paste0('gene', 1:20) # Assign row names
df.test[1:3, ]

With the resulting data.frame one can generate the same SHM as in the previous example, that used a named vector as input to the spatial_hm function. Additionally, one can now select each row by its name (here gene ID) under the ID argument.

spatial_hm(svg.path=svg.hum, data=df.test, ID=c('gene1'), ncol=1, legend.r=1.2, sub.title.size=14)

Additional information can be appended to the data.frame column-wise, such as annotation data including gene descriptions. This information can then be displayed interactively in the network plots of the Shiny App by placing the curser over network nodes.

df.test$ann <- paste0('ann', 1:20)
df.test[1:3, ]

`SummarizedExperiment`

The SummarizedExperiment class is a much more extensible and flexible container for providing metadata for both rows and columns of numeric data stored in tabular format.

To import experimental design information from tabular files, users can provide a target file that will be stored in the colData slot of the SummarizedExperiment (SE, @SummarizedExperiment) object. In other words, the target file provides the metadata for the columns of the numeric assay data. Usually, the target file contains at least two columns: one for the features/samples and one for the conditions. Replicates are indicated by identical entries in these columns. The actual numeric matrix representing the assay data is stored in the assay slot, where the rows correspond to items, such as gene IDs. Optionally, additional annotation information for the rows (e.g. gene descriptions) can be stored in the rowData slot.

For constructing a valid SummarizedExperiment object, that can be used by the spatial_hm function, the target file should meet the following requirements.

It should be imported with read.table or read.delim into a data.frame or the data.frame can be constructed in R on the fly (as shown below).
It should contain at least two columns. One column represents the features/samples and the other one the conditions. The rows in the target file correspond to the columns of the numeric data stored in the assay slot. If the condition column is empty, then the same condition is assumed under the corresponding features/samples entry.
The feature/sample names must have matching entries in to corresponding aSVG to be considered in the final SHM. Note, the double underscore is a special string that is reserved for specific purposes in spatialHeatmap, and thus should be avoided for naming feature/samples and conditions.

The following example illustrates the design of a valid SummarizedExperiment object for generating SHMs. In this example, the 'occipital lobe' tissue has 2 conditions and each condition has 2 replicates. Thus, there are 4 assays for occipital lobe, and the same design applies to the parietal lobe tissue.

sample <- c(rep('occipital lobe', 4), rep('parietal lobe', 4))
condition <- rep(c('condition1', 'condition1', 'condition2', 'condition2'), 2)
target.test <- data.frame(sample=sample, condition=condition, row.names=paste0('assay', 1:8))
target.test

The assay slot is populated with a 8 x 20 data.frame containing random numbers. Each column corresponds to an assay in the target file (here imported into colData), while each row corresponds to a gene.

df.se <- data.frame(matrix(sample(x=1:1000, size=160), nrow=20))
rownames(df.se) <- paste0('gene', 1:20)
colnames(df.se) <- row.names(target.test)
df.se[1:3, ]

Next, the final SummarizedExperiment object is constructed by providing the numeric and target data under the assays and colData arguments, respectively.

se <- SummarizedExperiment(assays=df.se, colData=target.test)
se

If needed row-wise annotation information (e.g. for genes) can be included in the SummarizedExperiment object as well. This can be done during the construction of the initial object, or as below by injecting the information into an existing SummarizedExperiment object.

rowData(se) <- df.test['ann']

In this simple example, possible normalization and filtering steps are skipped. Yet, the aggregation of replicates is performed as shown below.

se.aggr <- aggr_rep(data=se, sam.factor='sample', con.factor='condition', aggr='mean')
assay(se.aggr)[1:3, ]

With the fully configured SummarizedExperiment object, a similar SHM is plotted as in the previous examples.

spatial_hm(svg.path=svg.hum, data=se.aggr, ID=c('gene1'), ncol=1, legend.r=1.2, sub.title.size=14)

aSVG File

aSVG repository {#svg_repo}

A public aSVG repository, that can be used by spatialHeatmap directly, is the one maintained by the EBI Gene Expression Group. It contains annatomical aSVG images from different species. The same aSVG images are also used by the web service of the Expression Atlas project. In addition, the spatialHeatmap has its own repository called spatialHeatmap aSVG Repository, that stores custom aSVG files developed for this project (e.g. Figure \@ref(fig:shshm)).

If users cannot find a suitable aSVG image in these two repositories, we have developed a step-by-step SVG tutorial for creating custom aSVG images. For example, the BAR eFP browser at University of Toronto contains many anatomical images. These images can be used as templates in the SVG tutorial to construct custom aSVGs.

Moreover, in the future we will add more aSVGs to our repository, and users are welcome to deposit their own aSVGs to this repository to share them with the community.

Update aSVG features {#update}

To create and edit existing feature identifiers in aSVGs, the update_feature function can be used. The demonstration below, creates an empty folder tmp.dir1 and copies into it the homo_sapiens.brain.svg image provided by the spatialHeatmap package. Subsequently, selected feature annotations are updated in this file.

tmp.dir1 <- paste0(normalizePath(tempdir(check=TRUE), winslash="/", mustWork=FALSE), '/shm1')
if (!dir.exists(tmp.dir1)) dir.create(tmp.dir1)
svg.hum <- system.file("extdata/shinyApp/example", 'homo_sapiens.brain.svg', package="spatialHeatmap") 
file.copy(from=svg.hum, to=tmp.dir1, overwrite=TRUE) # Copy "homo_sapiens.brain.svg" file into 'tmp.dir1'

Query the above aSVG with feature and species keywords, and return the resulting matches in a data.frame.

feature.df <- return_feature(feature=c('frontal cortex', 'prefrontal cortex'), species=c('homo sapiens', 'brain'), dir=tmp.dir1, remote=FALSE, keywords.any=FALSE)
feature.df

Subsequently, create a character vector of new feature identifiers corresponding to each of the returned features. In the following examples spaces in strings will be filled with dots. This character vector must be added to the first column of the feature data.frame. The latter is used by the update_feature function to look up new features.

Sample code that creates new feature names and stores them in a character vector.

f.new <- c('prefrontal.cortex', 'frontal.cortex')

Next, new features are added to the first column of the feature data.frame.

feature.df.new <- cbind(featureNew=f.new, feature.df)
feature.df.new

Finally, the features are updated in the aSVG file(s) located in the tmp.dir1 directory.

update_feature(feature=feature.df.new, dir=tmp.dir1)

Version Informaion

sessionInfo()

Funding

This project has been funded by NSF awards: PGRP-1546879{target="_blank"}, PGRP-1810468{target="_blank"}, PGRP-1936492{target="_blank"}.

References

Any scripts or data that you put into this service are public.

spatialHeatmap documentation built on Nov. 8, 2020, 5:46 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

spatialHeatmap spatialHeatmap

_spatialHeatmap_: Visualizing Spatial Assays in Anatomical Images and Network Graphs In spatialHeatmap: spatialHeatmap

Introduction

Motivation

Design {#design}

Image Format: SVG {#term}

Data Repositories {#data_repo}

Tutorial Overview {#sample_data}

Getting Started

Installation

Spatial Heatmaps (SHMs) {#shm}

Toy Example {#toy}

aSVG Image

Numeric Data

Plot SHM

Human Brain {#hum}

Gene Expression Data

aSVG Image

Experimental Design

Preprocess Assay Data

SHM: Single Gene

SHM: Multiple Genes

SHM: HTML and Video

SHM: Customization

Mouse Organs {#mus}

Gene Expression Data

aSVG Image

Experimental Design

Preprocess Assay Data

SHM: Transparency

Chicken Organs {#chk}

Gene Expression Data

aSVG Image

Experimental Design

Preprocess Assay Data

SHM: Time Series

Arabidopsis Shoot {#shoot}

Gene Expression Data

aSVG Image

Experimental Design

Preprocess Assay Data

SHM: Microarray

SHM: Development {#mul_svg}

Matrix Heatmaps {#mhm}

Network Graphs {#net}

Module Identification

Module Visualization

Shiny App {#shiny}

Local System

Server Deployment

Custom Shiny App

Supplementary Section {#sup}

Numeric Data {#data_form}

Object Types

vector

data.frame

SummarizedExperiment

aSVG File

aSVG repository {#svg_repo}

Update aSVG features {#update}

Version Informaion

Funding

References

Try the spatialHeatmap package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

spatialHeatmap
spatialHeatmap

_spatialHeatmap_: Visualizing Spatial Assays in Anatomical Images and Network Graphs
In spatialHeatmap: spatialHeatmap

`vector`

`data.frame`

`SummarizedExperiment`