spatial_hm: Create Spatial Heatmaps

View source: R/spatial_hm.R

Description

The input is a pair consisting of an annotated SVG (aSVG) file and formatted data (vector, data.frame, or SummarizedExperiment). In the former, spatial features are represented by shapes and assigned unique identifiers, while the latter are numeric values measured from these spatial features and organized in specific formats. In biological cases, aSVGs are anatomical or cell structures, and data are measurements of genes, proteins, metabolites, etc. in different samples (e.g. cells, tissues). Data are mapped to the aSVG according to identifiers of assay samples and aSVG features. Only the data from samples having matching counterparts in aSVG features are mapped. The mapped features are filled with colors translated from the data, and the resulting images are termed spatial heatmaps. Note that "sample" and "feature" are two equivalent terms referring to cells, tissues, organs, etc. where numeric values are measured. Matching means that a target sample in the data and a target spatial feature in the aSVG have the same identifier.
This function is designed to be as flexible as possible to achieve optimal visualization. For example, subplots of spatial heatmaps can be organized by gene or condition for easy comparison; in multi-layer anatomical structures, selected tissues can be set transparent to expose buried features; and the color scale is customizable to highlight differences among features. This function also works with many other types of spatial data, such as population data plotted to geographic maps.
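
The matching rule described above can be illustrated with a minimal base-R sketch. The feature identifiers and values below are hypothetical and are not read from any aSVG; the sketch only shows identifier matching, not how spatial_hm implements it internally.

```r
# Hypothetical named vector of measurements; names are sample identifiers.
vals <- c(brain = 12, heart = 48, notMapped = 7)

# Hypothetical feature identifiers, standing in for those found in an aSVG.
svg.features <- c("brain", "heart", "kidney", "liver")

# Only samples whose identifiers also occur among the aSVG features are mapped.
mapped <- vals[names(vals) %in% svg.features]

# Samples without a matching feature are left out of the spatial heatmap.
unmapped <- setdiff(names(vals), svg.features)

mapped
unmapped
```

Here "brain" and "heart" would be colored, while "notMapped" has no counterpart among the features and is skipped.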

Usage

spatial_hm(
  svg.path,
  data,
  sam.factor = NULL,
  con.factor = NULL,
  ID,
  lay.shm = "gene",
  ncol = 2,
  col.com = c("yellow", "orange", "red"),
  col.bar = "selected",
  bar.width = 0.08,
  legend.width = 1,
  bar.title.size = 0,
  trans.scale = NULL,
  tis.trans = NULL,
  width = 1,
  height = 1,
  legend.r = 1,
  sub.title.size = 11,
  legend.plot = "all",
  sam.legend = "identical",
  bar.value.size = 10,
  legend.plot.title = "Legend",
  legend.plot.title.size = 11,
  legend.ncol = NULL,
  legend.nrow = NULL,
  legend.position = "bottom",
  legend.direction = NULL,
  legend.key.size = 0.02,
  legend.text.size = 12,
  angle.text.key = NULL,
  position.text.key = NULL,
  legend.2nd = FALSE,
  position.2nd = "bottom",
  legend.nrow.2nd = NULL,
  legend.ncol.2nd = NULL,
  legend.key.size.2nd = 0.03,
  legend.text.size.2nd = 10,
  angle.text.key.2nd = 0,
  position.text.key.2nd = "right",
  add.feature.2nd = FALSE,
  label = FALSE,
  label.size = 4,
  label.angle = 0,
  hjust = 0,
  vjust = 0,
  opacity = 1,
  key = TRUE,
  line.size = 0.2,
  line.color = "grey70",
  preserve.scale = TRUE,
  verbose = TRUE,
  out.dir = NULL,
  anm.width = 650,
  anm.height = 550,
  selfcontained = FALSE,
  video.dim = "640x480",
  res = 500,
  interval = 1,
  framerate = 1,
  legend.value.vdo = NULL,
  ...
)

Arguments

svg.path

The path of aSVG file(s). E.g.: system.file("extdata/shinyApp/example", "gallus_gallus.svg", package="spatialHeatmap"). Multiple aSVGs are also accepted, such as aSVGs depicting organ development across multiple time points. In this case, the aSVGs should be indexed with suffixes "_shm1", "_shm2", ..., such as "arabidopsis_thaliana.organ_shm1.svg", "arabidopsis_thaliana.organ_shm2.svg", and the paths of these aSVGs should be provided in a character vector.
See return_feature for details on how to directly download aSVGs from the EBI aSVG repository https://github.com/ebi-gene-expression-group/anatomogram/tree/master/src/svg and the spatialHeatmap aSVG Repository https://github.com/jianhaizhang/spatialHeatmap_aSVG_Repository developed in this project.
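
As a rough illustration of the suffix convention for multiple aSVGs, the following base-R sketch checks file names and extracts the "shm" index. The regular expressions are assumptions made for illustration; they are not taken from the package.

```r
# File names following the "_shm1", "_shm2", ... convention described above.
svg.paths <- c("arabidopsis_thaliana.organ_shm1.svg",
               "arabidopsis_thaliana.organ_shm2.svg")

# Check that every file name ends in "_shm<index>.svg" (illustrative pattern).
ok <- grepl("_shm[0-9]+\\.svg$", basename(svg.paths))

# Extract the suffix, e.g. "shm1", which is the form used by "legend.plot".
suffix <- sub("^.*_(shm[0-9]+)\\.svg$", "\\1", basename(svg.paths))

ok
suffix
```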

data

An object of data.frame or SummarizedExperiment. In either case, the columns and rows should be samples/conditions and assayed items (e.g. genes, proteins, metabolites) respectively. If data.frame, the column names should follow the naming scheme "sample__condition". The "sample" is a general term and stands for cells, tissues, organs, etc., where the values are measured. The "condition" is also a general term and refers to experiment treatments applied to "sample" such as drug dosage, temperature, time points, etc. If certain samples are not expected to be colored in "spatial heatmaps" (see spatial_hm), they are not required to follow this naming scheme. In the downstream interactive network (see network), if users want to see node annotation by mousing over a node, a column of row item annotation can optionally be appended as the last column.
In the case of SummarizedExperiment, the assays slot stores the data matrix. Similarly, the rowData slot can optionally store a data frame of row item annotation, which is only relevant to the interactive network. The colData slot usually contains a data frame with one column of sample replicates and one column of condition replicates. It is crucial that replicate names of the same sample or condition are identical. E.g. if sampleA has 3 replicates, "sampleA", "sampleA", "sampleA" is expected, while "sampleA1", "sampleA2", "sampleA3" is regarded as 3 different samples. If the original column names in the assay slot already follow the "sample__condition" scheme, then the colData slot is not required at all.
In the function spatial_hm, this argument can also be a numeric vector. In this vector, every value should be named, and values expected to color the "spatial heatmaps" should follow the naming scheme "sample__condition".
In certain cases, there is no condition associated with the data. Then in the naming scheme of the data frame or vector, the "__condition" part can be omitted. In SummarizedExperiment, the "condition" column can be omitted from the colData slot.
Note that, regardless of data class, the double underscore is a special string reserved for specific purposes in "spatialHeatmap", and thus should be avoided when naming features/samples and conditions.
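
The "sample__condition" naming scheme can be sketched in base R as follows. The sample names, condition names, and values are made up for illustration; only the double-underscore convention comes from the description above.

```r
# Hypothetical samples (organs) and conditions (time points).
samples <- c("brain", "heart")
conditions <- c("day10", "day12")

# Build every sample/condition combination, joined by the reserved "__".
grid <- expand.grid(sample = samples, condition = conditions,
                    stringsAsFactors = FALSE)
col.names <- paste(grid$sample, grid$condition, sep = "__")

# A toy data frame: rows are assayed items, columns are "sample__condition".
set.seed(1)
df <- data.frame(matrix(sample(1:100, 3 * length(col.names), replace = TRUE),
                        nrow = 3))
colnames(df) <- col.names
rownames(df) <- paste0("gene", 1:3)

# Recover the sample and condition parts from the column names.
parts <- strsplit(colnames(df), "__", fixed = TRUE)
sam <- vapply(parts, function(p) p[1], character(1))
con <- vapply(parts, function(p) p[2], character(1))

df
sam
con
```

Because "__" is the separator, a sample or condition name that itself contained a double underscore would be split in the wrong place, which is why the string is reserved.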

sam.factor

The column name corresponding to samples in the colData of SummarizedExperiment. If the original column names in the assay slot already follow the scheme "sample__condition", then the colData slot is not required and accordingly this argument can be NULL.

con.factor

The column name corresponding to conditions in the colData of SummarizedExperiment. Can be NULL if the column names in the assay slot already follow the scheme "sample__condition", or if no condition is associated with the data.

ID

A character vector of assayed items (e.g. genes, proteins) whose abundance values are used to color the aSVG.

lay.shm

One of "gene", "con", or "none". If "gene", spatial heatmaps are organized by genes, proteins, metabolites, etc., and conditions are sorted within each gene. If "con", spatial heatmaps are organized by the conditions/treatments applied to experiments, and genes are sorted within each condition. If "none", spatial heatmaps are organized by the gene order in ID, and conditions follow the order in which they appear in data.

ncol

An integer of the number of columns to display the spatial heatmaps, which does not include the legend plot.

col.com

A character vector of the color components used to build the color scale. The default is c('yellow', 'orange', 'red').
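
To give a feel for how a few color components can be expanded into a continuous scale, here is a minimal sketch using grDevices::colorRampPalette. Whether spatial_hm builds its scale in exactly this way is an assumption; the values and the number of colors are made up for illustration.

```r
# Color components, as in the default of "col.com".
col.com <- c("yellow", "orange", "red")

# Interpolate the components into a 5-color scale.
col.fun <- grDevices::colorRampPalette(col.com)
cols <- col.fun(5)

# Hypothetical values for three features.
vals <- c(brain = 2, heart = 10, liver = 6)

# Rescale each value to an index in 1..5 and pick the corresponding color.
rng <- range(vals)
idx <- 1 + round((vals - rng[1]) / (rng[2] - rng[1]) * (length(cols) - 1))
feature.cols <- setNames(cols[idx], names(vals))

feature.cols
```

The lowest value receives the first component (yellow) and the highest receives the last (red), with intermediate values falling in between.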

col.bar

One of "selected" or "all", the former uses values of ID to build the color scale while the latter uses all values from the data. The default is "selected".

bar.width

The width of color bar that ranges from 0 to 1. The default is 0.08.

legend.width

The width of legend plot that ranges from 0 to 1 (default).

bar.title.size

A numeric of color bar title size. The default is 0.

trans.scale

One of "log2", "exp2", "row", "column", or NULL, which means transform the data by "log2" or "2-base exponent", scale by "row" or "column", or no manipulation respectively. This argument should be used if colors across samples cannot be distinguished due to low variance or outliers.
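
The four options can be approximated in base R as shown below. This is only a sketch of what the transformations mean conceptually; the use of scale() and the toy matrix are assumptions and may differ from the package's internal implementation.

```r
# Toy matrix: rows are assayed items, columns are samples.
mat <- matrix(c(1, 2, 4, 8, 16, 32), nrow = 2, byrow = TRUE,
              dimnames = list(c("gene1", "gene2"),
                              c("brain", "heart", "liver")))

# "log2": compress large values so that outliers dominate less.
log2.mat <- log2(mat)

# "exp2": the inverse, 2 raised to the power of each value.
exp2.mat <- 2^log2.mat

# "row": center and scale each row; scale() works on columns, so transpose.
row.scaled <- t(scale(t(mat)))

# "column": center and scale each column.
col.scaled <- scale(mat)

log2.mat
row.scaled
```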

tis.trans

A character vector of tissue/spatial feature identifiers that will be set transparent, e.g. c("brain", "heart"). This argument is used when target features are covered by overlapping features and the latter should be transparent.

width

A numeric of overall width of all subplots, between 0 and 1. The default is 1.

height

A numeric of overall height of all subplots, between 0 and 1. The default is 1.

legend.r

A numeric to adjust the dimension of the legend plot. The default is 1. The larger the value, the higher the ratio of width to height.

sub.title.size

A numeric of the subtitle font size of each individual spatial heatmap. The default is 11.

legend.plot

A vector of suffix(es) of aSVG file name(s) such as c('shm1', 'shm2'). Only aSVG(s) whose suffix(es) are assigned to this argument will have a legend plot on the right. The default is 'all' and each aSVG will have a legend plot. If NULL, no legend plot is shown. Only applicable if multiple aSVG files are provided to svg.path.

sam.legend

One of "identical", "all", or a character vector of tissue/spatial feature identifiers from the aSVG file. The default is "identical" and all the identical/matching tissues/spatial features between the data and aSVG file are indicated in the legend plot. If "all", all tissues/spatial features in the aSVG are shown. If a vector, only the tissues/spatial features in the vector are shown.

bar.value.size

A numeric of value size in the color bar y-axis. The default is 10.

legend.plot.title

The title of the legend plot. The default is 'Legend'.

legend.plot.title.size

The title size of the legend plot. The default is 11.

legend.ncol

An integer of the total columns of keys in the legend plot. The default is NULL. If both legend.ncol and legend.nrow are used, the product of the two arguments should be equal or larger than the total number of shown spatial features.

legend.nrow

An integer of the total rows of keys in the legend plot. The default is NULL. It is only applicable to the legend plot. If both legend.ncol and legend.nrow are used, the product of the two arguments should be equal or larger than the total number of matching spatial features.

legend.position

The position of legends ("none", "left", "right", "bottom", "top", or a two-element numeric vector). The default is "bottom".

legend.direction

The layout of items in legends ("horizontal" or "vertical"). The default is NULL.

legend.key.size

A numeric of the legend key size ("npc"), applicable to the legend plot. The default is 0.02.

legend.text.size

A numeric of the legend label size, applicable to the legend plot. The default is 12.

angle.text.key

A value of key text angle in legend plot. The default is NULL, equivalent to 0.

position.text.key

The position of key text in legend plot, one of "top", "right", "bottom", "left". Default is NULL, equivalent to "right".

legend.2nd

Logical, TRUE or FALSE. If TRUE, a secondary legend is added to each spatial heatmap, which shows the numeric values of each matching spatial feature. The default is FALSE. Only applies to the static image.

position.2nd

The position of the secondary legend. One of "top", "right", "bottom", "left", or a two-component numeric vector. The default is "bottom". Applies to the static image and video.

legend.nrow.2nd

An integer of rows of the secondary legend keys. Applies to the static image and video.

legend.ncol.2nd

An integer of columns of the secondary legend keys. Applies to the static image and video.

legend.key.size.2nd

A numeric of legend key size. The default is 0.03. Applies to the static image and video.

legend.text.size.2nd

A numeric of the secondary legend text size. The default is 10. Applies to the static image and video.

angle.text.key.2nd

A value of angle of key text in the secondary legend. Default is 0. Applies to the static image and video.

position.text.key.2nd

The position of key text in the secondary legend, one of "top", "right", "bottom", "left". Default is "right". Applies to the static image and video.

add.feature.2nd

Logical TRUE or FALSE. Add feature identifiers to the secondary legend or not. The default is FALSE. Applies to the static image.

label

Logical. If TRUE, spatial features having matching samples are labeled by feature identifiers. The default is FALSE. It is useful when spatial features are labeled by similar colors.

label.size

The size of spatial feature labels in legend plot. The default is 4.

label.angle

The angle of spatial feature labels in legend plot. Default is 0.

hjust

The value to horizontally adjust positions of spatial feature labels in legend plot. Default is 0.

vjust

The value to vertically adjust positions of spatial feature labels in legend plot. Default is 0.

opacity

The transparency of colored spatial features in legend plot. Default is 1. If 0, features are totally transparent.

key

Logical. The default is TRUE and keys are added in legend plot. If label is TRUE, the keys could be removed.

line.size

A numeric of the shape outline size. Default is 0.2.

line.color

A character of the shape outline color. Default is "grey70".

preserve.scale

Logical, TRUE or FALSE. If TRUE (default), the relative dimensions of multiple aSVGs are preserved. Only applicable if multiple aSVG files are provided to svg.path. The original dimension (width/height) is specified in the top-most node "svg" in the aSVG file.

verbose

Logical, FALSE or TRUE. If TRUE the samples in data not colored in spatial heatmaps are printed to R console. Default is TRUE.

out.dir

The directory to save interactive spatial heatmaps as independent HTML files and videos. Default is NULL, and the HTML files and videos are not saved.

anm.width

The width of spatial heatmaps in HTML files. Default is 650.

anm.height

The height of spatial heatmaps in HTML files. Default is 550.

selfcontained

Whether to save the HTML as a single self-contained file (with external resources base64 encoded) or a file with external resources placed in an adjacent directory. Default is FALSE.

video.dim

A single character string of the video frame dimension in the form 'widthxheight', such as '1920x1080', '1280x800', '320x568', '1280x1024', '1280x720', '320x480', '480x360', '600x600', '800x600', '640x480' (default). The aspect ratio of spatial heatmaps is determined by width and height.

res

Resolution of the video in dpi. Default is 500.

interval

The time interval (seconds) between spatial heatmap frames in the video. Default is 1.

framerate

An integer of video framerate in frames per second. Default is 1. Larger values make the video smoother.

legend.value.vdo

Logical TRUE or FALSE. If TRUE, the numeric values of matching spatial features are added to video legend. The default is NULL.

...

Additional element specifications not part of base ggplot2. In general, these should also be defined in the element tree argument.

Value

An image of spatial heatmap(s), and a two-component list consisting of the spatial heatmap(s) in ggplot format and a data frame of mapping between assayed samples and aSVG features.

Details

See the package vignette (browseVignettes('spatialHeatmap')).

Author(s)

Jianhai Zhang jzhan067@ucr.edu; zhang.jianhai@hotmail.com
Dr. Thomas Girke thomas.girke@ucr.edu

References

https://www.gimp.org/tutorials/
https://inkscape.org/en/doc/tutorials/advanced/tutorial-advanced.en.html
http://www.microugly.com/inkscape-quickguide/
Martin Morgan, Valerie Obenchain, Jim Hester and Hervé Pagès (2018). SummarizedExperiment: SummarizedExperiment container. R package version 1.10.1
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
Jeroen Ooms (2018). rsvg: Render SVG Images into PDF, PNG, PostScript, or Bitmap Arrays. R package version 1.3. https://CRAN.R-project.org/package=rsvg
R. Gentleman, V. Carey, W. Huber and F. Hahne (2017). genefilter: genefilter: methods for filtering genes from high-throughput experiments. R package version 1.58.1
Paul Murrell (2009). Importing Vector Graphics: The grImport Package for R. Journal of Statistical Software, 30(4), 1-37. URL http://www.jstatsoft.org/v30/i04/
Baptiste Auguie (2017). gridExtra: Miscellaneous Functions for "Grid" Graphics. R package version 2.3. https://CRAN.R-project.org/package=gridExtra
R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
https://github.com/ebi-gene-expression-group/anatomogram/tree/master/src/svg
Keays, Maria. 2019. ExpressionAtlas: Download Datasets from EMBL-EBI Expression Atlas
Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. "Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2." Genome Biology 15 (12): 550. doi:10.1186/s13059-014-0550-8
Guangchuang Yu (2020). ggplotify: Convert Plot to 'grob' or 'ggplot' Object. R package version 0.0.5. https://CRAN.R-project.org/package=ggplotify
Cardoso-Moreira, Margarida, Jean Halbert, Delphine Valloton, Britta Velten, Chunyan Chen, Yi Shao, Angélica Liechti, et al. 2019. “Gene Expression Across Mammalian Organ Development.” Nature 571 (7766): 505–9

Examples

## In the following examples, the 2 toy data sets come from an RNA-seq analysis on development
## of 7 chicken organs under 9 time points (Cardoso-Moreira et al. 2019). For convenience, they
## are included in this package. The complete raw count data are downloaded using the R package
## ExpressionAtlas (Keays 2019) with the accession number "E-MTAB-6769". Toy data1 is used as
## a "data frame" input to exemplify data of simple samples/conditions, while toy data2 is used
## as "SummarizedExperiment" to illustrate data involving complex samples/conditions.

## Set up toy data.

# Access toy data1.
cnt.chk.simple <- system.file('extdata/shinyApp/example/count_chicken_simple.txt',
package='spatialHeatmap')
df.chk <- read.table(cnt.chk.simple, header=TRUE, row.names=1, sep='\t', check.names=FALSE)
# Columns follow the naming scheme "sample__condition", where "sample" and "condition" stand
# for organs and time points respectively.
df.chk[1:3, ]

# A column of gene annotation can be appended to the data frame, but is not required.  
ann <- paste0('ann', seq_len(nrow(df.chk))); ann[1:3]
df.chk <- cbind(df.chk, ann=ann)
df.chk[1:3, ]

# Access toy data2. 
cnt.chk <- system.file('extdata/shinyApp/example/count_chicken.txt', package='spatialHeatmap')
count.chk <- read.table(cnt.chk, header=TRUE, row.names=1, sep='\t')
count.chk[1:3, 1:5]

# A targets file describing samples and conditions is required for toy data2. It should be made
# based on the experiment design, which is accessible through the accession number 
# "E-MTAB-6769" in the R package ExpressionAtlas. An example targets file is included in this
# package and accessed below. 
# Access the example targets file. 
tar.chk <- system.file('extdata/shinyApp/example/target_chicken.txt', package='spatialHeatmap')
target.chk <- read.table(tar.chk, header=TRUE, row.names=1, sep='\t')
# Every column in toy data2 corresponds with a row in targets file. 
target.chk[1:5, ]
# Store toy data2 in "SummarizedExperiment".
library(SummarizedExperiment)
se.chk <- SummarizedExperiment(assay=count.chk, colData=target.chk)
# The "rowData" slot can store a data frame of gene annotation, but not required.
rowData(se.chk) <- DataFrame(ann=ann)

## By convention, raw sequencing count data should be normalized, aggregated, and filtered to
## reduce noise.

# Normalize count data.
# The normalizing function "calcNormFactors" (McCarthy et al. 2012) with default settings
# is used.  
df.nor.chk <- norm_data(data=df.chk, norm.fun='CNF', data.trans='log2')
se.nor.chk <- norm_data(data=se.chk, norm.fun='CNF', data.trans='log2')
# Aggregate count data.
# Aggregate "sample__condition" replicates in toy data1.
df.aggr.chk <- aggr_rep(data=df.nor.chk, aggr='mean')
df.aggr.chk[1:3, ]
# Aggregate "sample__condition" replicates in toy data2, where "sample" is "organism_part" and
# "condition" is "age".
se.aggr.chk <- aggr_rep(data=se.nor.chk, sam.factor='organism_part', con.factor='age',
aggr='mean')
assay(se.aggr.chk)[1:3, 1:3]
# Filter out genes with low counts and low variance. Genes with counts over 5 (log2 unit) in
# at least 1% of samples (pOA), and a coefficient of variation (CV) between 0.2 and 100, are
# retained.
# Filter toy data1.
df.fil.chk <- filter_data(data=df.aggr.chk, pOA=c(0.01, 5), CV=c(0.2, 100), dir=NULL)
# Filter toy data2.
se.fil.chk <- filter_data(data=se.aggr.chk, sam.factor='organism_part', con.factor='age',
pOA=c(0.01, 5), CV=c(0.2, 100), dir=NULL)

## Spatial heatmaps.

# The target chicken aSVG is downloaded from the EBI aSVG repository
# (https://github.com/ebi-gene-expression-group/anatomogram/tree/master/src/svg) directly with
# function "return_feature". It is included in this package and accessed as below. Details on
# how this aSVG is selected are documented in function "return_feature".
svg.chk <- system.file("extdata/shinyApp/example", "gallus_gallus.svg",
package="spatialHeatmap")
# Plot spatial heatmaps on gene "ENSGALG00000019846".
# Toy data1. 
spatial_hm(svg.path=svg.chk, data=df.fil.chk, ID='ENSGALG00000019846', height=0.4,
legend.r=1.9, sub.title.size=7, ncol=3)
# Save spatial heatmaps as HTML and video files by setting "out.dir" to "~/test".
if (!dir.exists('~/test')) dir.create('~/test')
spatial_hm(svg.path=svg.chk, data=df.fil.chk, ID='ENSGALG00000019846', height=0.4,
legend.r=1.9, sub.title.size=7, ncol=3, out.dir='~/test')

# Toy data2.
spatial_hm(svg.path=svg.chk, data=se.fil.chk, ID='ENSGALG00000019846', legend.r=1.9,
legend.nrow=2, sub.title.size=7, ncol=3)

# The data can also come as a simple named vector. The following gives an example on a
# vector of 3 random values.
# Random values.
vec <- sample(1:100, 3)
# Name the vector. The last name stands for a sample without a matching feature
# in the aSVG.
names(vec) <- c('brain', 'heart', 'notMapped')
vec
# Plot.
spatial_hm(svg.path=svg.chk, data=vec, ID='geneX', height=0.6, legend.r=1.5, ncol=1)

# Plot spatial heatmaps on aSVGs of two Arabidopsis thaliana development stages.

# Make up a random numeric data frame.
df.test <- data.frame(matrix(sample(x=1:100, size=50, replace=TRUE), nrow=10))
colnames(df.test) <- c('shoot_totalA__condition1', 'shoot_totalA__condition2', 
'shoot_totalB__condition1', 'shoot_totalB__condition2', 'notMapped')
rownames(df.test) <- paste0('gene', 1:10) # Assign row names 
df.test[1:3, ]
# aSVG of development stage 1.
svg1 <- system.file("extdata/shinyApp/example", "arabidopsis_thaliana.organ_shm1.svg",
package="spatialHeatmap")
# aSVG of development stage 2.
svg2 <- system.file("extdata/shinyApp/example", "arabidopsis_thaliana.organ_shm2.svg",
package="spatialHeatmap")
# Spatial heatmaps. 
spatial_hm(svg.path=c(svg1, svg2), data=df.test, ID=c('gene1'), height=0.8, legend.r=1.6,
preserve.scale=TRUE) 

spatialHeatmap documentation built on Nov. 8, 2020, 5:46 p.m.