SpatialEnrichment: Identifying spatially enriched or depleted biomolecules

SpatialEnrichment R Documentation

Identifying spatially enriched or depleted biomolecules

Description

Spatial enrichment (SpEn) detects spatially enriched or depleted biomolecules (genes, proteins, etc.) for chosen spatial features (cellular compartments, tissues, organs, etc.). Each feature is compared with all other features, which serve as references. Biomolecules that are significantly up- or down-regulated in one feature relative to the reference features are denoted spatially enriched or depleted respectively. The underlying differential expression analysis methods include edgeR (Robinson et al., 2010), limma (Ritchie et al., 2015), and DESeq2 (Love et al., 2014). Querying the enrichment results for a feature of interest returns its enriched or depleted biomolecules.
SpEn can also identify biomolecules enriched or depleted in experiment variables in a similar manner.

'sf_var()' subsets data according to given spatial features and variables.

'spatial_enrich()' detects enriched or depleted biomolecules for each given spatial feature.

'query_enrich()' queries enriched or depleted biomolecules in the enrichment results returned by spatial_enrich for a chosen spatial feature.

'ovl_enrich()' plots overlaps of enrichment results across spatial features in the form of an UpSet plot, overlap matrix, or Venn diagram.

'graph_line()' plots expression values of chosen biomolecules in a line graph.

Usage

sf_var(
  data,
  feature,
  ft.sel = NULL,
  variable = NULL,
  var.sel = NULL,
  com.by = "ft"
)

spatial_enrich(
  data,
  method = c("edgeR"),
  norm = "TMM",
  m.array = FALSE,
  pairwise = FALSE,
  log2.fc = 1,
  p.adjust = "BH",
  fdr = 0.05,
  outliers = 0,
  aggr = "mean",
  log2.trans = TRUE,
  verbose = TRUE
)

query_enrich(res, query, other = FALSE, data.rep = FALSE)

ovl_enrich(
  res,
  type = "up",
  plot = "matrix",
  order.by = "freq",
  nintersects = 40,
  point.size = 3,
  line.size = 1,
  mb.ratio = c(0.6, 0.4),
  text.scale = 1.5,
  upset.arg = list(),
  show.plot = TRUE,
  venn.arg = list(),
  axis.agl = 45,
  font.size = 5,
  cols = c("lightcyan3", "darkorange")
)

graph_line(
  data,
  scale = "none",
  x.title = "Samples",
  y.title = "Assay values",
  linewidth = 1,
  text.size = 15,
  text.angle = 60,
  lgd.pos = "right",
  lgd.guide = guides(color = guide_legend(nrow = 1, byrow = TRUE, title = NULL))
)

Arguments

data
sf_var

A SummarizedExperiment object. The colData slot is required to contain at least two columns of spatial features and experiment variables respectively.

spatial_enrich

A SummarizedExperiment object returned by sf_var.

graph_line

A data.frame, where rows are biomolecules and columns are spatial features.

feature

The column name in the colData slot of SummarizedExperiment that contains spatial features.

ft.sel

A vector of spatial features to choose.

variable

The column name in the colData slot of SummarizedExperiment that contains experiment variables.

var.sel

A vector of variables to choose.

com.by

One of ft, var, or ft.var. If ft, the enrichment is performed for each spatial feature and the variables are treated as replicates. If var, the enrichment is performed for each variable and the spatial features are treated as replicates. If ft.var, spatial features (tissue1, tissue2) and variables (var1, var2) are combined into tissue1__var1, tissue1__var2, tissue2__var1, tissue2__var2, and the enrichment is performed for each combination.
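
The three settings can be pictured as three ways of building a grouping label per sample. The following is a minimal base R sketch, not the package's implementation; the data frame, its column names, and the helper group_by_setting are hypothetical.

```r
# Hypothetical sample annotations (column names are illustrative only).
samples <- data.frame(
  feature = c("tissue1", "tissue1", "tissue2", "tissue2"),
  variable = c("var1", "var2", "var1", "var2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the grouping label of each sample for a
# given com.by setting. Samples sharing a label act as replicates.
group_by_setting <- function(samples, com.by = c("ft", "var", "ft.var")) {
  com.by <- match.arg(com.by)
  switch(com.by,
    ft = samples$feature,
    var = samples$variable,
    ft.var = paste(samples$feature, samples$variable, sep = "__")
  )
}

grp.ft <- group_by_setting(samples, "ft")
grp.ftvar <- group_by_setting(samples, "ft.var")
```

With com.by = "ft" the two tissue1 samples share one label and so behave as replicates, whereas with "ft.var" every combination forms its own group.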

method

One of edgeR, limma, and DESeq2.

norm

The normalization method (TMM, RLE, upperquartile, none) in edgeR. The default is TMM. Details: https://www.rdocumentation.org/packages/edgeR/versions/3.14.0/topics/calcNormFactors.

m.array

Logical. 'TRUE' and 'FALSE' indicate that the input is microarray data or count data respectively.

pairwise

Logical. If 'TRUE', each pairwise comparison is performed separately, starting from dispersion estimation. If 'FALSE' (default), all samples are fitted to a single GLM, then pairwise comparisons are performed through contrasts.

log2.fc

The log2-fold change cutoff. The default is 1.

p.adjust

The method (holm, hochberg, hommel, bonferroni, BH, BY, fdr, or none) for adjusting p values in multiple hypothesis testing. The default is BH.
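
The listed method names match those accepted by p.adjust in the base R stats package. Whether spatial_enrich calls that function internally is an assumption here; the sketch below only illustrates what BH and bonferroni adjustment do to a made-up vector of raw p values.

```r
# Made-up raw p values for five biomolecules.
p.raw <- c(0.001, 0.01, 0.02, 0.03, 0.2)

# Benjamini-Hochberg adjustment (controls the false discovery rate).
p.bh <- p.adjust(p.raw, method = "BH")

# Bonferroni adjustment (more conservative, controls family-wise error).
p.bonf <- p.adjust(p.raw, method = "bonferroni")

# Biomolecules passing a 0.05 cutoff after BH adjustment.
sig <- p.bh < 0.05
```

Bonferroni-adjusted values are never smaller than the BH-adjusted ones, so fewer biomolecules tend to pass the same cutoff.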

fdr

The FDR cutoff. The default is 0.05.
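
The log2.fc and fdr cutoffs act together as a filter on differential expression results. Below is a minimal sketch on a made-up result table; the column names (log2FC, FDR) and the use of inclusive comparisons are assumptions, not the package's internal conventions.

```r
# Made-up differential expression results for one query-vs-reference comparison.
de <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  log2FC = c(2.5, -1.8, 0.4, 3.0),
  FDR = c(0.01, 0.03, 0.001, 0.20),
  stringsAsFactors = FALSE
)

log2.fc <- 1   # log2-fold change cutoff
fdr <- 0.05    # FDR cutoff

# Up-regulated: large positive fold change and significant.
up <- de$gene[de$log2FC >= log2.fc & de$FDR <= fdr]
# Down-regulated: large negative fold change and significant.
down <- de$gene[de$log2FC <= -log2.fc & de$FDR <= fdr]
```

In this toy table g3 is significant but changes too little, and g4 changes strongly but is not significant, so neither passes both cutoffs.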

outliers

The number of outliers allowed in the references. If there are too many references, there might be no enriched/depleted biomolecules in the query feature. To avoid this, set a certain number of outliers.
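
One plausible reading of this argument is that a biomolecule counts as enriched in the query feature when it is up-regulated relative to all references except at most the given number of outliers. The sketch below illustrates that reading only; the logical matrix and the helper call_enriched are hypothetical and are not taken from the package.

```r
# Rows: genes. Columns: reference features.
# TRUE = significantly up-regulated in the query relative to that reference.
up.vs.ref <- matrix(
  c(TRUE, TRUE, TRUE,
    TRUE, FALSE, TRUE,
    FALSE, FALSE, TRUE),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("heart", "kidney", "liver"))
)

# Hypothetical helper: keep genes that are up in at least
# (number of references - outliers) references.
call_enriched <- function(up.vs.ref, outliers = 0) {
  total <- rowSums(up.vs.ref)
  names(total)[total >= ncol(up.vs.ref) - outliers]
}

strict <- call_enriched(up.vs.ref, outliers = 0)
relaxed <- call_enriched(up.vs.ref, outliers = 1)
```

Under this reading, allowing one outlier admits g2, which fails against only one of the three references.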

aggr

One of mean (default) or median. The method to aggregate replicates in the assay data.
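
Aggregating replicates can be sketched in base R as collapsing columns that share a name. This is an illustration under the assumption that replicates carry identical column names; the matrix and the helper aggregate_reps are hypothetical.

```r
# Made-up assay matrix: three brain replicates and two heart replicates.
mat <- matrix(
  c(1, 3, 11, 10, 20,
    2, 6, 4, 30, 50),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("g1", "g2"),
                  c("brain", "brain", "brain", "heart", "heart"))
)

# Hypothetical helper: collapse replicate columns by mean or median.
aggregate_reps <- function(mat, aggr = c("mean", "median")) {
  aggr <- match.arg(aggr)
  fun <- switch(aggr, mean = mean, median = median)
  groups <- unique(colnames(mat))
  # One column per group, one row per biomolecule.
  sapply(groups, function(g) {
    apply(mat[, colnames(mat) == g, drop = FALSE], 1, fun)
  })
}

agg.mean <- aggregate_reps(mat, "mean")
agg.med <- aggregate_reps(mat, "median")
```

The median is less sensitive to a single extreme replicate, as the brain values of g1 show (mean 5 versus median 3).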

log2.trans

Logical. If 'TRUE' (default), the aggregated data (see aggr) are transformed to log2 scale and will be further used for plotting spatial heatmaps (SHMs).

verbose

Logical. If 'TRUE' (default), intermediate messages will be printed.

res

Enrichment results returned by spatial_enrich.

query

A spatial feature for query.

other

Logical (default is 'FALSE'). If 'TRUE', other biomolecules that are neither enriched nor depleted will also be returned.

data.rep

Logical. If 'TRUE', normalized data before aggregating replicates will be returned. If 'FALSE', normalized data after aggregating replicates will be returned.

type

One of up (default) or down, which refers to up- or down-regulated biomolecules.

plot

One of upset, matrix, or venn, corresponding to upset plot, overlap matrix, or Venn diagram respectively.
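
All three plot types summarize the same underlying quantity: how many enriched (or depleted) biomolecules are shared between spatial features. A minimal base R sketch of a pairwise overlap matrix follows; the gene sets are made up, and the computation is an illustration rather than the code used by ovl_enrich.

```r
# Made-up sets of enriched genes per spatial feature.
sets <- list(
  brain = c("g1", "g2", "g3"),
  heart = c("g2", "g3", "g4"),
  kidney = c("g3", "g5")
)

# Pairwise overlap counts; the diagonal holds the size of each set.
ovl <- sapply(sets, function(a) {
  sapply(sets, function(b) length(intersect(a, b)))
})
```

The result is a symmetric matrix with one row and one column per feature.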

order.by

How the intersections in the matrix should be ordered. Options include frequency (entered as "freq"), degree, or both in any order.

nintersects

Number of intersections to plot. If set to NA, all intersections will be plotted.

point.size

Size of points in the matrix plot.

line.size

The line thickness in overlap matrix.

mb.ratio

Ratio between the matrix plot and the main bar plot (keep in terms of hundredths).

text.scale

Numeric, value to scale the text sizes, applies to all axis labels, tick labels, and numbers above bar plot. Can be a universal scale, or a vector containing individual scales in the following format: c(intersection size title, intersection size tick labels, set size title, set size tick labels, set names, numbers above bars)

upset.arg

A list of additional arguments passed to upset.

show.plot

Logical flag indicating whether the plot should be displayed. If false, simply returns the group count matrix.

venn.arg

A list of additional arguments passed to venn.

axis.agl

The angle of axis text in overlap matrix.

font.size

The font size of all text in overlap matrix.

cols

A vector of two colors indicating low and high values in the overlap matrix respectively. The default is c("lightcyan3", "darkorange").

scale

The method to scale the data. If none (default), no scaling. If row, each row is scaled independently. If all, all rows are scaled as a whole.
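
The difference between row and all can be sketched as follows, assuming that scaling means centering to mean 0 and dividing by the standard deviation (as base R's scale does). That assumption, the toy matrix, and the helper names are not taken from the package.

```r
# Made-up expression values: two biomolecules across three features.
mat <- matrix(
  c(1, 2, 3,
    10, 20, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("g1", "g2"), c("brain", "heart", "kidney"))
)

# "row": each row is centered and scaled on its own, so profiles of
# biomolecules with very different magnitudes become comparable.
scale_rows <- function(mat) t(scale(t(mat)))

# "all": one mean and one standard deviation for the whole matrix, so
# differences in magnitude between rows are preserved.
scale_all <- function(mat) (mat - mean(mat)) / sd(as.vector(mat))

by.row <- scale_rows(mat)
by.all <- scale_all(mat)
```

After row scaling both toy rows follow the same profile (-1, 0, 1), while scaling as a whole keeps g2 well above g1.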

x.title, y.title

The title of X-axis and Y-axis respectively.

linewidth

The line width.

text.size

The font size of all text.

text.angle

The angle of axis text.

lgd.pos

The position of legend. The default is right.

lgd.guide

The guides function in ggplot2 for customizing legends.

Value

'sf_var'

A SummarizedExperiment object.

'spatial_enrich'

A list object.

'query_enrich'

A SummarizedExperiment object.

'ovl_enrich'

An UpSet plot, overlap matrix plot, or Venn diagram.

'graph_line'

A ggplot.

Author(s)

Jianhai Zhang jzhan067@ucr.edu
Dr. Thomas Girke thomas.girke@ucr.edu

References

Cardoso-Moreira, Margarida, Jean Halbert, Delphine Valloton, Britta Velten, Chunyan Chen, Yi Shao, Angélica Liechti, et al. 2019. “Gene Expression Across Mammalian Organ Development.” Nature 571 (7766): 505–9.

Keays, Maria. 2019. ExpressionAtlas: Download Datasets from EMBL-EBI Expression Atlas.

Martin Morgan, Valerie Obenchain, Jim Hester and Hervé Pagès (2018). SummarizedExperiment: SummarizedExperiment container. R package version 1.10.1.

Robinson MD, McCarthy DJ and Smyth GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140.

Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47.

Love, M.I., Huber, W., Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15(12):550.

Nils Gehlenborg (2019). UpSetR: A More Scalable Alternative to Venn and Euler Diagrams for Visualizing Intersecting Sets. R package version 1.4.0. https://CRAN.R-project.org/package=UpSetR

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

Hadley Wickham (2007). Reshaping Data with the reshape Package. Journal of Statistical Software, 21(12), 1-20. URL http://www.jstatsoft.org/v21/i12/.

Examples


## In the following examples, the toy data come from an RNA-seq analysis of the development of 7
## chicken organs across 9 time points (Cardoso-Moreira et al. 2019). For convenience, they are
## included in this package. The complete raw count data are downloaded using the R package
## ExpressionAtlas (Keays 2019) with the accession number "E-MTAB-6769".

library(SummarizedExperiment) 
# Access the count table. 
cnt.chk <- read.table(system.file('extdata/shinyApp/data/count_chicken.txt', package='spatialHeatmap'), header=TRUE, row.names=1,sep='\t')
cnt.chk[1:3, 1:5]
# A targets file describing spatial features and conditions is required for toy data. It should be made
# based on the experiment design, which is accessible through the accession number 
# "E-MTAB-6769" in the R package ExpressionAtlas. An example targets file is included in this
# package and accessed below. 

# Access the example targets file. 
tar.chk <- read.table(system.file('extdata/shinyApp/data/target_chicken.txt', package='spatialHeatmap'), header=TRUE, row.names=1, sep='\t') 
# Every column in count table corresponds with a row in targets file. 
tar.chk[1:5, ]
# Store count data and targets file in "SummarizedExperiment".
se.chk <- SummarizedExperiment(assay=cnt.chk, colData=tar.chk)
# The "rowData" slot can store a data frame of gene metadata, but not required. Only the 
# column named "metadata" will be recognized. 
# Pseudo row metadata.
metadata <- paste0('meta', seq_len(nrow(cnt.chk))); metadata[1:3]
rowData(se.chk) <- DataFrame(metadata=metadata)

# Subset the count data by features (brain, kidney, heart, liver) and variables (day10, day35).
# By setting com.by='ft', the subsequent spatial enrichment will be performed across 
# features with the variables as replicates. 
data.sub <- sf_var(data=se.chk, feature='organism_part', ft.sel=c('brain', 'kidney',
 'heart', 'liver'), variable='age', var.sel=c('day10', 'day35'), com.by='ft')

## By convention, raw sequencing count data should be normalized and filtered to
## reduce noise. Since normalization is performed during spatial enrichment, only filtering
## is required here.

# Filter out genes with low counts and low variance. Genes with counts over 5 in
# at least 10% of samples (pOA), and a coefficient of variation (CV) between 0.7 and 100 are 
# retained.
data.sub.fil <- filter_data(data=data.sub, sam.factor='organism_part', con.factor='age',
pOA=c(0.1, 5), CV=c(0.7, 100))

# Spatial enrichment for every spatial feature with 1 outlier allowed.  
enr.res <- spatial_enrich(data.sub.fil, method=c('edgeR'), norm='TMM', log2.fc=1, fdr=0.05, outliers=1)
# Overlaps of enriched genes across features.
ovl_enrich(enr.res, type='up', plot='upset')
# Query the results for brain.
en.brain <- query_enrich(enr.res, 'brain')
rowData(en.brain)[1:3, c('type', 'total', 'method')] 

# Read aSVG image into an "SVG" object.
svg.chk <- system.file("extdata/shinyApp/data", "gallus_gallus.svg", 
package="spatialHeatmap")
svg.chk <- read_svg(svg.chk)
# Plot an enrichment SHM.
dat.enrich <- SPHM(svg=svg.chk, bulk=en.brain)
shm(data=dat.enrich, ID=rownames(en.brain)[1], legend.r=1, legend.nrow=7, sub.title.size=10, ncol=2, bar.width=0.09, lay.shm='gene')
# Line graph of gene expression profile.
graph_line(assay(en.brain[1, , drop=FALSE]), lgd.pos='bottom')

jianhaizhang/spatialHeatmap documentation built on July 31, 2024, 2:59 a.m.