matrix_hm: Hierarchical clustering combined with matrix heatmap

View source: R/matrix_hm.R

matrix_hm    R Documentation

Hierarchical clustering combined with matrix heatmap

Description

Given a data matrix returned by submatrix, hierarchical clustering is performed on rows and on columns separately, and the results are presented in a matrix heatmap, which supports static and interactive modes. In the matrix heatmap, rows and columns are ordered according to the hierarchical clustering dendrograms, and the rows of target biomolecules are tagged by black lines. In the interactive heatmap, users can zoom in by drawing a rectangle and zoom out by double clicking.
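The clustering and reordering described above can be illustrated with a minimal base R sketch. It is not the internal implementation of matrix_hm: the toy matrix, the gene and sample names, and the choice of Euclidean distance with the default linkage of hclust are all assumptions made for illustration.

```r
# Toy matrix: 6 hypothetical genes (rows) x 4 hypothetical samples (columns).
set.seed(1)
mat <- matrix(rnorm(24), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))

# Cluster rows and columns separately (assumed: Euclidean distance,
# default "complete" linkage).
row.hc <- hclust(dist(mat))      # rows
col.hc <- hclust(dist(t(mat)))   # columns

# Reorder the matrix so rows and columns follow the dendrogram leaves.
mat.ord <- mat[row.hc$order, col.hc$order]

# Position of a hypothetical target row in the reordered matrix, i.e. the
# row that a heatmap would tag with separator lines.
target <- "gene3"
target.pos <- which(rownames(mat.ord) == target)
```

Reordering by the "order" component of each hclust result places similar rows and similar columns next to each other, which is what makes clusters visible as blocks in a heatmap.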

Usage

matrix_hm(
  ID,
  data,
  assay.na = NULL,
  scale = "row",
  col = c("yellow", "red"),
  cut.h,
  col.n = 200,
  keysize = 1.8,
  main = NULL,
  title.size = 10,
  cexCol = 1,
  cexRow = 1,
  angleCol = 45,
  angleRow = 45,
  sep.color = "black",
  sep.width = 0.02,
  static = TRUE,
  margin = c(10, 10),
  arg.lis1 = list(),
  arg.lis2 = list()
)

Arguments

ID

A vector of biomolecules of interest in the data matrix.

data

The subsetted data matrix returned by the function submatrix.

assay.na

Applicable when data is a 'SummarizedExperiment' or 'SingleCellExperiment' object, where multiple assays could be stored. The name of the target assay to use.

scale

One of 'row' (default), 'column', or 'no', corresponding to scaling the heatmap by row, by column, or not at all, respectively.

col

A character vector of color ingredients for the color scale. The default is c('yellow', 'red').

cut.h

A numeric value specifying the height at which to cut the row dendrogram.

col.n

The number of colors in the palette. The default is 200.

keysize

A numeric value indicating the size of the color key.

main

The title of the matrix heatmap.

title.size

A numeric value of the title size.

cexCol

A numeric value of column name size. Default is 1.

cexRow

A numeric value of row name size. Default is 1.

angleCol

The angle of column names. The default is 45.

angleRow

The angle of row names. The default is 45.

sep.color

The color of the two lines labeling the row of ID. The default is "black".

sep.width

The width of two lines labeling the row of ID. The default is 0.02.

static

Logical. 'TRUE' returns the static matrix heatmap and 'FALSE' returns the interactive one. The default is 'TRUE'.

margin

A vector of two numbers, specifying bottom and right margins respectively. The default is c(10, 10).

arg.lis1

A list of additional arguments passed to the heatmap.2 function from the "gplots" package. E.g. 'list(xlab='sample', ylab='gene')'.

arg.lis2

A list of additional arguments passed to the ggplot function from "ggplot2" package.
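As a rough illustration of what the scale, col, and col.n arguments control, the base R sketch below scales a toy matrix by row or by column and builds a palette of col.n colors from the color ingredients. Whether matrix_hm uses these exact base functions internally is an assumption; the toy data and the object names are hypothetical.

```r
# Toy matrix: 3 hypothetical genes x 4 hypothetical samples.
set.seed(1)
mat <- matrix(rnorm(12, mean = 10, sd = 3), nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4)))

# Row scaling: each row is centred to mean 0 and scaled to standard
# deviation 1. scale() operates on columns, so transpose before and after.
mat.row <- t(scale(t(mat)))

# Column scaling: scale() applies to the matrix directly.
mat.col <- scale(mat)

# Palette: interpolate col.n colors between the color ingredients.
col <- c("yellow", "red")
col.n <- 200
pal <- colorRampPalette(col)(col.n)
```

Row scaling is a common choice for expression heatmaps because it makes the pattern of each gene across samples comparable, regardless of the gene's absolute expression level.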

Value

A static or interactive matrix heatmap.
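The examples below pass the row dendrogram of the static result to cut_dendro together with a cutting height. The idea of cutting a row tree at a height and retrieving the cluster that contains a target can be sketched in base R as follows. This is an analogy built on hclust and cutree, not the package's own code; the toy data, the target name, and the choice of the median merge height are assumptions.

```r
# Toy matrix: 10 hypothetical genes x 4 hypothetical samples.
set.seed(1)
mat <- matrix(rnorm(40), nrow = 10,
              dimnames = list(paste0("gene", 1:10), paste0("sample", 1:4)))
row.hc <- hclust(dist(mat))

# Cut the row tree at a chosen height (here: the median merge height).
# Each row receives an integer cluster label.
h <- median(row.hc$height)
clus <- cutree(row.hc, h = h)

# Rows that fall in the same cluster as the hypothetical target gene.
target <- "gene3"
same.clus <- names(clus)[clus == clus[target]]
```

A lower cutting height yields more and smaller clusters, while a higher one merges them into fewer and larger clusters.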

Author(s)

Jianhai Zhang jzhan067@ucr.edu
Dr. Thomas Girke thomas.girke@ucr.edu

References

Martin Morgan, Valerie Obenchain, Jim Hester and Hervé Pagès (2018). SummarizedExperiment: SummarizedExperiment container. R package version 1.10.1

Andrie de Vries and Brian D. Ripley (2016). ggdendro: Create Dendrograms and Tree Diagrams Using 'ggplot2'. R package version 0.1-20. https://CRAN.R-project.org/package=ggdendro

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

Carson Sievert (2018) plotly for R. https://plotly-book.cpsievert.me

Langfelder P and Horvath S, WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008, 9:559 doi:10.1186/1471-2105-9-559

R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/

Gregory R. Warnes, Ben Bolker, Lodewijk Bonebakker, Robert Gentleman, Wolfgang Huber, Andy Liaw, Thomas Lumley, Martin Maechler, Arni Magnusson, Steffen Moeller, Marc Schwartz and Bill Venables (2019). gplots: Various R Programming Tools for Plotting Data. R package version 3.0.1.1. https://CRAN.R-project.org/package=gplots

Hadley Wickham (2007). Reshaping Data with the reshape Package. Journal of Statistical Software, 21(12), 1-20. URL http://www.jstatsoft.org/v21/i12/

Keays, Maria. 2019. ExpressionAtlas: Download Datasets from EMBL-EBI Expression Atlas

Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. "Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2." Genome Biology 15 (12): 550. doi:10.1186/s13059-014-0550-8

Cardoso-Moreira, Margarida, Jean Halbert, Delphine Valloton, Britta Velten, Chunyan Chen, Yi Shao, Angélica Liechti, et al. 2019. "Gene Expression Across Mammalian Organ Development." Nature 571 (7766): 505–9

Matt Dowle and Arun Srinivasan (2019). data.table: Extension of 'data.frame'. R package version 1.12.8. https://CRAN.R-project.org/package=data.table

Examples


## The example data included in this package come from an RNA-seq analysis of the 
## development of 7 chicken organs across 9 time points (Cardoso-Moreira et al. 2019). 
## The complete raw count data are downloaded using the R package ExpressionAtlas
## (Keays 2019) with the accession number "E-MTAB-6769". 

# Access example count data. 
count.chk <- read.table(system.file('extdata/shinyApp/data/count_chicken.txt', 
package='spatialHeatmap'), header=TRUE, row.names=1, sep='\t')
count.chk[1:3, 1:5]

# A targets file describing spatial features and variables is made based on the 
# experiment design.
target.chk <- read.table(system.file('extdata/shinyApp/data/target_chicken.txt', 
package='spatialHeatmap'), header=TRUE, row.names=1, sep='\t')
# Every column in the count data corresponds to a row in the targets file. 
target.chk[1:5, ]
# Store example data in "SummarizedExperiment".
library(SummarizedExperiment)
se.chk <- SummarizedExperiment(assay=count.chk, colData=target.chk)

# Normalize data.
se.chk.nor <- norm_data(data=se.chk, norm.fun='CNF', log2.trans=TRUE)

# Aggregate replicates of "spatialFeature_variable", where spatial features are organs
# and variables are ages.
se.chk.aggr <- aggr_rep(data=se.chk.nor, sam.factor='organism_part', con.factor='age',
aggr='mean')
assay(se.chk.aggr)[1:3, 1:3]

# Genes with expression values >= 5 in at least 1% of all samples (pOA), and coefficient
# of variation (CV) between 0.2 and 100 are retained.
se.chk.fil <- filter_data(data=se.chk.aggr, sam.factor='organism_part', con.factor='age', 
pOA=c(0.01, 5), CV=c(0.2, 100), file=NULL)

## Subset the data matrix for genes 'ENSGALG00000019846' and 'ENSGALG00000000112'.
se.sub.mat <- submatrix(data=se.chk.fil, ID=c('ENSGALG00000019846', 
'ENSGALG00000000112'), p=0.1) 

## Hierarchical clustering. 
library(dendextend)
# Static matrix heatmap.
mhm.res <- matrix_hm(ID=c('ENSGALG00000019846', 'ENSGALG00000000112'), data=se.sub.mat, 
angleCol=80, angleRow=35, cexRow=0.8, cexCol=0.8, margin=c(8, 10), static=TRUE, 
arg.lis1=list(offsetRow=0.01, offsetCol=0.01))
# Clusters containing "ENSGALG00000019846".
cut_dendro(mhm.res$rowDendrogram, h=15, 'ENSGALG00000019846')

# Interactive matrix heatmap.
matrix_hm(ID=c('ENSGALG00000019846', 'ENSGALG00000000112'), data=se.sub.mat, 
angleCol=80, angleRow=35, cexRow=0.8, cexCol=0.8, margin=c(8, 10), static=FALSE, 
arg.lis1=list(offsetRow=0.01, offsetCol=0.01)) 


# In case the interactive heatmap is not opened automatically, run the following code snippet.
# It saves the heatmap as an HTML file whose name is given by the "file" argument.

mhm <- matrix_hm(ID=c('ENSGALG00000019846', 'ENSGALG00000000112'), data=se.sub.mat, 
angleCol=80, angleRow=35, cexRow=0.8, cexCol=0.8, margin=c(8, 10), static=FALSE, 
arg.lis1=list(offsetRow=0.01, offsetCol=0.01))
htmlwidgets::saveWidget(widget=mhm, file='mhm.html', selfcontained=FALSE)
browseURL('mhm.html')


## Adjacency matrix and module identification 
adj.mod <- adj_mod(data=se.sub.mat)

# The adjacency is a measure of co-expression similarity between genes, where larger
# value denotes higher similarity.
adj.mod[['adj']][1:3, 1:3]

# The modules are identified at four sensitivity levels (ds=0, 1, 2, or 3). From 0 to 3, 
# more modules are identified but module sizes are smaller. The 4 sets of module 
# assignments are returned in a data frame, where column names are sensitivity levels. 
# The numbers in each column are module labels, where "0" means genes not assigned to 
# any module.
adj.mod[['mod']][1:3, ]

# Static network graph. Nodes are genes and edges are adjacencies between genes. 
# A thicker edge denotes higher adjacency (co-expression similarity), while a larger node
# indicates higher gene connectivity (sum of a gene's adjacencies with all its direct 
# neighbors). The target gene is labeled by "_target".
network(ID="ENSGALG00000019846", data=se.sub.mat, adj.mod=adj.mod, adj.min=0, 
vertex.label.cex=1.5, vertex.cex=4, static=TRUE)

# Interactive network. The target gene ID is appended with "_target".  
network(ID="ENSGALG00000019846", data=se.sub.mat, adj.mod=adj.mod, static=FALSE) 


jianhaizhang/spatialHeatmap documentation built on April 21, 2024, 7:43 a.m.