knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, message = FALSE,
                      out.width="70%", out.height="70%")

Setup

To demonstrate the ability to match datasets under new analysis to relevant published datasets, we applied RAVmodel on five TCGA datasets.

Load packages

suppressPackageStartupMessages({
  library(GenomicSuperSignature)
  library(dplyr)
  library(ggplot2)
  library(magick)
})

TCGA datasets

TCGA_validationDatasets is a list containing 6 TCGA datasets: COAD, BRCA, LUAD, READ, UCEC, and OV. First 5 are raw read counts from GSEABenchmarkeR package with log2(count + 1) transformation. FYI, GSEABenchmarkeR::loadEData excludes genes with cpm < 2 in more than half of the samples. TCGA-OV dataset is from curatedOvarianData package. These datasets were assembled using Results/TCGA/R/build_TCGA_validationDatasets.R script.

load("data/TCGA_validationDatasets.rda")
names(TCGA_validationDatasets)

## We're not using OV for this vignette
datasets <- TCGA_validationDatasets[1:5]

RAVmodel

We use RAVmodel annotated with MSigDB C2 gene sets.

## If GenomicSuperSignaturePaper is built locally with RAVmodel in inst/extdata
data.dir <- system.file("extdata", package = "GenomicSuperSignaturePaper")
RAVmodel <- readRDS(file.path(data.dir, "RAVmodel_C2.rds"))
RAVmodel <- getModel("C2", load=TRUE)
RAVmodel

version(RAVmodel)

Comparison Ver.1 (TCGA only)

heatmapTable (all)

We plot the heatmapTable of the validation result from multiple studies. It seems like RAV221 and RAV868 are specific to BRCA while RAV832 is strongly associated with colon/rectal cancers.

## This process takes little time due to the size of datasets.
val_all <- validate(datasets, RAVmodel)
heatmapTable(val_all, RAVmodel) 

heatmapTable (one dataset)

If you provide validation result from one dataset, heatmapTable also returns the average silhouette width as a reference.

### COAD
val_coad <- validate(datasets[["COAD"]], RAVmodel)
# heatmapTable(val_coad) 

### READ
val_read <- validate(datasets[["READ"]], RAVmodel)
# heatmapTable(val_read) 

BRCA

RAV221 shows the highest validation score with the positive average silhouette width.

val_brca <- validate(datasets[["BRCA"]], RAVmodel)
heatmapTable(val_brca, RAVmodel)

RAV868 doesn't ranked top 5, so we checked a couple more top validated RAVs. RAV868 is scored 6th with the negative average silhouette width.

heatmapTable(val_brca, RAVmodel, num.out = 7) 

MeSH terms and associated studies

BRCA-associated

RAV221 and RAV868 are specific to BRCA.

RAV221

RAV221 consists of three breast cancer studies (ERP016798, SRP023262, and SRP11343) and top 10 enriched pathways are associated with breast cancer.

ind <- 221
findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE)

subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame

drawWordcloud(RAVmodel, ind)

RAV868

ind <- 868
findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE)

subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame

drawWordcloud(RAVmodel, ind)

COAD/READ-associated

Heatmap table above shows that RAV832 is specifically associated with COAD and READ datasets.

RAV832

ind <- 832
findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE)

subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame

drawWordcloud(RAVmodel, ind)

Comparison Ver.2 (TCGA + others)

heatmapTable (all)

Here, we plot heatmap table of validation scores from five TCGA RNA sequencing, SLE-WB microarray and four colon cancer microarray datasets. scoreCutoff is set to 0.68 instead of the default 0.7. The below table includes RAVs that validate any of the ten datasets with the validation score 0.68 or higher.

Based on this multi-datasets validation table,
- RAV23 and RAV1551 are SLE-specific
- RAV188 is COAD-specific while RAV832 seems to be associated with both COAD and READ

new_datasets <- lapply(datasets, assay)

## SLE-WB dataset
project_dir <- here::here()
SLEWBdir <- file.path(project_dir, "Results/SLE-WB/data")
fname <- file.path(SLEWBdir, "SLE_WB_all_microarray_QN_zto_before_with_GeneSymbol.pcl")
exprs <- readr::read_tsv(fname) %>% as.data.frame
rownames(exprs) <- exprs$GeneSymbol
sle_wb <- as.matrix(exprs[,3:ncol(exprs)])
new_datasets$SLE <- sle_wb

## CRC dataset
CRCdir <- file.path(project_dir, "Results/CRC/data/eSets_new/")
load(file.path(CRCdir, "GSE14095_eset.RData"))
load(file.path(CRCdir, "GSE17536_eset.RData"))
load(file.path(CRCdir, "GSE2109_eset.RData"))
load(file.path(CRCdir, "GSE39582_eset.RData"))
new_datasets$GSE14095 <- log2(exprs(GSE14095_eset)+1)
new_datasets$GSE17536 <- log2(exprs(GSE17536_eset)+1)
new_datasets$GSE2109 <- log2(exprs(GSE2109_eset)+1)
new_datasets$GSE39582 <- log2(exprs(GSE39582_eset)+1)
new_val_all <- validate(new_datasets, RAVmodel)
heatmapTable(new_val_all, RAVmodel, scoreCutoff = 0.68) 

MeSH terms and associated studies

SLE-associated

RAV23

ind <- 23
findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE)

subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame

drawWordcloud(RAVmodel, ind)

RAV1551

ind <- 1551
findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE)

subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame

drawWordcloud(RAVmodel, ind)

COAD-specific

ind <- 188
findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE)

subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame

drawWordcloud(RAVmodel, ind)

Manuscript Figures

Fig 2. Connect new data with the existing databases

A RAVmodel provides a rich resource for understanding new or user-supplied datasets in the context of all other RAVmodel datasets. A) Validation of multiple datasets. Each dataset was subjected to PCA and correlated to all possible RAVs using Pearson correlation. RAVs with correlation above 0.7 in at least one dataset were included. RAV221 and RAV868 seem to be associated with breast cancer while RAV832 is associated with colon and rectal cancer. B) Validation of a single dataset, TCGA-BRCA. Top 5 validated RAVs (score, bottom panel) and their average silhouette width (avg.sw, top panel) are shown. C) MeSH terms associated with RAV221 can aid in interpreting the biological context of the RAV when plotted as a word cloud. D) Studies contributing to RAV221. Study accession number (studyName column) and the title of a study (title column) are shown here. E) Top 10 enriched pathways in RAV221.


## Draw Fig2A/2B heatmapTable 
a <- grid::grid.grabExpr(ComplexHeatmap::draw(heatmapTable(val_all, RAVmodel)))
b <- grid::grid.grabExpr(ComplexHeatmap::draw(heatmapTable(val_brca, RAVmodel)))

## Save Fig2c wordcloud
png("Figures/manuscript_Fig2c.png", width = 350, height = 350)
drawWordcloud(RAVmodel, ind = 221)
dev.off()

## Import Fig2C in the way I can combine with others
c <- png::readPNG("Figures/manuscript_Fig2c.png") %>% 
  grid::rasterGrob() %>% 
  gridExtra::grid.arrange()

## Make Fig2D/2E tables
library(ggpubr)
ind <- 221
fig2d <- findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE)
fig2e <- subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% 
    as.data.frame

## Reformat Fig2D/2E for combining
library(flextable)
d <- flextable(fig2d) %>% width(., j = c(1,2,4), width = c(0.4,0.4,4)) # adjust the column width
d <- grid::rasterGrob(as_raster(d))

e <- flextable(fig2e) %>% width(., j = 1, width = 5.3) 
e <- grid::rasterGrob(as_raster(e))

# ## If I want to add rownames
# fig2e <- subsetEnrichedPathways(RAVmodel, ind) %>% 
#   as.data.frame %>% 
#   tibble::rownames_to_column(., " ")
# e <- flextable(fig2e) %>% width(., j = 2, width = 4) 
# e <- grid::rasterGrob(as_raster(e))

## Make the complete Figure 2
ac <- ggarrange(a, c, 
                labels = c("a", "c"), vjust = c(2, 6), hjust = -1.5,
                nrow = 2, 
                heights = c(0.5, 1),
                font.label = list(size = 15))
bde <- ggarrange(b, d, e, 
                 labels = c("b", "d", "e"), vjust = c(2,1,1), hjust = c(-5,0,0),
                 nrow = 3, align = "v",
                 heights = c(0.4, 0.5, 0.75),
                 font.label = list(size = 15))

Fig2_all <- ggarrange(ac, bde, ncol = 2, widths = c(1, 1.2), align = "hv") 
Fig2_all

## Save the complete Figure 2
# png("Figures/manuscript_Fig2.png", width = 715, height = 520)
# Fig2_all
# dev.off()

ggsave(filename = "Figures/manuscript_Fig2.png", 
       plot = Fig2_all,
       width = 11, height = 8, units = "in",
       device = "png", dpi = 700)
## Save each panel as bitmapped file
png("Figures/manuscript_Fig2a.png", res = 500, width = 5, height = 3, units = "in")
heatmapTable(val_all, RAVmodel)
dev.off()
png("Figures/manuscript_Fig2b.png", res = 500, width = 5, height = 3, units = "in")
heatmapTable(val_brca, RAVmodel)
dev.off()
png("Figures/manuscript_Fig2c.png", res = 500, width = 5, height = 5, units = "in")
drawWordcloud(RAVmodel, ind = 221)
dev.off()

## Save flextable objects as images
save_as_image(flextable(fig2d) %>% width(., j = c(1,2,4), width = c(0.4,0.4,4)),
              path = "Figures/manuscript_Fig2d.png")
save_as_image(flextable(fig2e) %>% width(., j = 1, width = 5.3),
              path = "Figures/manuscript_Fig2e.png")
library(EBImage)
img <- readImage("Figures/manuscript_Fig2.png")
display(img, method = "raster")
# ## Make the complete Figure 2
# ab <- ggarrange(a, b, ncol = 2, labels = c("A", "B"), vjust = 9, hjust = -7,
#                 widths = c(1, 0.8), font.label = list(size = 15))
# cde <- ggarrange(ggarrange(c, labels = "C", vjust = 3, hjust = -7),
#                  ggarrange(d, e,  # 2D and 2E tables  
#                            labels = c("D", "E"), vjust = 1, hjust = 0,
#                            nrow = 2, align = "hv",
#                            heights = c(0.8, 1.4)), 
#                  ncol = 2,
#                  widths = c(1, 1.2),
#                  font.label = list(size = 15),
#                  align = "hv")
# Fig2_all <- ggarrange(ab, cde, nrow = 2)   

# ## Save the complete Figure 2
# png("Fig2.png", width = 1200, height = 1000)
# Fig2_all
# dev.off()

Sup.Fig 4. Colon and rectal cancer associated RAV

Based on Figure 2A, RAV832 seems to be associated with TCGA-COAD and TCGA-READ. Top validation results of A) TCGA-COAD and B) TCGA-READ include RAV832 with the negative average silhouette width. C) MeSH terms associated with RAV832. D) Studies contributing to RAV832. E) MSigDB C2 gene sets enriched in RAV832.


## Draw Sup.Fig4A/4B heatmapTable 
a <- grid::grid.grabExpr(ComplexHeatmap::draw(heatmapTable(val_coad, RAVmodel)))
b <- grid::grid.grabExpr(ComplexHeatmap::draw(heatmapTable(val_read, RAVmodel)))

## Save Sup.Fig4C wordcloud
png("Figures/manuscript_Sup_Fig4C.png", res = 500, width = 5, height = 5, units = "in")
drawWordcloud(RAVmodel, ind = 832)
dev.off()

## Import Sup.Fig4C in the way I can combine with others
c <- png::readPNG("Figures/manuscript_Sup_Fig4C.png") %>% 
  grid::rasterGrob() %>% 
  gridExtra::grid.arrange()

## Make Sup.Fig4D/4E tables
library(ggpubr)
ind <- 832
fig2d <- findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE)
fig2e <- subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% 
    as.data.frame

## Reformat Fig4D/4E for combining
library(flextable)
d <- flextable(fig2d) %>% width(., j = c(1,2,4), width = c(0.4,0.4,4)) 
d <- grid::rasterGrob(as_raster(d))

e <- flextable(fig2e) %>% width(., j = 1, width = 5.3) 
e <- grid::rasterGrob(as_raster(e))
## Make the complete Sup.Figure 4
abc <- ggarrange(a, b, c, 
                 nrow = 3, heights = c(0.7, 0.7, 1.4), 
                 labels = c("a", "b", "c"), vjust = 1.5, hjust = -2,
                 font.label = list(size = 15, face = "bold"))
de <- ggarrange(d, e, 
                nrow = 2, heights = c(1, 1.3),
                labels = c("d", "e"), vjust = c(3, 6), hjust = 2.5,
                align = "v",
                font.label = list(size = 15, face = "bold"))

Sup_Fig4_all <- ggarrange(abc, de, ncol = 2, align = "h")
Sup_Fig4_all

## Save the complete Sup.Figure 4
png("Figures/manuscript_Sup_Fig4.png", width = 9.5, height = 6.5, units = "in", res = 500)
Sup_Fig4_all
dev.off()

# ggsave(filename = "Figures/manuscript_Sup_Fig4.png", 
#        plot = Sup_Fig4_all,
#        width = 9.5, height = 6.5, units = "in",
#        device = "png", dpi = 700)
library(EBImage)
img <- readImage("Figures/manuscript_Sup_Fig4.png")
display(img, method = "raster")

Session Info

sessionInfo()



shbrief/GenomicSuperSignaturePaper documentation built on Aug. 2, 2022, 2:04 p.m.