knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, message = FALSE, out.width="70%", out.height="70%")
To demonstrate the ability to match datasets under new analysis to relevant published datasets, we applied RAVmodel on five TCGA datasets.
suppressPackageStartupMessages({ library(GenomicSuperSignature) library(dplyr) library(ggplot2) library(magick) })
TCGA_validationDatasets
is a list containing 6 TCGA datasets: COAD, BRCA,
LUAD, READ, UCEC, and OV. First 5 are raw read counts from GSEABenchmarkeR
package with log2(count + 1) transformation. FYI, GSEABenchmarkeR::loadEData
excludes genes with cpm < 2 in more than half of the samples. TCGA-OV dataset
is from curatedOvarianData
package. These datasets were assembled using Results/TCGA/R/build_TCGA_validationDatasets.R
script.
load("data/TCGA_validationDatasets.rda") names(TCGA_validationDatasets) ## We're not using OV for this vignette datasets <- TCGA_validationDatasets[1:5]
We use RAVmodel annotated with MSigDB C2 gene sets.
## If GenomicSuperSignaturePaper is built locally with RAVmodel in inst/extdata data.dir <- system.file("extdata", package = "GenomicSuperSignaturePaper") RAVmodel <- readRDS(file.path(data.dir, "RAVmodel_C2.rds"))
RAVmodel <- getModel("C2", load=TRUE)
RAVmodel
version(RAVmodel)
We plot the heatmapTable of the validation result from multiple studies. It seems like RAV221 and RAV868 are specific to BRCA while RAV832 is strongly associated with colon/rectal cancers.
## This process takes little time due to the size of datasets. val_all <- validate(datasets, RAVmodel) heatmapTable(val_all, RAVmodel)
If you provide validation result from one dataset, heatmapTable
also returns
the average silhouette width as a reference.
### COAD val_coad <- validate(datasets[["COAD"]], RAVmodel) # heatmapTable(val_coad) ### READ val_read <- validate(datasets[["READ"]], RAVmodel) # heatmapTable(val_read)
RAV221 shows the highest validation score with the positive average silhouette width.
val_brca <- validate(datasets[["BRCA"]], RAVmodel) heatmapTable(val_brca, RAVmodel)
RAV868 doesn't ranked top 5, so we checked a couple more top validated RAVs. RAV868 is scored 6th with the negative average silhouette width.
heatmapTable(val_brca, RAVmodel, num.out = 7)
RAV221 and RAV868 are specific to BRCA.
RAV221 consists of three breast cancer studies (ERP016798, SRP023262, and SRP11343) and top 10 enriched pathways are associated with breast cancer.
ind <- 221 findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE) subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame drawWordcloud(RAVmodel, ind)
ind <- 868 findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE) subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame drawWordcloud(RAVmodel, ind)
Heatmap table above shows that RAV832 is specifically associated with COAD and READ datasets.
ind <- 832 findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE) subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame drawWordcloud(RAVmodel, ind)
Here, we plot heatmap table of validation scores from five TCGA RNA sequencing,
SLE-WB microarray and four colon cancer microarray datasets. scoreCutoff
is
set to 0.68 instead of the default 0.7. The below table includes RAVs that
validate any of the ten datasets with the validation score 0.68 or higher.
Based on this multi-datasets validation table,
- RAV23 and RAV1551 are SLE-specific
- RAV188 is COAD-specific while RAV832 seems to be associated with both COAD
and READ
new_datasets <- lapply(datasets, assay) ## SLE-WB dataset project_dir <- here::here() SLEWBdir <- file.path(project_dir, "Results/SLE-WB/data") fname <- file.path(SLEWBdir, "SLE_WB_all_microarray_QN_zto_before_with_GeneSymbol.pcl") exprs <- readr::read_tsv(fname) %>% as.data.frame rownames(exprs) <- exprs$GeneSymbol sle_wb <- as.matrix(exprs[,3:ncol(exprs)]) new_datasets$SLE <- sle_wb ## CRC dataset CRCdir <- file.path(project_dir, "Results/CRC/data/eSets_new/") load(file.path(CRCdir, "GSE14095_eset.RData")) load(file.path(CRCdir, "GSE17536_eset.RData")) load(file.path(CRCdir, "GSE2109_eset.RData")) load(file.path(CRCdir, "GSE39582_eset.RData")) new_datasets$GSE14095 <- log2(exprs(GSE14095_eset)+1) new_datasets$GSE17536 <- log2(exprs(GSE17536_eset)+1) new_datasets$GSE2109 <- log2(exprs(GSE2109_eset)+1) new_datasets$GSE39582 <- log2(exprs(GSE39582_eset)+1)
new_val_all <- validate(new_datasets, RAVmodel) heatmapTable(new_val_all, RAVmodel, scoreCutoff = 0.68)
ind <- 23 findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE) subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame drawWordcloud(RAVmodel, ind)
ind <- 1551 findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE) subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame drawWordcloud(RAVmodel, ind)
ind <- 188 findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE) subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame drawWordcloud(RAVmodel, ind)
A RAVmodel provides a rich resource for understanding new or user-supplied datasets in the context of all other RAVmodel datasets. A) Validation of multiple datasets. Each dataset was subjected to PCA and correlated to all possible RAVs using Pearson correlation. RAVs with correlation above 0.7 in at least one dataset were included. RAV221 and RAV868 seem to be associated with breast cancer while RAV832 is associated with colon and rectal cancer. B) Validation of a single dataset, TCGA-BRCA. Top 5 validated RAVs (score, bottom panel) and their average silhouette width (avg.sw, top panel) are shown. C) MeSH terms associated with RAV221 can aid in interpreting the biological context of the RAV when plotted as a word cloud. D) Studies contributing to RAV221. Study accession number (studyName column) and the title of a study (title column) are shown here. E) Top 10 enriched pathways in RAV221.
## Draw Fig2A/2B heatmapTable a <- grid::grid.grabExpr(ComplexHeatmap::draw(heatmapTable(val_all, RAVmodel))) b <- grid::grid.grabExpr(ComplexHeatmap::draw(heatmapTable(val_brca, RAVmodel))) ## Save Fig2c wordcloud png("Figures/manuscript_Fig2c.png", width = 350, height = 350) drawWordcloud(RAVmodel, ind = 221) dev.off() ## Import Fig2C in the way I can combine with others c <- png::readPNG("Figures/manuscript_Fig2c.png") %>% grid::rasterGrob() %>% gridExtra::grid.arrange() ## Make Fig2D/2E tables library(ggpubr) ind <- 221 fig2d <- findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE) fig2e <- subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame ## Reformat Fig2D/2E for combining library(flextable) d <- flextable(fig2d) %>% width(., j = c(1,2,4), width = c(0.4,0.4,4)) # adjust the column width d <- grid::rasterGrob(as_raster(d)) e <- flextable(fig2e) %>% width(., j = 1, width = 5.3) e <- grid::rasterGrob(as_raster(e)) # ## If I want to add rownames # fig2e <- subsetEnrichedPathways(RAVmodel, ind) %>% # as.data.frame %>% # tibble::rownames_to_column(., " ") # e <- flextable(fig2e) %>% width(., j = 2, width = 4) # e <- grid::rasterGrob(as_raster(e)) ## Make the complete Figure 2 ac <- ggarrange(a, c, labels = c("a", "c"), vjust = c(2, 6), hjust = -1.5, nrow = 2, heights = c(0.5, 1), font.label = list(size = 15)) bde <- ggarrange(b, d, e, labels = c("b", "d", "e"), vjust = c(2,1,1), hjust = c(-5,0,0), nrow = 3, align = "v", heights = c(0.4, 0.5, 0.75), font.label = list(size = 15)) Fig2_all <- ggarrange(ac, bde, ncol = 2, widths = c(1, 1.2), align = "hv") Fig2_all ## Save the complete Figure 2 # png("Figures/manuscript_Fig2.png", width = 715, height = 520) # Fig2_all # dev.off() ggsave(filename = "Figures/manuscript_Fig2.png", plot = Fig2_all, width = 11, height = 8, units = "in", device = "png", dpi = 700)
## Save each panel as bitmapped file png("Figures/manuscript_Fig2a.png", res = 500, width = 5, height = 3, units = "in") heatmapTable(val_all, RAVmodel) dev.off() png("Figures/manuscript_Fig2b.png", res = 500, width = 5, height = 3, units = "in") heatmapTable(val_brca, RAVmodel) dev.off() png("Figures/manuscript_Fig2c.png", res = 500, width = 5, height = 5, units = "in") drawWordcloud(RAVmodel, ind = 221) dev.off() ## Save flextable objects as images save_as_image(flextable(fig2d) %>% width(., j = c(1,2,4), width = c(0.4,0.4,4)), path = "Figures/manuscript_Fig2d.png") save_as_image(flextable(fig2e) %>% width(., j = 1, width = 5.3), path = "Figures/manuscript_Fig2e.png")
library(EBImage) img <- readImage("Figures/manuscript_Fig2.png") display(img, method = "raster")
# ## Make the complete Figure 2 # ab <- ggarrange(a, b, ncol = 2, labels = c("A", "B"), vjust = 9, hjust = -7, # widths = c(1, 0.8), font.label = list(size = 15)) # cde <- ggarrange(ggarrange(c, labels = "C", vjust = 3, hjust = -7), # ggarrange(d, e, # 2D and 2E tables # labels = c("D", "E"), vjust = 1, hjust = 0, # nrow = 2, align = "hv", # heights = c(0.8, 1.4)), # ncol = 2, # widths = c(1, 1.2), # font.label = list(size = 15), # align = "hv") # Fig2_all <- ggarrange(ab, cde, nrow = 2) # ## Save the complete Figure 2 # png("Fig2.png", width = 1200, height = 1000) # Fig2_all # dev.off()
Based on Figure 2A, RAV832 seems to be associated with TCGA-COAD and TCGA-READ. Top validation results of A) TCGA-COAD and B) TCGA-READ include RAV832 with the negative average silhouette width. C) MeSH terms associated with RAV832. D) Studies contributing to RAV832. E) MSigDB C2 gene sets enriched in RAV832.
## Draw Sup.Fig4A/4B heatmapTable a <- grid::grid.grabExpr(ComplexHeatmap::draw(heatmapTable(val_coad, RAVmodel))) b <- grid::grid.grabExpr(ComplexHeatmap::draw(heatmapTable(val_read, RAVmodel))) ## Save Sup.Fig4C wordcloud png("Figures/manuscript_Sup_Fig4C.png", res = 500, width = 5, height = 5, units = "in") drawWordcloud(RAVmodel, ind = 832) dev.off() ## Import Sup.Fig4C in the way I can combine with others c <- png::readPNG("Figures/manuscript_Sup_Fig4C.png") %>% grid::rasterGrob() %>% gridExtra::grid.arrange() ## Make Sup.Fig4D/4E tables library(ggpubr) ind <- 832 fig2d <- findStudiesInCluster(RAVmodel, ind, studyTitle = TRUE) fig2e <- subsetEnrichedPathways(RAVmodel, ind, include_nes = TRUE) %>% as.data.frame ## Reformat Fig4D/4E for combining library(flextable) d <- flextable(fig2d) %>% width(., j = c(1,2,4), width = c(0.4,0.4,4)) d <- grid::rasterGrob(as_raster(d)) e <- flextable(fig2e) %>% width(., j = 1, width = 5.3) e <- grid::rasterGrob(as_raster(e))
## Make the complete Sup.Figure 4 abc <- ggarrange(a, b, c, nrow = 3, heights = c(0.7, 0.7, 1.4), labels = c("a", "b", "c"), vjust = 1.5, hjust = -2, font.label = list(size = 15, face = "bold")) de <- ggarrange(d, e, nrow = 2, heights = c(1, 1.3), labels = c("d", "e"), vjust = c(3, 6), hjust = 2.5, align = "v", font.label = list(size = 15, face = "bold")) Sup_Fig4_all <- ggarrange(abc, de, ncol = 2, align = "h") Sup_Fig4_all ## Save the complete Sup.Figure 4 png("Figures/manuscript_Sup_Fig4.png", width = 9.5, height = 6.5, units = "in", res = 500) Sup_Fig4_all dev.off() # ggsave(filename = "Figures/manuscript_Sup_Fig4.png", # plot = Sup_Fig4_all, # width = 9.5, height = 6.5, units = "in", # device = "png", dpi = 700)
library(EBImage) img <- readImage("Figures/manuscript_Sup_Fig4.png") display(img, method = "raster")
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.