CSV files from the Ivy-GAP project have been assembled into a SummarizedExperiment instance.
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(ivygapSE)
  library(DT)
  library(grid)
  library(png)
  library(ggplot2)
  library(limma)
  library(randomForest)
})
library(ivygapSE)
data(ivySE)
ivySE
There are several types of metadata collected with the object, including the README.txt (use cat(metadata(ivySE)$README, sep="\n") to see this in R), the URL where data were retrieved, a character vector (builder) with the R code for creating (much of) the SummarizedExperiment, and two tables of tumor-specific and block-specific information.
The ivyGlimpse app is a rapid prototype of a browser-based interface to salient features of the data. The most current code is maintained in the Bioconductor ivygapSE package, but a public version of the app may be visited at shinyapps.io.
The ivygapSE package will evolve, based in part on associations observed through the use of this app. Briefly, the main visualization of the app is a scatterplot of user-selected tumor image features. All contributions, which are based on tumor sub-blocks (with varying multiplicities per tumor block and donor), are assembled together without regard for source; interactive aspects of the display allow the user to see which donor contributes each point.
Strata can be formed interactively by brushing over the scatterplot; after the brushing event, the survival times of donors contributing selected points are compared to donors all of whose contributions lie outside the selection. Expression data are also stratified in this way and gene-specific boxplot sets (for user-specified gene sets) are produced for each stratum.
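The stratification rule described above can be illustrated on a toy table. This is a minimal sketch in base R, not the app's own code: the data frame pts, its columns, and the brush rectangle limits are all hypothetical.

```r
# Toy sub-block table: one row per plotted point (all values are made up).
pts <- data.frame(
  donor = c("d1", "d1", "d2", "d2", "d3", "d4"),
  x = c(0.10, 0.80, 0.20, 0.30, 0.90, 0.85),
  y = c(0.20, 0.90, 0.10, 0.40, 0.70, 0.95),
  stringsAsFactors = FALSE
)

# Hypothetical brush rectangle (limits chosen only for illustration).
brush <- list(xmin = 0.75, xmax = 1.0, ymin = 0.6, ymax = 1.0)

# A point is selected if it lies inside the rectangle.
inside <- pts$x >= brush$xmin & pts$x <= brush$xmax &
  pts$y >= brush$ymin & pts$y <= brush$ymax

# A donor joins the "selected" stratum if any of its points is inside;
# the comparison stratum holds donors whose points all lie outside.
selected_donors <- unique(pts$donor[inside])
outside_donors <- setdiff(unique(pts$donor), selected_donors)

selected_donors
outside_donors
```

Note that donor d1 has one point inside and one outside the rectangle, and is assigned to the selected stratum; only donors with no selected points form the comparison group.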
The number of RNA-seq samples is given by ncol(ivySE). The FPKM matrix has dimensions
dim(ivySE)
There are 42 different tumor donors.
length(unique(metadata(ivySE)$tumorDetails$donor_id))
However, only 37 donors contributed tumor RNA that was sequenced:
sum(metadata(ivySE)$tumorDetails$tumor_name %in% ivySE$tumor_name)
subd = metadata(ivySE)$subBlockDetails
dsub = dim(subd)
Features of images from sub-blocks were quantified according to the following terminology for anatomical characteristics. Not all images provided information on all attributes.
nomenclat()
We have used information in the IvyGAP technical white paper to spell out additional background on the data underlying the app and SummarizedExperiment.
There are six substudies contributing data in a partly sequential design.
designOverview()
The following table has one record per tumor (N = nrow(tumorDetails(ivySE))).
datatable(tumorDetails(ivySE), options=list(lengthMenu=c(3,5,10,50,100)))
The following table has one record per sub-block (N = nrow(subBlockDetails(ivySE))).
datatable(subBlockDetails(ivySE), options=list(lengthMenu=c(3,5,10,50, 100)))
The complete annotation on RNA-seq samples is provided in colData(ivySE). The table follows here:
datatable(as.data.frame(colData(ivySE)), options=list(lengthMenu=c(3,5,10,50,100)))
The sub-blocks arose from a number of measurement objectives.
sb = subBlockDetails(ivySE)
table(sb$study_name)
We use the structure_acronym variable to assess the composition of sources in the RNA-seq collection.
struc = as.character(colData(ivySE)$structure_acronym)
spls = strsplit(struc, "-")
# first token: major structural type; second token: specific objective
basis = vapply(spls, function(x) x[1], character(1))
spec = vapply(spls, function(x) x[2], character(1))
table(basis, exclude=NULL)
barplot(table(basis))
Each of the major structural types contributes multiple samples from specific objectives.
lapply(split(spec,basis), function(x)sort(table(x),decreasing=TRUE))
We have used limma to test for differential expression among samples identified as reference histology in classes CT, CT-mvp, CT-pan, IT, and LE. The resulting mean expression estimates (FPKM scale) and moderated test statistics are obtained as follows:
library(limma)
ebout = getRefLimma()
The ten genes that are most significantly differentially expressed between conditions CT and CT-mvp are found as follows:
odig = options()$digits
options(digits=3)
limma::topTable(ebout, 2)
options(digits=odig) # revert
We can bind the molecular subtype information from the tumor details to the expression sample annotation as follows:
moltype = tumorDetails(ivySE)$molecular_subtype
names(moltype) = tumorDetails(ivySE)$tumor_name
moltype[nchar(moltype)==0] = "missing"
ivySE$moltype = factor(moltype[ivySE$tumor_name])
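The binding step relies on indexing a named vector by name, which repeats each tumor's label once per sample. A minimal base R sketch of the idiom follows; the tumor identifiers, subtype labels, and object names are hypothetical and are not taken from ivySE.

```r
# Hypothetical per-tumor subtype vector, named by tumor identifier.
subtype <- c(T1 = "Classical", T2 = "", T3 = "Neural")

# Recode empty strings so they form an explicit level.
subtype[nchar(subtype) == 0] <- "missing"

# Hypothetical sample annotation: several samples per tumor.
sample_tumor <- c("T1", "T1", "T3", "T2", "T3")

# Indexing by name expands the per-tumor labels to one label per sample.
sample_subtype <- factor(subtype[sample_tumor])
table(sample_subtype)
```

A sample whose tumor name is absent from the names of the lookup vector would receive NA, so it can be worth checking anyNA() on the result before modeling.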
We will confine attention to samples annotated as "reference histology" and compute the duplicate correlation for modeling the effect of molecular subtype in the available samples.
library(limma)
refex = ivySE[, grep("reference", ivySE$structure_acronym)]
refmat = assay(refex)
tydes = model.matrix(~moltype, data=as.data.frame(colData(refex)))
ok = which(apply(tydes,2,sum)>0) # some subtypes don't have ref histo samples
tydes = tydes[,ok]
block = factor(refex$tumor_id)
dd = duplicateCorrelation(refmat, tydes, block=block)
# block must also be passed to lmFit; otherwise the correlation is not used
f2 = lmFit(refmat, tydes, block=block, correlation=dd$consensus.correlation)
ef2 = eBayes(f2)
colnames(tydes)
topTable(ef2,2)
We assess the capacity of the expression measures to discriminate the structural type (CT, CT-mvp, CT-pan, LE, IT) using the random forests algorithm. Features used have interquartile range (IQR) over all relevant samples exceeding the median IQR over all genes.
refex = ivySE[, grep("reference", ivySE$structure_acronym)]
refex$struc = factor(refex$structure_acronym)
iqrs = rowIQRs(assay(refex))
inds = which(iqrs>quantile(iqrs,.5)) # genes with IQR above the median IQR
set.seed(1234)
rf1 = randomForest(x=t(assay(refex[inds,])), y=refex$struc, mtry=30, importance=TRUE)
rf1
varImpPlot(rf1)
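The variability filter used for feature selection can be seen in isolation on simulated data. This is a sketch using base R only; the matrix expr and its gene names are invented, and apply() with IQR() is used here in place of rowIQRs().

```r
set.seed(1)

# Simulated expression matrix: 6 genes (rows) by 8 samples (columns).
expr <- matrix(rnorm(48), nrow = 6,
               dimnames = list(paste0("g", 1:6), NULL))

# Inflate the spread of three genes so that they are far more variable.
expr[c("g1", "g3", "g5"), ] <- 100 * expr[c("g1", "g3", "g5"), ]

# Per-gene interquartile range.
iqrs <- apply(expr, 1, IQR)

# Keep genes whose IQR exceeds the median IQR over all genes.
keep <- which(iqrs > median(iqrs))
names(keep)

# Samples-by-features matrix, the orientation a classifier expects.
x <- t(expr[keep, , drop = FALSE])
dim(x)
```

With an even number of genes and no ties, this rule retains exactly half of them, which is why roughly half of the features enter the random forest fit.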
Patel et al. Science 2014 (344(6190): 1396–1401) present single cell RNA-seq for 430 cells from 5 tumors of different molecular subtypes. It would be interesting to use signatures of structural origin to see whether intra-tumor variation can be resolved into components coherent with the five-element typology. It would also be of interest to assess whether structural type signatures are associated with any signatures of drug sensitivity in relevant cell lines.