@Mertins2016 unites information developed in the NCI CPTAC with TCGA data. The brcaProteo package reviews approaches to interrogating this data with Bioconductor.
The MultiAssayExperiment instance brProteoMAE
has two components
that use DelayedArray assay representation. The assay data can
be retrieved using assay
when the environment variable CGC_BILLING
is set
to a valid Google Cloud Platform project id.
library(brcaProteo) data(brProteoMAE) brProteoMAE
The proteomic results are available for a subset of the TCGA contributions.
upsetSamples(brProteoMAE)
We can see that there are 76 samples contributing to three assays.
We would like to line up samples and features that are common across assay types, in the simplest possible way.
We will relabel the rows of the RNA-seq data, which are given as Entrez identifiers. The other two assays use gene symbols.
We'll start out by isolating the relevant SummarizedExperiment and forcing an authentication, by querying for assay values.
library(restfulSE) rna = experiments(brProteoMAE)[["rnaseq"]] rna assay(rna[1:3,1:4])
Now we use the orgDb interface to obtain gene symbols.
library(org.Hs.eg.db) id = mapIds(org.Hs.eg.db, keys=rownames(rna), column="SYMBOL", keytype="ENTREZID") id[is.na(id)] = rownames(rna)[is.na(id)] rownames(rna) = id # if no symbol, just retain entrez id
The proteomic data can have multiple feature values per gene; for this initial view we'll omit all but the first when there are multiplicities.
rd = rowData(experiments(brProteoMAE)[["itraq"]]) ok = which(!is.na(rd$geneName)&!duplicated(rd$geneName)) ph = experiments(brProteoMAE)[["itraq"]][ok,] rownames(ph) = rowData(ph)$geneName
The MultiAssayExperiment with common samples and features is then produced as follows:
slot(brProteoMAE, "ExperimentList")[["itraq"]] = ph slot(brProteoMAE, "ExperimentList")[["rnaseq"]] = rna brint = intersectColumns(intersectRows(brProteoMAE))
We'll materialize the assay components for RPPA and RNA-seq, taking logs of RNA-seq values:
assay(experiments(brint)[["rnaseq"]]) = log(as.matrix(assay(experiments(brint)[["rnaseq"]]))+1) assay(experiments(brint)[["rppa"]]) = as.matrix(assay(experiments(brint)[["rppa"]])) brint
We extract a matrix for comparison of assay values over samples, focusing here on gene BCL2.
mt = sapply(experiments(brint["BCL2",]), assay) pairs(mt)
For all the selected, coincident features, interassay correlation is computed as follows:
fixup = function(x) data.matrix(na.omit(data.frame(x))) corm = function(x) cor(fixup(sapply(experiments(brint[x,]), assay))) allc = sapply(rownames(brint)[[1]], corm) dim(allc) summary(allc[2,]) # itraq vs rppa summary(allc[3,]) # itraq vs rnaseq summary(allc[8,]) # rppa vs rnaseq
The distributions of correlations can be searched to find assays that are not highly concordant across samples. The pairs plot for BRCA2 is
mt = sapply(experiments(brint["BRCA2",]), assay) pairs(mt)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.