suppressPackageStartupMessages({
  library(BiocStyle)
  library(htxapp)
  library(restfulSE)
  library(HumanTranscriptomeCompendium)
  library(ggplot2)
  library(table1)
})
We used the app at https://vjcitn.shinyapps.io/cancer9k
to extract SRA study SRP069212. This dataset was
contributed to SRA in conjunction with
"Recurrently deregulated lncRNAs in hepatocellular
carcinoma", a Nature Communications paper
(@Yang2017 PMID 28194035).
The data are saved as pairedHCC.rds. Reads were requantified
using salmon (by Sean Davis of NCI) and moved into the
/shared/bioconductor folder of the HDF Scalable Data Service
at http://hsdshdflab.hdfgroup.org.
We saved the dataset using the cancer9k app download facility,
to a local file pairedHCC.rds. We'll use this to understand
aspects of the design of the study, and to replicate aspects
of the published analysis.
library(restfulSE)
hccdat = readRDS("pairedHCC.rds")
hccdat
The retrieved SummarizedExperiment instance has a colData
with only
four columns. The htxapp adds sample attribute information retrieved with SRAdbV2.
dplyr::glimpse(metadata(hccdat)$sampleAttsSRP069212)
We add this table to the colData using the bindSingleMetadata
function
of the htxapp
package.
library(htxapp)
hccdat = bindSingleMetadata(hccdat)
Now the table1 package (from CRAN) can be applied to colData
to get an overview of sample characteristics.
Note that here the n=
entries DO NOT refer to independent observations.
table1(~source_name+hbv.infection+cirrhosis+age|gender,data=colData(hccdat))
We can see that this study involves up to three assays per participant (tumor, adjacent normal, portal vein tumor thrombosis). Fractions of individuals with cirrhosis and HBV infection are also noteworthy.
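The caution about the n= entries can be made concrete with a small sketch. It uses a made-up sample table in place of as.data.frame(colData(hccdat)); the column name `patient_id` and all values are assumptions for illustration only, since the actual participant identifier in the SRA sample attributes may be named differently.

```r
# Toy sample table standing in for as.data.frame(colData(hccdat)).
# `patient_id` is a hypothetical column name; all values are made up.
samples <- data.frame(
  patient_id  = c("P1", "P1", "P1", "P2", "P2", "P3"),
  source_name = c("tumor", "adjacent normal", "PVTT",
                  "tumor", "adjacent normal", "tumor"),
  stringsAsFactors = FALSE
)

# Rows are samples (what a per-row summary counts as n) ...
n_samples <- nrow(samples)

# ... while the independent units are the participants.
n_participants <- length(unique(samples$patient_id))

# Number of samples contributed by each participant.
samples_per_participant <- table(samples$patient_id)

# Participant-by-tissue layout; each cell is 0 or 1 in this toy example.
design <- table(samples$patient_id, samples$source_name)
```

Comparing `n_samples` with `n_participants` shows how far the row counts overstate the number of independent observations when participants contribute several tissue samples.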
We use the function addRD
from HumanTranscriptomeCompendium to
add rowData derived from Gencode 27.
hccdat = addRD(hccdat)
dplyr::glimpse(as.data.frame(rowData(hccdat)))
The assay quantifications are accessed via the DelayedArray (Bioconductor)
protocol.
assay(hccdat)
For targeted inquiries, we do not need to 'download' the complete expression data. Here we will check on LINC01018.
feature_trace("LINC01018", hccdat) + xlab(" ") + guides(colour="none")
This display uses quantifications distinct from those used by the authors. The picture is consistent with the notion that the distribution of abundance of LINC01018 differs between samples from tumor and adjacent normal tissue.
The full assay data can be retrieved and loaded into memory
using m <- as.matrix(assay(hccdat)); assay(hccdat) = m. This
operation took a little more than a minute on a mediocre network.
Speed of retrieval will also depend on load at the HDF Scalable
Data Service.
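To see what the realization step costs in a given environment, one could wrap it in a timer. The following is a minimal sketch using only base R; `realize_timed` is a hypothetical helper (not part of any package mentioned here), and a small data frame stands in for assay(hccdat) so that the example runs without network access. It assumes only that as.matrix() is applicable to the object passed in.

```r
# Hypothetical helper: realize a matrix-like object in memory and report
# the elapsed wall-clock seconds and the in-memory size of the result.
realize_timed <- function(x) {
  elapsed <- system.time(m <- as.matrix(x))[["elapsed"]]
  list(matrix = m, seconds = elapsed, bytes = as.numeric(object.size(m)))
}

# Toy stand-in for assay(hccdat): 1000 features by 20 samples, all zeros.
toy <- data.frame(matrix(0, nrow = 1000, ncol = 20))
res <- realize_timed(toy)

# Report timing and size in megabytes.
cat(sprintf("realized %d x %d matrix in %.2f s (%.2f MB)\n",
            nrow(res$matrix), ncol(res$matrix),
            res$seconds, res$bytes / 1e6))
```

With the real data, passing assay(hccdat) instead of `toy` would time the remote retrieval; the figure obtained will vary with network conditions and server load.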