suppressPackageStartupMessages({
library(BiocStyle)
library(htxapp)
library(restfulSE)
library(HumanTranscriptomeCompendium)
library(ggplot2)
library(table1)
})

Introduction

We used the app at https://vjcitn.shinyapps.io/cancer9k to extract SRA study SRP069212. This dataset was contributed to SRA in conjunction with "Recurrently deregulated lncRNAs in hepatocellular carcinoma", a Nature Communications paper (@Yang2017 PMID 28194035). The data are saved as pairedHCC.rds. Reads were requantified using salmon (by Sean Davis of NCI) and moved into the /shared/bioconductor folder of the HDF Scalable Data Service at http://hsdshdflab.hdfgroup.org.

We saved the dataset using the cancer9k app download facility, to a local file pairedHCC.rds. We'll use this to understand aspects of the design of the study, and to replicate aspects of the published analysis.

library(restfulSE)
hccdat = readRDS("pairedHCC.rds")
hccdat

Data exploration

The retrieved SummarizedExperiment instance has a colData with only four columns. The htxapp adds sample attribute information retrieved with SRAdbV2.

dplyr::glimpse(metadata(hccdat)$sampleAttsSRP069212)

We add this table to the colData using the bindSingleMetadata function of the htxapp package.

library(htxapp)
hccdat = bindSingleMetadata(hccdat)

Sample characteristics

Now the r CRANpkg("table1") package can be applied to colData to get an overview of sample characteristics. Note that here the n= entries DO NOT refer to independent observations.

table1(~source_name+hbv.infection+cirrhosis+age|gender,data=colData(hccdat))

We can see that this study involves up to three assays per participant (tumor, adjacent normal, portal vein tumor thrombosis). Fractions of individuals with cirrhosis and HBV infection are also noteworthy.

Assay metadata

We use the function addRD from HumanTranscriptomeCompendium to add rowData derived from Gencode 27.

hccdat = addRD(hccdat)
dplyr::glimpse(as.data.frame(rowData(hccdat)))

Assay

The assay quantifications are accessed via the r Biocpkg("DelayedArray") protocol.

assay(hccdat)

For targeted inquiries, we do not need to 'download' the complete expression data. Here we will check on LINC01018.

feature_trace("LINC01018", hccdat) + xlab(" ") + guides(colour="none")

This display uses quantifications distinct from those used by the authors. The picture is consistent with notion that the distribution of abundance of LINC01018 differs between samples from tumor and adjacent normal tissue.

Further work

The full assay data can be retrieved and loaded into memory using m <- as.matrix(assay(hccdat)); assay(hccdat) = m. This operation took a little more than a minute on a mediocre network. Speed of retrieval will also depend on load at the HDF Scalable Data Service.

References



vjcitn/htxapp documentation built on May 20, 2019, 12:37 p.m.