library(BiocStyle)
knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
We obtain a single-cell RNA sequencing dataset of human pancreas from @xin2016rna.
A matrix of RPKMs is provided in the Gene Expression Omnibus
under the accession GSE81608.
We download it using the BiocFileCache package to cache the results:
library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask=FALSE)
rpkm.txt <- bfcrpath(bfc, file.path("ftp://ftp.ncbi.nlm.nih.gov/geo/series",
    "GSE81nnn/GSE81608/suppl/GSE81608_human_islets_rpkm.txt.gz"))
We read the RPKMs into memory as a sparse matrix.
library(scater)
mat <- readSparseCounts(rpkm.txt)
dim(mat)
We download the metadata, which was supplied by the authors to Vladimir Kiselev, Tallulah Andrews and Martin Hemberg.
col.path <- bfcrpath(bfc, file.path("https://s3.amazonaws.com/",
    "scrnaseq-public-datasets/manual-data/xin",
    "human_islet_cell_identity.txt"))
coldata <- read.delim(col.path, stringsAsFactors=FALSE, check.names=FALSE)
colnames(coldata)
We check that the sample names in the metadata match the column names of the matrix, in the same order:
refnames <- sub(" ", "_", coldata[,1])
stopifnot(identical(colnames(mat), refnames))
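Note that `sub()` replaces only the first space in each name. The following is a minimal sketch using made-up sample names (not taken from the dataset) to show how this differs from `gsub()`. It only matters if an identifier contains more than one space, which is not verified here.

```r
# Hypothetical sample names, for illustration only.
ids <- c("Sample 1", "Sample 2 rep")

first.only <- sub(" ", "_", ids)   # replaces the first space in each string
all.spaces <- gsub(" ", "_", ids)  # replaces every space in each string

first.only
all.spaces
```

Because the `stopifnot()` call compares the converted names to the matrix column names, any mismatch caused by the choice of function would be caught at that step.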
We remove uninformative columns, i.e., those in which every value is unique or all values are identical. We retain the sample ID (the first column) for verification purposes later.
keep <- vapply(coldata, function(x) {
    !length(unique(x)) %in% c(1L, length(x))
}, TRUE)
coldata <- coldata[,c(1, which(keep))]
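To illustrate what this filter does, here is a small sketch on a toy data frame. The column names and values are invented for the example and are not the real metadata fields.

```r
# Toy metadata, for illustration only (not the real column data).
toy <- data.frame(
    id = c("a", "b", "c", "d"),                # all unique: fails the filter
    species = rep("human", 4),                 # single value: fails the filter
    condition = c("T2D", "T2D", "ND", "ND"),   # informative: passes the filter
    stringsAsFactors = FALSE
)

# Same rule as in the text: reject columns with 1 or nrow distinct values.
keep.toy <- vapply(toy, function(x) {
    !length(unique(x)) %in% c(1L, length(x))
}, TRUE)
keep.toy

# Retain the first column (the ID) regardless, as done for 'coldata'.
filtered <- toy[, c(1, which(keep.toy))]
colnames(filtered)
```

Here, only `condition` passes the filter, and `id` is kept solely because it is the first column.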
Finally, we coerce it into a DataFrame for storage.
library(S4Vectors)
coldata <- as(coldata, "DataFrame")
coldata
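We do the same for the row metadata: we download it, coerce it into a DataFrame, and check that the gene identifiers in its first column match the row names of the matrix.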
row.path <- bfcrpath(bfc, file.path("https://s3.amazonaws.com/",
    "scrnaseq-public-datasets/manual-data/xin",
    "human_gene_annotation.csv"))
rowdata <- read.csv(row.path, stringsAsFactors=FALSE, check.names=FALSE)
rowdata <- as(rowdata, "DataFrame")
stopifnot(identical(rownames(mat), as.character(rowdata[,1])))
rowdata
We now save all of the components to file for upload to ExperimentHub.
These will be used to construct a SingleCellExperiment on the client side when the dataset is requested.
path <- file.path("scRNAseq", "xin-pancreas", "2.0.0")
dir.create(path, showWarnings=FALSE, recursive=TRUE)
saveRDS(mat, file=file.path(path, "rpkm.rds"))
saveRDS(rowdata, file=file.path(path, "rowdata.rds"))
saveRDS(coldata, file=file.path(path, "coldata.rds"))
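As a rough sketch of how the three saved components might be reassembled into a SingleCellExperiment, the example below round-trips toy stand-ins through a temporary directory. The toy values, the `toy-dataset` directory and the assay name `rpkm` are assumptions made for illustration. The actual client-side code retrieves the files from ExperimentHub and may differ.

```r
library(S4Vectors)
library(SingleCellExperiment)

# Toy stand-ins for the three components (not the real data).
tmp <- file.path(tempdir(), "scRNAseq", "toy-dataset", "1.0.0")
dir.create(tmp, showWarnings=FALSE, recursive=TRUE)

toy.mat <- matrix(c(0, 1.5, 2, 0, 0, 3), nrow=2,
    dimnames=list(c("g1", "g2"), c("c1", "c2", "c3")))
toy.row <- DataFrame(symbol=c("A", "B"), row.names=rownames(toy.mat))
toy.col <- DataFrame(cell.type=c("alpha", "beta", "alpha"),
    row.names=colnames(toy.mat))

saveRDS(toy.mat, file=file.path(tmp, "rpkm.rds"))
saveRDS(toy.row, file=file.path(tmp, "rowdata.rds"))
saveRDS(toy.col, file=file.path(tmp, "coldata.rds"))

# Reassemble the object from the serialized components.
sce <- SingleCellExperiment(
    assays=list(rpkm=readRDS(file.path(tmp, "rpkm.rds"))),
    rowData=readRDS(file.path(tmp, "rowdata.rds")),
    colData=readRDS(file.path(tmp, "coldata.rds"))
)
sce
```

Storing the matrix and the two metadata tables separately keeps each file simple and lets the object be built with whatever class definition is current when the dataset is requested.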
sessionInfo()