library(BiocStyle)
knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)

Downloading the count data

We obtain a single-cell RNA sequencing dataset of human pancreas from @xin2016rna. A matrix of RPKMs is provided in the Gene Expression Omnibus under the accession GSE81608. We download it using `r Biocpkg("BiocFileCache")`, which caches the file locally so that it is not downloaded again on subsequent runs:

library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask=FALSE)    
rpkm.txt <- bfcrpath(bfc, file.path("ftp://ftp.ncbi.nlm.nih.gov/geo/series",
    "GSE81nnn/GSE81608/suppl/GSE81608_human_islets_rpkm.txt.gz"))

We read the RPKMs into memory as a sparse matrix.

library(scater)
mat <- readSparseCounts(rpkm.txt)
dim(mat)
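
To illustrate what a sparse representation stores, the following is a minimal sketch using the Matrix package rather than `readSparseCounts()`. The `toy` matrix and its dimension names are hypothetical and are not taken from the dataset.

```r
library(Matrix)

# Hypothetical toy expression matrix: 4 genes x 3 cells, mostly zeros.
toy <- matrix(c(0,   0, 5.2,
                0,   0, 0,
                1.5, 0, 0,
                0,   0, 0),
              nrow=4, byrow=TRUE,
              dimnames=list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Coerce to a compressed sparse column format, which stores only the non-zero entries.
toy.sparse <- as(toy, "CsparseMatrix")

# Proportion of entries that are zero; high values are where sparse storage pays off.
zero.prop <- 1 - nnzero(toy.sparse) / length(toy.sparse)
zero.prop
```

The dimension names are preserved by the coercion, so downstream checks against row and column names behave the same as for a dense matrix.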

Preparing the column metadata

We download the metadata, which was supplied by the authors to Vladimir Kiselev, Tallulah Andrews and Martin Hemberg.

col.path <- bfcrpath(bfc, file.path("https://s3.amazonaws.com",
    "scrnaseq-public-datasets/manual-data/xin",
    "human_islet_cell_identity.txt"))
coldata <- read.delim(col.path, stringsAsFactors=FALSE, check.names=FALSE)
colnames(coldata)

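We check that the sample names in the first column of the metadata match the column names of the matrix, after replacing the first space in each name with an underscore: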

refnames <- sub(" ", "_", coldata[,1])
stopifnot(identical(colnames(mat), refnames))
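
Note that `sub()` replaces only the first match in each string, whereas `gsub()` replaces every match. The sketch below shows the difference on hypothetical sample names that are not taken from the dataset.

```r
# Hypothetical sample names, for illustration only.
ids <- c("Sample 1", "Sample 2 rep")

# sub() replaces only the first space in each element.
first.only <- sub(" ", "_", ids)

# gsub() replaces every space in each element.
all.spaces <- gsub(" ", "_", ids)

first.only
all.spaces
```

If the names contained more than one space, the two functions would give different results, so the `stopifnot()` check guards against a mismatch of this kind.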

We remove uninformative columns, i.e., those in which every value is unique or all values are identical. We retain the first column (the sample ID) for verification purposes later.

keep <- vapply(coldata, function(x) {
    # Keep a column only if it is neither constant nor all-unique.
    !(length(unique(x)) %in% c(1L, length(x)))
}, logical(1))
coldata <- coldata[,c(1, which(keep))]
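
As a minimal sketch of how this filter behaves, consider the hypothetical data frame below. Its column names and values are invented for illustration.

```r
# Hypothetical metadata with one all-unique, one constant and one informative column.
toy <- data.frame(
    id=c("s1", "s2", "s3", "s4"),      # all unique: fails the filter
    species=rep("human", 4),           # constant: fails the filter
    group=c("A", "A", "B", "B"),       # informative: passes the filter
    stringsAsFactors=FALSE
)

keep <- vapply(toy, function(x) {
    !(length(unique(x)) %in% c(1L, length(x)))
}, logical(1))

# The first column is retained explicitly, even though it fails the filter.
filtered <- toy[, c(1, which(keep))]
colnames(filtered)
```

Only `id` and `group` remain: `id` because it is requested by position, and `group` because it has more than one but fewer than `nrow(toy)` distinct values.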

Finally, we coerce it into a DataFrame for storage.

library(S4Vectors)
coldata <- as(coldata, "DataFrame")
coldata

Preparing the row metadata

We download the row metadata in the same way, coerce it into a DataFrame, and check that its first column matches the row names of the matrix.

row.path <- bfcrpath(bfc, file.path("https://s3.amazonaws.com",
    "scrnaseq-public-datasets/manual-data/xin",
    "human_gene_annotation.csv"))
rowdata <- read.csv(row.path, stringsAsFactors=FALSE, check.names=FALSE)
rowdata <- as(rowdata, "DataFrame")
stopifnot(identical(rownames(mat), as.character(rowdata[,1])))
rowdata

Saving to file

We now save all of the components to file for upload to `r Biocpkg("ExperimentHub")`. These will be used to construct a SingleCellExperiment on the client side when the dataset is requested.

path <- file.path("scRNAseq", "xin-pancreas", "2.0.0")
dir.create(path, showWarnings=FALSE, recursive=TRUE)
saveRDS(mat, file=file.path(path, "rpkm.rds"))
saveRDS(rowdata, file=file.path(path, "rowdata.rds"))
saveRDS(coldata, file=file.path(path, "coldata.rds"))
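
Before uploading, it may be worth confirming that the saved files can be read back unchanged. The following is a minimal sketch of such a round-trip check, using hypothetical stand-in objects and a temporary directory rather than the real components. The names `toy.mat`, `toy.rowdata` and the `toy-dataset` path are assumptions made for illustration.

```r
# Hypothetical stand-ins for the real matrix and row metadata.
toy.mat <- matrix(c(0, 1.5, 0, 2), nrow=2,
                  dimnames=list(c("g1", "g2"), c("c1", "c2")))
toy.rowdata <- data.frame(gene=c("g1", "g2"), stringsAsFactors=FALSE)

# Write to a versioned directory under tempdir() to avoid touching real output.
toy.path <- file.path(tempdir(), "scRNAseq", "toy-dataset", "1.0.0")
dir.create(toy.path, showWarnings=FALSE, recursive=TRUE)
saveRDS(toy.mat, file=file.path(toy.path, "rpkm.rds"))
saveRDS(toy.rowdata, file=file.path(toy.path, "rowdata.rds"))

# Read the files back and confirm that nothing changed.
reloaded.mat <- readRDS(file.path(toy.path, "rpkm.rds"))
reloaded.rowdata <- readRDS(file.path(toy.path, "rowdata.rds"))
stopifnot(identical(reloaded.mat, toy.mat))
stopifnot(identical(rownames(reloaded.mat), reloaded.rowdata$gene))
```

The same pattern could be applied to the real `mat`, `rowdata` and `coldata` objects by reading each file back from `path` and comparing it with the object in memory.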

Session information

sessionInfo()

References



drisso/scRNAseq documentation built on Feb. 16, 2021, 1:18 a.m.