library(BiocStyle)
knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)

Downloading the count data

We obtain a single-cell RNA sequencing dataset of human pancreas from Xin et al. (2016). A matrix of RPKMs is provided in the Gene Expression Omnibus under the accession GSE81608. We download it using the BiocFileCache package, which caches the file so that it is not downloaded again on later runs:

library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask=FALSE)
rpkm.txt <- bfcrpath(bfc, "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81608/suppl/GSE81608_human_islets_rpkm.txt.gz")
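To see what the cache does, here is a minimal sketch that uses a throwaway cache and a small local file instead of the GEO download. The cache location, the file contents and the resource name `toy_rpkm` are all made up for illustration.

```r
library(BiocFileCache)

# Throwaway cache in a temporary directory (hypothetical, not "raw_data").
toy.bfc <- BiocFileCache(tempfile(), ask=FALSE)

# A small local file standing in for the remote RPKM table.
src <- tempfile(fileext=".txt")
writeLines(c("gene\tcell_1", "GeneA\t1.5"), src)

# Register the file under a made-up resource name.
first <- bfcadd(toy.bfc, rname="toy_rpkm", fpath=src)

# Asking for the same name again returns the path to the cached copy.
again <- bfcrpath(toy.bfc, rnames="toy_rpkm")
identical(unname(first), unname(again))
```

The same logic applies to the FTP URL above: once the file is in the cache, later calls only look up its local path.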

We read the RPKMs into memory as a sparse matrix.

library(scuttle)
mat <- readSparseCounts(rpkm.txt)
dim(mat)
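As a rough illustration of the input that `readSparseCounts()` expects, the sketch below writes a tiny tab-separated table to a temporary file and reads it back. The gene and cell names are invented, and the real file is of course much larger.

```r
library(scuttle)

# Hypothetical 3-gene, 2-cell table with gene names as row names.
toy <- data.frame(
    cell_1=c(0, 1.5, 0),
    cell_2=c(2, 0, 0),
    row.names=c("GeneA", "GeneB", "GeneC")
)

# col.names=NA writes an empty header field above the row names,
# so the header lines up with the data columns.
toy.file <- tempfile(fileext=".txt")
write.table(toy, file=toy.file, sep="\t", quote=FALSE, col.names=NA)

# Read it back as a sparse matrix; zeros are not stored explicitly.
toy.mat <- readSparseCounts(toy.file)
class(toy.mat)
dim(toy.mat)
```

Storing only the non-zero entries keeps memory usage down for matrices that are mostly zeros, as is common with single-cell data.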

Preparing the row/column annotations

We obtain the column metadata, which was supplied by the authors to Vladimir Kiselev, Tallulah Andrews and Martin Hemberg. Annoyingly, our original source of this file is no longer available, so we load the copy from ExperimentHub instead.

library(ExperimentHub)
ehub <- ExperimentHub()
coldata <- ehub[["EH2700"]]

# The matrix column names use an underscore where the sample names use a space.
transformed <- sub("_", " ", colnames(mat))
stopifnot(identical(transformed, coldata$Sample.name)) # check consistency
coldata$Sample.name <- NULL # mostly redundant and removed.
coldata
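One detail worth noting in the check above is that `sub()` only replaces the first match in each string, while `identical()` also requires the names to be in the same order. A small sketch with made-up sample names (not taken from the dataset):

```r
# Hypothetical sample names, for illustration only.
ids <- c("Sample_1", "Sample_1_rep2")

# sub() replaces only the first underscore in each string.
first.only <- sub("_", " ", ids)
first.only

# gsub() replaces every underscore.
all.matches <- gsub("_", " ", ids)
all.matches

# identical() is sensitive to order, so a shuffled vector fails the check.
identical(first.only, rev(first.only))
```

If the names had contained more than one underscore to convert, `gsub()` would have been the function to use.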

We do the same for the row metadata.

rowdata <- ehub[["EH2699"]]
stopifnot(identical(rownames(mat), as.character(rowdata[,1])))
rowdata <- rowdata[,-1,drop=FALSE] # redundant and removed.
rowdata
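The `as.character()` and `drop=FALSE` in the code above are defensive. The sketch below uses a made-up base R `data.frame` to show what they guard against; whether the first column of the real table is actually a factor is an assumption made purely for illustration.

```r
# Hypothetical row metadata with the identifiers stored as a factor.
toy.rowdata <- data.frame(
    gene.id=factor(c("1", "2")),
    symbol=c("A1BG", "A2M")
)
toy.names <- c("1", "2")

# A factor is never identical() to a character vector, even with the same labels.
as.is <- identical(toy.names, toy.rowdata[,1])
converted <- identical(toy.names, as.character(toy.rowdata[,1]))
c(as.is, converted)

# With only one column left, drop=FALSE keeps the result as a data frame.
kept <- toy.rowdata[,-1,drop=FALSE]
dropped <- toy.rowdata[,-1]
c(is.data.frame(kept), is.data.frame(dropped))
```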

Saving to file

Slapping everything together into a SingleCellExperiment:

library(SingleCellExperiment)
sce <- SingleCellExperiment(list(rpkm=mat), colData=coldata, rowData=rowdata)
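For readers unfamiliar with the constructor, here is the same call on a toy example. All of the names and values below are invented; the point is only that the assay, column metadata and row metadata must agree on the number of cells and genes.

```r
library(SingleCellExperiment)
library(Matrix)

# Hypothetical 3-gene, 2-cell sparse matrix.
toy.rpkm <- sparseMatrix(
    i=c(1, 2, 3), j=c(1, 2, 2), x=c(1.5, 2, 0.5), dims=c(3, 2),
    dimnames=list(c("GeneA", "GeneB", "GeneC"), c("cell_1", "cell_2"))
)

# One row of column metadata per cell, one row of row metadata per gene.
toy.coldata <- DataFrame(condition=c("control", "T2D"))
toy.rowdata <- DataFrame(symbol=c("A", "B", "C"))

toy.sce <- SingleCellExperiment(list(rpkm=toy.rpkm),
    colData=toy.coldata, rowData=toy.rowdata)
assayNames(toy.sce)
dim(toy.sce)
```

Naming the assay `rpkm` rather than `counts` makes it clear to downstream users that the values are not raw counts.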

Adding some polish to optimize for disk space:

library(scRNAseq)
sce <- polishDataset(sce)
sce

Now saving it to disk:

meta <- list(
    title="RNA Sequencing of Single Human Islet Cells Reveals Type 2 Diabetes Genes",
    description="Pancreatic islet cells are critical for maintaining normal blood glucose levels, and their malfunction underlies diabetes development and progression. We used single-cell RNA sequencing to determine the transcriptomes of 1,492 human pancreatic α, β, δ, and PP cells from non-diabetic and type 2 diabetes organ donors. We identified cell-type-specific genes and pathways as well as 245 genes with disturbed expression in type 2 diabetes. Importantly, 92% of the genes have not previously been associated with islet cell function or growth. Comparison of gene profiles in mouse and human α and β cells revealed species-specific expression. All data are available for online browsing and download and will hopefully serve as a resource for the islet research community.",
    taxonomy_id="9606",
    genome="GRCh37",
    sources=list(
        list(provider="GEO", id="GSE81608"),
        list(provider="PubMed", id="27667665"),
        list(provider="ExperimentHub", id="EH2700"),
        list(provider="ExperimentHub", id="EH2699")
    ),
    maintainer_name="Aaron Lun",
    maintainer_email="infinite.monkeys.with.keyboards@gmail.com"
)

saveDataset(sce, "2023-12-19_output", meta)
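As a quick sanity check, we can try to read the saved directory back in. This is a sketch that assumes the alabaster.base package is installed and that the directory written above is readable with its `readObject()` function; the `out.dir` and `reloaded` variables are introduced here for illustration.

```r
# Directory written by the saveDataset() call above.
out.dir <- "2023-12-19_output"

if (dir.exists(out.dir)) {
    # Assumption: the saved directory can be loaded with alabaster.base.
    reloaded <- alabaster.base::readObject(out.dir)
    print(reloaded)
} else {
    message("no saved dataset found at ", out.dir)
}
```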

Session information

sessionInfo()


LTLA/scRNAseq documentation built on April 29, 2024, 12:34 p.m.