library(BiocStyle)
knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)

Downloading the count data

We obtain a single-cell RNA sequencing dataset of the mouse brain from @campbell2017molecular. Counts for endogenous genes are available from the Gene Expression Omnibus using the accession number GSE93374. We download and cache them using the r Biocpkg("BiocFileCache") package.

library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask = FALSE)
base.url <- file.path("ftp://ftp.ncbi.nlm.nih.gov/geo/series",
    "GSE93nnn/GSE93374/suppl")
count.file <- bfcrpath(bfc, file.path(base.url,
    "GSE93374_Merged_all_020816_DGE.txt.gz"))

Reading them in as a sparse matrix.

library(scater)
counts <- readSparseCounts(count.file)
dim(counts)

Downloading the metadata

We also download the cluster identities.

cluster.file <- bfcrpath(bfc, file.path(base.url,
    "GSE93374_cell_metadata.txt.gz"))
colClasses <- vector("list", 15)
colClasses[1:11] <- "character"
coldata <- read.delim(cluster.file, stringsAsFactors=FALSE, 
    colClasses=colClasses)
coldata <- as(coldata, "DataFrame")
colnames(coldata) <- sub("X[0-9]+\\.", "", colnames(coldata))
coldata

We check that the columns are in the same order.

m <- match(colnames(counts), coldata$ID)
coldata <- coldata[m,]
stopifnot(identical(colnames(counts), coldata$ID))

Saving to file

We now save all of the components to file for upload to r Biocpkg("ExperimentHub"). These will be used to construct a SingleCellExperiment on the client side when the dataset is requested.

path <- file.path("scRNAseq", "campbell-brain", "2.0.0")
dir.create(path, showWarnings=FALSE, recursive=TRUE)
saveRDS(counts, file=file.path(path, "counts.rds"))
saveRDS(coldata, file=file.path(path, "coldata.rds"))

Session information

sessionInfo()

References



drisso/scRNAseq documentation built on Feb. 16, 2021, 1:18 a.m.