library(BiocStyle)
knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)

Downloading the count data

We obtain a single-cell RNA sequencing dataset of mouse oligodendrocytes from @marques2016oligodendrocyte. Counts for endogenous genes are available from the Gene Expression Omnibus using the accession number GSE75330. We download and cache it using the r Biocpkg("BiocFileCache") package.

library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask = FALSE)
count.file <- bfcrpath(bfc, file.path("ftp://ftp.ncbi.nlm.nih.gov/geo/series",
    "GSE75nnn/GSE75330/suppl/GSE75330_Marques_et_al_mol_counts2.tab.gz"))

We load them into memory.

library(scater)
counts <- readSparseCounts(count.file)
dim(counts)

Extracting the metadata

We extract the metadata for this study using the r Biocpkg("GEOquery") package.

library(GEOquery)
coldata <- pData(getGEO("GSE75330")[[1]])

library(S4Vectors)
coldata <- as(coldata, "DataFrame")
rownames(coldata) <- NULL
colnames(coldata)

Unfortunately, some of the cells are named differently in the coldata compared to the counts, which needs correction.

coldata[,1] <- sub("T[0-9]-", "-", coldata[,1])
coldata <- coldata[match(colnames(counts), coldata[,1]),]
stopifnot(identical(colnames(counts), coldata[,1]))

We remove the constant columns, as these are unlikely to be interesting.

nlevels <- vapply(coldata, FUN=function(x) length(unique(x[!is.na(x)])), 1L)
coldata <- coldata[,nlevels > 1L]

We also remove the columns related to the GEO accession itself, as well as some redundant fields.

coldata <- coldata[,! colnames(coldata) %in%
    c("geo_accession", "relation", "relation.1")]
coldata <- coldata[,!grepl("^characteristics_ch", colnames(coldata))]

We convert all factors into character vectors, and we clean up the column names.

for (i in colnames(coldata)) {
    if (is.factor(coldata[[i]])) {
        coldata[[i]] <- as.character(coldata[[i]])
    }
}
colnames(coldata) <- sub("[:_]ch1$", "", colnames(coldata))
coldata

Saving to file

We now save all of the components to file for upload to r Biocpkg("ExperimentHub"). These will be used to construct a SingleCellExperiment on the client side when the dataset is requested.

path <- file.path("scRNAseq", "marques-brain", "2.0.0")
dir.create(path, showWarnings=FALSE, recursive=TRUE)
saveRDS(counts, file=file.path(path, "counts.rds"))
saveRDS(coldata, file=file.path(path, "coldata.rds"))

Session information

sessionInfo()

References



drisso/scRNAseq documentation built on Feb. 16, 2021, 1:18 a.m.