    library(BiocStyle)
    knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)

We obtain a single-cell RNA sequencing dataset of mouse oligodendrocytes from @marques2016oligodendrocyte.
Counts for endogenous genes are available from the Gene Expression Omnibus
using the accession number GSE75330.
We download and cache them using the `r Biocpkg("BiocFileCache")` package.

    library(BiocFileCache)
    bfc <- BiocFileCache("raw_data", ask = FALSE)
    count.file <- bfcrpath(bfc, file.path("ftp://ftp.ncbi.nlm.nih.gov/geo/series",
        "GSE75nnn/GSE75330/suppl/GSE75330_Marques_et_al_mol_counts2.tab.gz"))

We load the counts into memory as a sparse matrix.

    library(scater)
    counts <- readSparseCounts(count.file)
    dim(counts)

We extract the metadata for this study using the `r Biocpkg("GEOquery")` package.

    library(GEOquery)
    coldata <- pData(getGEO("GSE75330")[[1]])

    library(S4Vectors)
    coldata <- as(coldata, "DataFrame")
    rownames(coldata) <- NULL
    colnames(coldata)

Unfortunately, some cells are named differently in the column metadata than in the count matrix, so we correct the names and reorder the metadata to match the counts.

    coldata[,1] <- sub("T[0-9]-", "-", coldata[,1])
    coldata <- coldata[match(colnames(counts), coldata[,1]),]
    stopifnot(identical(colnames(counts), coldata[,1]))

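To see what the renaming and reordering do, here is a minimal sketch on made-up cell names (the names below are hypothetical and not taken from GSE75330). The `sub()` call drops a `T` followed by a digit when it precedes a hyphen, and `match()` gives the row order that aligns the metadata with the columns of the count matrix.

```r
# Hypothetical cell names, for illustration only.
count.names <- c("C1-A01", "C2-B07", "C3-H12")
meta.names <- c("C3T1-H12", "C1-A01", "C2T2-B07")

# Strip the "T<digit>" tag that precedes the hyphen.
fixed <- sub("T[0-9]-", "-", meta.names)

# Row indices that put the metadata in the same order as the counts.
idx <- match(count.names, fixed)
reordered <- fixed[idx]

# Fails loudly if any cell is missing or out of order.
stopifnot(!anyNA(idx), identical(count.names, reordered))
```

The final `stopifnot()` mirrors the check in the real code: a cell that is missing from the metadata would yield an `NA` from `match()` and stop the script.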
We remove the constant columns, as these are unlikely to be interesting.

    nlevels <- vapply(coldata, FUN=function(x) length(unique(x[!is.na(x)])), 1L)
    coldata <- coldata[,nlevels > 1L]

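The same filter can be tried on a small base R data frame. This is only a sketch: the column names and values are invented, and a plain `data.frame` stands in for the `DataFrame` used above.

```r
# Toy metadata; the column names and values are invented for illustration.
toy <- data.frame(
    organism = c("Mus musculus", "Mus musculus", "Mus musculus"),
    age = c("p21", "p60", NA),
    strain = c("CD1", "C57BL/6", "CD1"),
    stringsAsFactors = FALSE
)

# Count distinct non-missing values per column.
n.unique <- vapply(toy, function(x) length(unique(x[!is.na(x)])), 1L)

# Keep only columns that vary across cells; drop=FALSE keeps a data frame
# even if a single column survives.
kept <- toy[, n.unique > 1L, drop = FALSE]
colnames(kept)
```

Because missing values are excluded before counting, a column with one observed value and some `NA` entries is also treated as constant and removed.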
We also remove the columns related to the GEO accession itself, as well as some redundant fields.

    coldata <- coldata[,! colnames(coldata) %in% c("geo_accession", "relation", "relation.1")]
    coldata <- coldata[,!grepl("^characteristics_ch", colnames(coldata))]

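The two filters differ in how they select columns: the first matches exact names with `%in%`, the second matches a prefix with `grepl()`. A small sketch on an invented vector of column names (assumed to resemble GEO sample metadata, not copied from it) shows the combined effect.

```r
# Invented column names that mimic the layout of GEO sample metadata.
cols <- c("title", "geo_accession", "characteristics_ch1",
    "characteristics_ch1.1", "relation", "age:ch1")

# Exact-name matches for the accession-related fields.
by.name <- cols %in% c("geo_accession", "relation", "relation.1")

# Prefix match for the 'characteristics' fields.
by.pattern <- grepl("^characteristics_ch", cols)

# Columns that survive both filters.
remaining <- cols[!(by.name | by.pattern)]
remaining
```

Note that `%in%` does not complain when a listed name such as `relation.1` is absent, so the same call works whether or not that column exists.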
We convert all factors into character vectors, and we clean up the column names.

    for (i in colnames(coldata)) {
        if (is.factor(coldata[[i]])) {
            coldata[[i]] <- as.character(coldata[[i]])
        }
    }
    colnames(coldata) <- sub("[:_]ch1$", "", colnames(coldata))
    coldata

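As a quick illustration of both steps, the sketch below applies them to a toy data frame. The columns are invented, and `check.names = FALSE` is needed only so that base R keeps the `:` in the toy column name.

```r
# Toy metadata with invented columns, for illustration only.
toy <- data.frame(
    "age:ch1" = factor(c("p21", "p60")),
    "strain_ch1" = c("CD1", "C57BL/6"),
    check.names = FALSE,
    stringsAsFactors = FALSE
)

# Convert factor columns to character, leaving other columns untouched.
for (i in colnames(toy)) {
    if (is.factor(toy[[i]])) {
        toy[[i]] <- as.character(toy[[i]])
    }
}

# Drop a trailing ":ch1" or "_ch1" from the column names.
colnames(toy) <- sub("[:_]ch1$", "", colnames(toy))
colnames(toy)
```

The `$` anchor restricts the substitution to the end of the name, so a `ch1` appearing elsewhere in a column name would be left alone.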
We now save all of the components to file for upload to `r Biocpkg("ExperimentHub")`.
These will be used to construct a `SingleCellExperiment` on the client side when the dataset is requested.

    path <- file.path("scRNAseq", "marques-brain", "2.0.0")
    dir.create(path, showWarnings=FALSE, recursive=TRUE)
    saveRDS(counts, file=file.path(path, "counts.rds"))
    saveRDS(coldata, file=file.path(path, "coldata.rds"))

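A rough sketch of the round trip, using a temporary directory and toy objects rather than the real data. The last step assumes the SingleCellExperiment package is installed and shows only one plausible way for a client to assemble the object; the actual retrieval code is not part of this document.

```r
# Toy stand-ins for the real counts and metadata (values are invented).
counts <- matrix(c(0, 3, 1, 0, 2, 5), nrow = 2,
    dimnames = list(c("GeneA", "GeneB"), c("cell1", "cell2", "cell3")))
coldata <- data.frame(age = c("p21", "p60", "p21"),
    row.names = colnames(counts), stringsAsFactors = FALSE)

# Save to a temporary location that mirrors the versioned layout.
path <- file.path(tempdir(), "scRNAseq", "marques-brain", "2.0.0")
dir.create(path, showWarnings = FALSE, recursive = TRUE)
saveRDS(counts, file = file.path(path, "counts.rds"))
saveRDS(coldata, file = file.path(path, "coldata.rds"))

# Reload and check that the objects survive the round trip unchanged.
counts2 <- readRDS(file.path(path, "counts.rds"))
coldata2 <- readRDS(file.path(path, "coldata.rds"))
stopifnot(identical(counts, counts2), identical(coldata, coldata2))

# Assumption: assemble the object only if SingleCellExperiment is available.
if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
    sce <- SingleCellExperiment::SingleCellExperiment(
        assays = list(counts = counts2), colData = coldata2)
}
```

Storing the counts and the column metadata as separate files keeps each component independently loadable, at the cost of requiring the client to check that the cell order agrees when combining them.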
sessionInfo()