library(BiocStyle) knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
We obtain a single-cell RNA sequencing dataset of the mouse hypothalamus from Romanov et al. (2017).
Counts for endogenous genes are available from the Gene Expression Omnibus
using the accession number GSE74672.
We download and cache it using the r Biocpkg("BiocFileCache")
package.
library(BiocFileCache) bfc <- BiocFileCache("raw_data", ask = FALSE) base.url <- "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE74nnn/GSE74672/suppl/" count.file <- bfcrpath(bfc, paste0(base.url, "GSE74672_expressed_mols_with_classes.xlsx.gz"))
We load them into memory.
library(R.utils) tmp.file <- tempfile(fileext=".xlsx") gunzip(count.file, destname=tmp.file, remove=FALSE) # WARNING: this requires >20 GB of memory! library(readxl) all.data <- read_xlsx(tmp.file, col_names=FALSE) dim(all.data)
The first 9 rows correspond to metadata and need to be extracted.
coldata <- all.data[1:9,] colfields <- coldata[,1,drop=TRUE] coldata <- coldata[,-1] coldata <- t(coldata) coldata <- data.frame(coldata, stringsAsFactors=FALSE) rownames(coldata) <- NULL library(S4Vectors) coldata <- DataFrame(coldata) colnames(coldata) <- colfields
A great deal of sanitization is required to obtain the desired types for each column:
for (i in c("level2 cluster number (neurons only)", "total molecules")) { coldata[,i] <- as.integer(coldata[,i]) } for (i in c("age (days postnatal)", "cell diameter")) { coldata[,i] <- as.numeric(coldata[,i]) }
We fix the sex
so that it's more informative than +/-1.
is.sex <- colnames(coldata) == "sex (female=1,male=-1)" sex.status <- coldata[,is.sex] replacement.sex <- rep('unknown', nrow(coldata)) replacement.sex[sex.status == "1"] <- "female" replacement.sex[sex.status == "-1"] <- "male" coldata[,is.sex] <- replacement.sex colnames(coldata)[is.sex] <- "sex" table(sex.status, useNA="always")
The stress indicator could also be improved to a logical vector:
is.stress <- colnames(coldata) == "acute stress (true=1)" coldata[,is.stress] <- coldata[,is.stress] == "1" colnames(coldata)[is.stress] <- "acute stress"
This gives us something a bit cleaner:
coldata
The next three rows are empty, so we just skip them.
raw.counts <- all.data[-(1:12),]
We convert the remaining counts into a matrix.
counts <- as.matrix(raw.counts[,-1]) storage.mode(counts) <- "double" rownames(counts) <- raw.counts[,1,drop=TRUE] dim(counts)
We also rename the column names:
stopifnot(!anyDuplicated(coldata$cellID)) colnames(counts) <- coldata$cellID rownames(coldata) <- coldata$cellID coldata$cellID <- NULL
Putting everything together into a SingleCellExperiment
:
library(SingleCellExperiment) sce <- SingleCellExperiment(list(counts=counts), colData=coldata)
We split off the ERCCs into an alternative experiment:
status <- ifelse(grepl("^ERCC-[0-9]+", rownames(sce)), "ERCC", "endogenous") sce <- splitAltExps(sce, status, ref="endogenous")
We apply some polish to optimize disk space:
library(scRNAseq) sce <- polishDataset(sce) sce
We now save the object to disk in preparation for upload.
meta <- list( title="Molecular interrogation of hypothalamic organization reveals distinct dopamine neuronal subtypes", description="The hypothalamus contains the highest diversity of neurons in the brain. Many of these neurons can co-release neurotransmitters and neuropeptides in a use-dependent manner. Investigators have hitherto relied on candidate protein-based tools to correlate behavioral, endocrine and gender traits with hypothalamic neuron identity. Here we map neuronal identities in the hypothalamus by single-cell RNA sequencing. We distinguished 62 neuronal subtypes producing glutamatergic, dopaminergic or GABAergic markers for synaptic neurotransmission and harboring the ability to engage in task-dependent neurotransmitter switching. We identified dopamine neurons that uniquely coexpress the Onecut3 and Nmur2 genes, and placed these in the periventricular nucleus with many synaptic afferents arising from neuromedin S+ neurons of the suprachiasmatic nucleus. These neuroendocrine dopamine cells may contribute to the dopaminergic inhibition of prolactin secretion diurnally, as their neuromedin S+ inputs originate from neurons expressing Per2 and Per3 and their tyrosine hydroxylase phosphorylation is regulated in a circadian fashion. Overall, our catalog of neuronal subclasses provides new understanding of hypothalamic organization and function.", taxonomy_id="10090", genome="GRCm38", sources=list( list(provider="GEO", id="GSE74672"), list(provider="PubMed", id="27991900") ), maintainer_name="Aaron Lun", maintainer_email="infinite.monkeys.with.keyboards@gmail.com" ) saveDataset(sce, "2023-12-19_output", meta)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.