library(BiocStyle)
knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)

Downloading the count data

We obtain a single-cell RNA sequencing dataset of the mouse brain from Campbell et al. (2017). Counts for endogenous genes are available from the Gene Expression Omnibus using the accession number GSE93374. We download and cache them using the r Biocpkg("BiocFileCache") package.

library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask = FALSE)
base.url <- "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE93nnn/GSE93374/suppl/"
count.file <- bfcrpath(bfc, paste0(base.url, "GSE93374_Merged_all_020816_DGE.txt.gz"))

Reading them in as a sparse matrix.

library(scuttle)
counts <- readSparseCounts(count.file)
dim(counts)

Downloading the metadata

We also download the cluster identities.

cluster.file <- bfcrpath(bfc, paste0(base.url, "GSE93374_cell_metadata.txt.gz"))
colClasses <- vector("list", 15)
colClasses[1:11] <- "character"
coldata <- read.delim(cluster.file, stringsAsFactors=FALSE, colClasses=colClasses, row.names=1)
coldata <- as(coldata, "DataFrame")
colnames(coldata) <- sub("X[0-9]+\\.", "", colnames(coldata))
coldata

We check that the columns are in the same order.

m <- match(colnames(counts), rownames(coldata))
coldata <- coldata[m,]
stopifnot(identical(colnames(counts), rownames(coldata)))

Saving to file

We slap together a SingleCellExperiment:

library(scRNAseq)
sce <- SingleCellExperiment(list(counts=counts), colData=coldata)
sce <- polishDataset(sce)
sce

We now save all of the components to disk in preparation for upload.

meta <- list(
    title="A molecular census of arcuate hypothalamus and median eminence cell types",
    description="The hypothalamic arcuate-median eminence complex (Arc-ME) controls energy balance, fertility and growth through molecularly distinct cell types, many of which remain unknown. To catalog cell types in an unbiased way, we profiled gene expression in 20,921 individual cells in and around the adult mouse Arc-ME using Drop-seq. We identify 50 transcriptionally distinct Arc-ME cell populations, including a rare tanycyte population at the Arc-ME diffusion barrier, a new leptin-sensing neuron population, multiple agouti-related peptide (AgRP) and pro-opiomelanocortin (POMC) subtypes, and an orexigenic somatostatin neuron population. We extended Drop-seq to detect dynamic expression changes across relevant physiological perturbations, revealing cell type-specific responses to energy status, including distinct responses in AgRP and POMC neuron subtypes. Finally, integrating our data with human genome-wide association study data implicates two previously unknown neuron populations in the genetic control of obesity. This resource will accelerate biological discovery by providing insights into molecular and cell type diversity from which function can be inferred.",
    taxonomy_id="10090",
    genome="GRCm38",
    sources=list(
        list(provider="GEO", id='GSE93374'),
        list(provider="PubMed", id="28166221")
    ),
    maintainer_name="Aaron Lun",
    maintainer_email="infinite.monkeys.with.keyboards@gmail.com"
)

saveDataset(sce, "2023-12-14_output", meta)

Session information {-}

sessionInfo()


LTLA/scRNAseq documentation built on April 29, 2024, 12:34 p.m.