library(BiocStyle)
knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)

Downloading the data

We obtain a single-cell RNA sequencing dataset of human cerebral cortex cells from Darmanis et al. (2015). Counts for endogenous genes are available from the Gene Expression Omnibus using the accession number GSE67835. We download and cache them using the r Biocpkg("BiocFileCache") package.

library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask = FALSE)
tarball <- bfcrpath(bfc, "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE67835&format=file")

fname <- tempfile()
untar(tarball, exdir=fname)
all.files <- list.files(fname, full=TRUE)
length(all.files)

Reading all the counts into memory and combining them:

counts <- lapply(all.files, read.table, sep="\t", header=FALSE, row.names=1)

# Sanity check:
gene.ids <- lapply(counts, rownames)
stopifnot(length(unique(gene.ids))==1L)

combined <- as.matrix(do.call(cbind, counts))
rownames(combined) <- sub(" +$", "", rownames(combined))
colnames(combined) <- sub("_.*", "\\1", basename(all.files))
str(combined)

Let's make an SCE; some reorganization is required to move the alignment metrics from the assays into the column data.

library(SingleCellExperiment)
is.metric <- rownames(combined) %in% c("no_feature", "ambiguous", "alignment_not_unique")
sce <- SingleCellExperiment(list(counts=combined[!is.metric,]))
colData(sce)$metrics <- DataFrame(t(combined[is.metric,]))
sce$metrics

Fetching metadata

We pull down some metadata from GEO. This requires some cleaning to get rid of the useless bits of information.

library(GEOquery)
out <- GEOquery::getGEO("GSE67835")

df <- rbind(
    as(phenoData(out[[1]]), "data.frame"),
    as(phenoData(out[[2]]), "data.frame")
) 
colnames(df)

We remove fields that are either too specific or too general.

keep <- vapply(df, function(x) {
    n <- length(unique(x))
    n > 1 & n < length(x)
}, TRUE)
df <- df[,keep]
colnames(df)

We remove duplicated characteristics fields:

keep <- grep(":ch1$", colnames(df))
df <- df[,keep]
colnames(df)

And finally we sanitize the names:

library(S4Vectors)
colnames(df) <- sub(":ch1$", "", colnames(df))
df <- DataFrame(df)

Now we just need to reorganize the order so that it matches the count matrix.

stopifnot(anyDuplicated(rownames(df))==0)
stopifnot(identical(sort(rownames(df)), sort(colnames(combined))))
df <- df[colnames(combined),] 

Injecting it into the SCE:

colData(sce) <- cbind(colData(sce), df)
colData(sce)

Saving to file

We run some polishing to optimize for space:

library(scRNAseq)
sce <- polishDataset(sce)
sce

We then save it to disk in preparation for upload:

meta <- list(
    title="A survey of human brain transcriptome diversity at the single cell level",
    description="The human brain is a tissue of vast complexity in terms of the cell types it comprises. Conventional approaches to classifying cell types in the human brain at single cell resolution have been limited to exploring relatively few markers and therefore have provided a limited molecular characterization of any given cell type. We used single cell RNA sequencing on 466 cells to capture the cellular complexity of the adult and fetal human brain at a whole transcriptome level. Healthy adult temporal lobe tissue was obtained during surgical procedures where otherwise normal tissue was removed to gain access to deeper hippocampal pathology in patients with medical refractory seizures. We were able to classify individual cells into all of the major neuronal, glial, and vascular cell types in the brain. We were able to divide neurons into individual communities and show that these communities preserve the categorization of interneuron subtypes that is typically observed with the use of classic interneuron markers. We then used single cell RNA sequencing on fetal human cortical neurons to identify genes that are differentially expressed between fetal and adult neurons and those genes that display an expression gradient that reflects the transition between replicating and quiescent fetal neuronal populations. Finally, we observed the expression of major histocompatibility complex type I genes in a subset of adult neurons, but not fetal neurons. The work presented here demonstrates the applicability of single cell RNA sequencing on the study of the adult human brain and constitutes a first step toward a comprehensive cellular atlas of the human brain.",
    taxonomy_id="9606",
    genome="GRCh37",
    sources=list(
        list(provider="GEO", id="GSE67835"),
        list(provider="PubMed", id="26060301")
    ),
    maintainer_name="Aaron Lun",
    maintainer_email="infinite.monkeys.with.keyboards@gmail.com"
)

saveDataset(sce, "2023-12-21_output", meta)

Session information {-}

sessionInfo()


LTLA/scRNAseq documentation built on April 29, 2024, 12:34 p.m.