In MarioniLab/MouseGastrulationData: Single-Cell -omics Data across Mouse Gastrulation and Early Organogenesis

library(BiocStyle)
knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)

Overview

Here, we prepare a single-cell RNA sequencing timecourse dataset of mouse gastrulation and early organogenesis. Whole mouse embryos were harvested and dissociated at 6-hour timepoints between embryonic day (E) 6.5 and 8.5. Most timepoints contain several replicate samples. We will set up both the unfiltered count matrices from CellRanger (having removed swapped molecules with DropletUtils::swappedDrops) as well as a highly processed form of the data, as was used in the analyses of Pijuan-Sala et al. (2019).

Preparing the processed data

We obtain the processed count data through the r Biocpkg("BiocFileCache") framework. This caches the data locally upon the first download, avoiding the need to repeat the download on subsequent analyses.

library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask=FALSE)
count.path <- bfcrpath(bfc, file.path("https://content.cruk.cam.ac.uk/",
    "jmlab/atlas_data/raw_counts.mtx.gz"))

We load in the count data from the MatrixMarket format, using methods from the r CRANpkg("Matrix") package:

library(Matrix)
counts <- readMM(count.path)
dim(counts)

We download the cell- and gene-level metadata using r Biocpkg("BiocFileCache"), and read them into R. The UMAP coordinates are separated for later inclusion as a reducedDim item. The extensive metadata associated with specific analyses from the publication of this dataset have already been removed.

meta.path <- bfcrpath(bfc, file.path("https://content.cruk.cam.ac.uk/",
    "jmlab/atlas_data/meta.tab.gz"))
meta.tab <- read.delim(meta.path, stringsAsFactors=FALSE)
umap <- as.matrix(meta.tab[, c("umapX", "umapY")])
colnames(umap) <- c("x", "y")
meta.tab <- meta.tab[, -which(names(meta.tab) %in% c("umapX", "umapY"))]
head(meta.tab)

gene.path <- bfcrpath(bfc, file.path("https://content.cruk.cam.ac.uk/",
    "jmlab/atlas_data/genes.tsv.gz"))
gene.tab <- read.delim(gene.path, header=FALSE, stringsAsFactors=FALSE)
colnames(gene.tab) <- c("ENSEMBL", "SYMBOL")
head(gene.tab)

We store the count matrix in a SingleCellExperiment object with the metadata associated with each cell and gene.

library(SingleCellExperiment)
sce <- SingleCellExperiment(list(counts=counts), 
    colData=meta.tab, rowData=gene.tab)
sce

rm(counts)
gc()

We also obtain the size factors and store them in sce.

sf <- bfcrpath(bfc, file.path("https://content.cruk.cam.ac.uk/",
    "jmlab/atlas_data/sizefactors.tab.gz"))
sf <- read.delim(sf, header=FALSE, stringsAsFactors=FALSE)[,1]
sizeFactors(sce) <- sf
head(sf)

A 50-dimensional batch-corrected principal component representation of the data is also available. We store this in the SingleCellExperiment object. Doublets and stripped nuclei are excluded from these representations - they are represented as NAs in the reducedDim slot, so that the representation is the correct dimension to fit in the SingleCellExperiment object.

pc.path <- bfcrpath(bfc, file.path("https://content.cruk.cam.ac.uk/",
    "jmlab/atlas_data/corrected_pcas.rds"))
pc.list <- readRDS(pc.path)
#following match induces NA values deliberately
pc.all <- pc.list$all[match(colData(sce)$cell, rownames(pc.list$all)),]
rownames(pc.all) <- colData(sce)$cell
reducedDim(sce, "pca.corrected") <- pc.all
head(pc.all[,1:5])

The UMAP coordinates from the colData are also saved; similarly, this excludes doublets and stripped nuclei.

reducedDim(sce, "umap") <- umap

We now save the data, splitting the large SingleCellExperiment object into smaller, sample-wise objects. We then upload these smaller files to r Biocpkg("ExperimentHub"). Splitting up the data allows easier access of specific subsets of the data, and also allows use of the data on low-memory machines.

base <- file.path("MouseGastrulationData", "atlas", "1.0.0")
dir.create(base, recursive=TRUE, showWarnings=FALSE)
saveRDS(rowData(sce), file=paste0(base, "/rowdata.rds"))
for(samp in unique(sce$sample)){
    sub <- sce[, sce$sample == samp]
    saveRDS(counts(sub), 
        file=paste0(base, "/counts-processed-sample", samp, ".rds"))
    saveRDS(colData(sub), 
        file=paste0(base, "/coldata-sample", samp, ".rds"))
    saveRDS(sizeFactors(sub), 
        file=paste0(base, "/sizefac-sample", samp, ".rds"))
    saveRDS(reducedDims(sub), 
        file=paste0(base, "/reduced-dims-sample", samp, ".rds"))
}

Setting up the raw data

We obtain the raw count data through the r Biocpkg("BiocFileCache") framework. Each file contains the raw (unfiltered) count matrix from CellRanger for each sample, with swapped molecules removed via DropletUtils::swappedDrops.

sample.paths <- character(36)
sample.indices <- c(1:10, 12:37)
for (i in seq_along(sample.paths)) {
    fname <- sprintf("sample_%s_unswapped.mtx.gz", sample.indices[i])
    sample.paths[i] <- bfcrpath(bfc, 
        file.path("https://content.cruk.cam.ac.uk/",
       "jmlab/atlas_data/unfiltered", fname))
}
barcode.paths <- character(36)
for (i in seq_along(barcode.paths)) {
    fname <- sprintf("barcodes_%s_unswapped.tsv.gz", sample.indices[i])
    barcode.paths[i] <- bfcrpath(bfc, 
        file.path("https://content.cruk.cam.ac.uk/",
       "jmlab/atlas_data/unfiltered", fname))
}

We read in each sparse matrix and serialize it into a sample-specific RDS file.

collected <- vector("list", length(sample.paths))
for (i in seq_along(collected)) {
    curmat <- readMM(sample.paths[i])
    colnames(curmat) <- read.table(barcode.paths[i], stringsAsFactors=FALSE)[,1]
    saveRDS(curmat, file=file.path(base, sprintf("counts-raw-sample%i.rds", sample.indices[i])))
}

Note that the row-level metadata is the same in both the raw and processed data, and does not need to be re-acquired. Column names of the matrices are the 10x cell barcodes.

Session information

sessionInfo()

MarioniLab/MouseGastrulationData documentation built on Jan. 31, 2024, 11:01 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

MarioniLab/MouseGastrulationData
Single-Cell -omics Data across Mouse Gastrulation and Early Organogenesis

In MarioniLab/MouseGastrulationData: Single-Cell -omics Data across Mouse Gastrulation and Early Organogenesis

Overview

Preparing the processed data

Setting up the raw data

Session information

R Package Documentation

Browse R Packages

We want your feedback!

MarioniLab/MouseGastrulationData Single-Cell -omics Data across Mouse Gastrulation and Early Organogenesis

In MarioniLab/MouseGastrulationData: Single-Cell -omics Data across Mouse Gastrulation and Early Organogenesis

Overview

Preparing the processed data

Setting up the raw data

Session information

R Package Documentation

Browse R Packages

We want your feedback!

MarioniLab/MouseGastrulationData
Single-Cell -omics Data across Mouse Gastrulation and Early Organogenesis