knitr::opts_chunk$set(error=FALSE, warning=FALSE, message=FALSE)

Introduction

The MouseThymusAgeing package provides convenient access to the single-cell RNA sequencing (scRNA-seq) datasets from @baran-gale_ageing_2020. The study used single-cell transcriptomic profiling to resolve how the epithelial composition of the mouse thymus changes with ageing. The datasets from the paper are provided as count matrices with relevant sample-level and feature-level metadata. All data are provided after processing and quality control (QC). The raw sequencing data can be obtained directly from ArrayExpress using accessions E-MTAB-8560 and E-MTAB-8737.

Installation

The package can be installed from Bioconductor. Bioconductor packages are installed using the BiocManager package, which is available from CRAN.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MouseThymusAgeing")

To use the package, load it in the typical way.

library(MouseThymusAgeing)

Processing Overview

Detailed experimental protocols are available in the manuscript and analytical details are provided in the accompanying GitHub repo.

This data package contains two single-cell datasets from the paper. The first details the initial transcriptomic profiling of defined thymic epithelial cell (TEC) populations using the plate-based SMART-seq2 chemistry. These cells were sorted from mice at 1, 4, 16, 32 and 52 weeks of age using the following flow cytometry phenotypes:

In each case, cells were sorted from 5 separate mice at each age into a 384-well plate containing lysis buffer, with cells from different ages and days block-sorted into different areas of each plate to minimise the confounding between batch effects, mouse age and sorted subpopulation. The single-cell libraries were prepared according to the SMART-seq2 protocol and sequenced on an Illumina NovaSeq 6000.

The computational processing involved the following steps:

The second dataset contains TEC profiled from mice at 8, 20 and 36 weeks of age, derived from a transgenic model system that can also lineage-trace cells descended from those that express the thymoproteasomal gene, $\beta$-5t. When this gene is expressed it drives the expression of a fluorescent reporter gene, ZsGreen (ZsG). The mouse is denoted $\mbox{3xtg}^{\beta5t}$. In each mouse (3 replicates per age) the transgene was first induced using doxycycline, and 4 weeks later the TEC were collected by flow cytometry in separate ZsG+ and ZsG- groups. Within each of these groups, cells were FAC-sorted into mTEC (Cd45+EpCam+MHCII+Ly51-UEA1+) and cTEC (Cd45+EpCam+Ly51+UEA1+) populations. For this experiment we made use of recent developments in multiplexing with hashtag oligos (HTO; cell-hashing) [@stoeckius_cell_2018]. Consequently, the cells were super-loaded onto the 10X Genomics Chromium chips before library prep and sequencing on an Illumina NovaSeq 6000.

The computational processing for these data differs from that described above. Specifically:

Package data format

The SMART-seq2 data are stored in subsets according to the sorting day (numbered 1-5). The droplet data can be accessed according to the specific multiplexed samples (6 in total). For the SMART-seq2 data, the exported object SMARTseqMetadata provides the relevant metadata for each sorting day; the equivalent object DropletMetadata contains the relevant information for each separate sample. Specific descriptions of each column can be accessed using ?SMARTseqMetadata and ?DropletMetadata.

head(SMARTseqMetadata, n = 5)

All of the data access functions allow you to select the particular samples or sorting days that you would like to access for the relevant data set. By loading only the samples or sorting days that you are interested in for your particular analysis, you will save time when downloading and loading the data, and also reduce memory consumption on your machine.
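As a minimal sketch of selective loading, the snippet below inspects the metadata table and then requests only two sorting days. The "day<N>" naming follows the MouseSMARTseqData(samples="day2") call used in this vignette; passing a vector of several days and the variable names wanted.days and subset.sce are assumptions for illustration, so check ?MouseSMARTseqData before relying on them.

```r
library(MouseThymusAgeing)

# Inspect the per-sorting-day metadata to decide which days are needed
head(SMARTseqMetadata)

# Request two sorting days only.
# Assumption: `samples` accepts a character vector of several day names.
wanted.days <- c("day1", "day2")
subset.sce <- MouseSMARTseqData(samples = wanted.days)

# Genes x cells for the requested subset
dim(subset.sce)
```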

Droplet single-cell experiments tend to be much larger owing to the ability to encapsulate and process many more cells than in either 96- or 384-well plates. The droplet scRNA-seq made use of hashtag oligonucleotides to multiplex samples, allowing for replicated experimental design without breaking the bank.

head(DropletMetadata, n = 5)

Data access

Package data are provided as SingleCellExperiment objects, an extension of the Bioconductor SummarizedExperiment object for high-throughput omics experiment data. The SingleCellExperiment object uses memory-efficient storage and sparse matrices to store the single-cell experiment data, whilst allowing the layering of additional feature- and cell-wise metadata to facilitate single-cell analyses. This section details how to access and interact with these objects from the MouseThymusAgeing package.

smart.sce <- MouseSMARTseqData(samples="day2")
smart.sce

The gene counts are stored in the "counts" assay, assay(smart.sce, "counts"), which can also be accessed using the convenience function counts(). The gene counts are stored in a memory-efficient sparse matrix class from the Matrix package.

head(counts(smart.sce)[, 1:10])
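To illustrate why sparse storage matters for count data, here is a small self-contained sketch using simulated counts rather than the package data. The matrix dimensions, the Poisson rate and the object names are arbitrary choices for illustration; the exact sparse class held in the package objects can be checked with class(counts(smart.sce)).

```r
library(Matrix)

set.seed(1)

# Simulated counts: 1000 genes x 200 cells, mostly zeros
n.genes <- 1000
n.cells <- 200
dense.counts <- matrix(rpois(n.genes * n.cells, lambda = 0.1),
                       nrow = n.genes, ncol = n.cells)

# Convert to a sparse matrix, which stores only the non-zero entries
sparse.counts <- Matrix(dense.counts, sparse = TRUE)
class(sparse.counts)

# Compare the memory footprint of the two representations
object.size(dense.counts)
object.size(sparse.counts)

# Fraction of entries that are zero
1 - nnzero(sparse.counts) / length(sparse.counts)
```

The saving grows with the proportion of zeros, which is typically high in scRNA-seq count matrices.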

The normalisation factors per cell can be accessed using the sizeFactors() function.

head(sizeFactors(smart.sce))

These are used to normalise the data. To generate log-transformed normalised expression values, we can apply the logNormCounts() function from the scuttle package. This will add the logcounts entry to the assays slot in our object.

library(scuttle)
smart.sce <- logNormCounts(smart.sce)
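The arithmetic behind this step can be sketched on a toy matrix. With its default settings, logNormCounts() divides each cell's counts by that cell's size factor (centred so the factors average 1) and then applies log2 with a pseudo-count of 1. The library-size factors below are an assumption made for illustration; the size factors stored in the package objects may have been estimated differently.

```r
# Toy counts: 3 genes (rows) x 4 cells (columns)
toy.counts <- matrix(c(10, 0, 5,
                       20, 2, 8,
                        5, 1, 0,
                       40, 3, 12),
                     nrow = 3,
                     dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Assumed size factors: library sizes scaled to have a mean of 1
lib.sizes <- colSums(toy.counts)
size.factors <- lib.sizes / mean(lib.sizes)

# Divide each column (cell) by its size factor
norm.counts <- sweep(toy.counts, 2, size.factors, FUN = "/")

# log2-transform with a pseudo-count of 1
toy.logcounts <- log2(norm.counts + 1)
round(toy.logcounts, 2)
```

A count of zero stays at zero after the transformation, so sparsity in the counts is preserved in the log-expression values.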

With these normalised counts we can perform our standard downstream analytical tasks, such as identifying highly variable genes, projecting cells into a reduced dimensional space and clustering using a nearest-neighbour graph. You can further inspect the cell-wise metadata attached to each dataset, stored in the colData of each SingleCellExperiment object.

head(colData(smart.sce))
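Returning to the downstream tasks mentioned above, the following is one possible sketch using the scran and scater packages together with igraph. It is not the analysis performed in the paper. The number of highly variable genes (2000), the number of components (30), the neighbourhood size (k = 20) and the name "PCA.hvg" are all arbitrary assumptions chosen for illustration.

```r
library(MouseThymusAgeing)
library(scuttle)
library(scran)
library(scater)

# Load one sorting day and compute log-normalised expression values
smart.sce <- MouseSMARTseqData(samples = "day2")
smart.sce <- logNormCounts(smart.sce)

# Model per-gene variance and keep the most variable genes
gene.var <- modelGeneVar(smart.sce)
hvgs <- getTopHVGs(gene.var, n = 2000)

# PCA on the selected genes, stored under a new name so that the
# pre-computed "PCA" entry in the object is not overwritten
smart.sce <- runPCA(smart.sce, subset_row = hvgs, ncomponents = 30,
                    name = "PCA.hvg")

# Shared nearest-neighbour graph on the new PCA, then Louvain clustering
snn.graph <- buildSNNGraph(smart.sce, k = 20, use.dimred = "PCA.hvg")
clusters <- igraph::cluster_louvain(snn.graph)$membership
table(clusters)
```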

Details of the information stored in the cell-wise metadata can be found in the documentation using ?DropletMetadata and ?SMARTseqMetadata. Each object also contains pre-computed reduced dimensions, which can be accessed using reducedDim(<sce>, "PCA").
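A short sketch of retrieving and plotting the pre-computed PCA coordinates is given below. The object name pca.coords and the plotting parameters are illustrative choices only.

```r
library(MouseThymusAgeing)
library(SingleCellExperiment)

smart.sce <- MouseSMARTseqData(samples = "day2")

# List the reduced-dimension results stored in the object
reducedDimNames(smart.sce)

# Extract the PCA coordinates: one row per cell, one column per component
pca.coords <- reducedDim(smart.sce, "PCA")
dim(pca.coords)

# Plot the first two components with base graphics
plot(pca.coords[, 1], pca.coords[, 2],
     xlab = "PC1", ylab = "PC2", pch = 16, cex = 0.5)
```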

Session Information

sessionInfo()

References



MarioniLab/MouseThymusAgeing documentation built on Feb. 19, 2023, 11:24 a.m.