if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("SingleCellMultiModal")
library(SingleCellMultiModal) library(MultiAssayExperiment) library(scran) library(scater)
This data set consists of about 10K Peripheral Blood Mononuclear Cells (PBMCs) derived from a single healthy donor. It is available from the 10x Genomics website.
Provided are the RNA expression counts quantified at the gene level and the
chromatin accessibility levels quantified at the peak level. Here we provide
the default peaks called by the CellRanger software. If you want to explore
other peak definitions or chromatin accessibility quantifications (at the
promoter level, etc.), you have download the fragments.tsv.gz
file from the
10x Genomics website.
The user can see the available dataset by using the default options
mae <- scMultiome("pbmc_10x", modes = "*", dry.run = FALSE, format = "MTX")
gg_color_hue <- function(n) { hues = seq(15, 375, length = n + 1) hcl(h = hues, l = 65, c = 100)[1:n] } colors <- gg_color_hue(length(unique(mae$celltype))) names(colors) <- unique(mae$celltype)
There are two assays: rna
and atac
, stored as
SingleCellExperiment
objects
mae
where the cells are the same in both assays:
upsetSamples(mae)
Columns:
Lymphoid
or Myeloid
originThe cells have not been QC-ed, choosing a minimum number of genes/peaks per cell depends is left to you! In addition, there are further quality control criteria that you may want to apply, including mitochondrial coverage, fraction of reads overlapping ENCODE Blacklisted regions, Transcription start site enrichment, etc. See suggestions below for software that can perform a semi-automated quality control pipeline
head(colData(mae))
The RNA expression consists of 36,549 genes and 10,032 cells, stored using
the dgCMatrix
sparse matrix format
dim(experiments(mae)[["rna"]])
names(experiments(mae))
Let's do some standard dimensionality reduction plot:
sce.rna <- experiments(mae)[["rna"]] # Normalisation sce.rna <- logNormCounts(sce.rna) # Feature selection decomp <- modelGeneVar(sce.rna) hvgs <- rownames(decomp)[decomp$mean>0.01 & decomp$p.value <= 0.05] sce.rna <- sce.rna[hvgs,] # PCA sce.rna <- runPCA(sce.rna, ncomponents = 25) # UMAP set.seed(42) sce.rna <- runUMAP(sce.rna, dimred="PCA", n_neighbors = 25, min_dist = 0.3) plotUMAP(sce.rna, colour_by="celltype", point_size=0.5, point_alpha=1)
The ATAC expression consists of 108,344 peaks and 10,032 cells:
dim(experiments(mae)[["atac"]])
Let's do some standard dimensionality reduction plot. Note that scATAC-seq data is sparser than scRNA-seq, almost binary. The log normalisation + PCA approach that scater
implements for scRNA-seq is not a good strategy for scATAC-seq data. Topic modelling or TFIDF+SVD are a better strategy. Please see the package recommendations below.
sce.atac <- experiments(mae)[["atac"]] # Normalisation sce.atac <- logNormCounts(sce.atac) # Feature selection decomp <- modelGeneVar(sce.atac) hvgs <- rownames(decomp)[decomp$mean>0.25] sce.atac <- sce.atac[hvgs,] # PCA sce.atac <- runPCA(sce.atac, ncomponents = 25) # UMAP set.seed(42) sce.atac <- runUMAP(sce.atac, dimred="PCA", n_neighbors = 25, min_dist = 0.3) plotUMAP(sce.atac, colour_by="celltype", point_size=0.5, point_alpha=1)
These are my personal recommendations of R-based analysis software:
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.