knitr::opts_chunk$set(echo = TRUE)
In this vignette, we'll demonstrate how to jointly analyze a single-cell dataset measuring both DNA accessibility and gene expression in the same cells using Signac and Seurat. In this vignette we'll be using a publicly available 10x Genomic Multiome dataset for human PBMCs.
View data download code
You can download the required data by running the following lines in a shell:
```{sh eval=FALSE} wget https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_filtered_feature_bc_matrix.h5 wget https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_atac_fragments.tsv.gz wget https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_atac_fragments.tsv.gz.tbi
</details> ```r if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") if (!requireNamespace("EnsDb.Hsapiens.v86", quietly = TRUE)) BiocManager::install("EnsDb.Hsapiens.v86") if (!requireNamespace("BSgenome.Hsapiens.UCSC.hg38", quietly = TRUE)) BiocManager::install("BSgenome.Hsapiens.UCSC.hg38") if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes") if (!requireNamespace("SeuratDisk", quietly = TRUE)) remotes::install_github("mojaveazure/seurat-disk") if (!requireNamespace("glmGamPoi", quietly = TRUE)) BiocManager::install("glmGamPoi")
library(Signac) library(Seurat) library(EnsDb.Hsapiens.v86) library(BSgenome.Hsapiens.UCSC.hg38)
# load the RNA and ATAC data counts <- Read10X_h5("../vignette_data/multiomic/pbmc_granulocyte_sorted_10k_filtered_feature_bc_matrix.h5") fragpath <- "../vignette_data/multiomic/pbmc_granulocyte_sorted_10k_atac_fragments.tsv.gz"
# get gene annotations for hg38 annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86) seqlevels(annotation) <- paste0('chr', seqlevels(annotation)) # create a Seurat object containing the RNA adata pbmc <- CreateSeuratObject( counts = counts$`Gene Expression`, assay = "RNA" ) # create ATAC assay and add it to the object pbmc[["ATAC"]] <- CreateChromatinAssay( counts = counts$Peaks, sep = c(":", "-"), fragments = fragpath, annotation = annotation )
pbmc
We can compute per-cell quality control metrics using the DNA accessibility data and remove cells that are outliers for these metrics, as well as cells with low or unusually high counts for either the RNA or ATAC assay.
DefaultAssay(pbmc) <- "ATAC" pbmc <- NucleosomeSignal(pbmc) pbmc <- TSSEnrichment(pbmc)
The relationship between variables stored in the object metadata can
be visualized using the DensityScatter()
function. This can also be used
to quickly find suitable cutoff values for different QC metrics by setting
quantiles=TRUE
:
DensityScatter(pbmc, x = 'nCount_ATAC', y = 'TSS.enrichment', log_x = TRUE, quantiles = TRUE)
VlnPlot( object = pbmc, features = c("nCount_RNA", "nCount_ATAC", "TSS.enrichment", "nucleosome_signal"), ncol = 4, pt.size = 0 )
# filter out low quality cells pbmc <- subset( x = pbmc, subset = nCount_ATAC < 100000 & nCount_RNA < 25000 & nCount_ATAC > 1800 & nCount_RNA > 1000 & nucleosome_signal < 2 & TSS.enrichment > 1 ) pbmc
We can normalize the gene expression data using SCTransform, and reduce the dimensionality using PCA.
DefaultAssay(pbmc) <- "RNA" pbmc <- SCTransform(pbmc) pbmc <- RunPCA(pbmc)
Here we process the DNA accessibility assay the same way we would process a scATAC-seq dataset, by performing latent semantic indexing (LSI).
DefaultAssay(pbmc) <- "ATAC" pbmc <- FindTopFeatures(pbmc, min.cutoff = 5) pbmc <- RunTFIDF(pbmc) pbmc <- RunSVD(pbmc)
To annotate cell types in the dataset we can transfer cell labels from an existing PBMC reference dataset using tools in the Seurat package. See the Seurat reference mapping vignette for more information.
We'll use an annotated PBMC reference dataset from Hao et al. (2020), available for download here: https://atlas.fredhutch.org/data/nygc/multimodal/pbmc_multimodal.h5seurat
Note that the SeuratDisk package is required to load the reference dataset. Installation instructions for SeuratDisk can be found here.
library(SeuratDisk) # load PBMC reference reference <- LoadH5Seurat("../vignette_data/multiomic/pbmc_multimodal.h5seurat", assays = list("SCT" = "counts"), reductions = 'spca') reference <- UpdateSeuratObject(reference) DefaultAssay(pbmc) <- "SCT" # transfer cell type labels from reference to query transfer_anchors <- FindTransferAnchors( reference = reference, query = pbmc, normalization.method = "SCT", reference.reduction = "spca", recompute.residuals = FALSE, dims = 1:50 ) predictions <- TransferData( anchorset = transfer_anchors, refdata = reference$celltype.l2, weight.reduction = pbmc[['pca']], dims = 1:50 ) pbmc <- AddMetaData( object = pbmc, metadata = predictions ) # set the cell identities to the cell type predictions Idents(pbmc) <- "predicted.id" # remove low-quality predictions pbmc <- pbmc[, pbmc$prediction.score.max > 0.5]
Using the weighted nearest neighbor methods in Seurat v4, we can compute a joint neighbor graph that represent both the gene expression and DNA accessibility measurements.
# build a joint neighbor graph using both assays pbmc <- FindMultiModalNeighbors( object = pbmc, reduction.list = list("pca", "lsi"), dims.list = list(1:50, 2:40), modality.weight.name = "RNA.weight", verbose = TRUE ) # build a joint UMAP visualization pbmc <- RunUMAP( object = pbmc, nn.name = "weighted.nn", assay = "RNA", verbose = TRUE ) DimPlot(pbmc, label = TRUE, repel = TRUE, reduction = "umap") + NoLegend()
For each gene, we can find the set of peaks that may regulate the gene by by computing the correlation between gene expression and accessibility at nearby peaks, and correcting for bias due to GC content, overall accessibility, and peak size. See the Signac paper for a full description of the method we use to link peaks to genes.
Running this step on the whole genome can be time consuming, so here we demonstrate
peak-gene links for a subset of genes as an example. The same function can be used
to find links for all genes by omitting the genes.use
parameter:
DefaultAssay(pbmc) <- "ATAC" # first compute the GC content for each peak pbmc <- RegionStats(pbmc, genome = BSgenome.Hsapiens.UCSC.hg38) # link peaks to genes pbmc <- LinkPeaks( object = pbmc, peak.assay = "ATAC", expression.assay = "SCT", genes.use = c("LYZ", "MS4A1") )
We can visualize these links using the CoveragePlot()
function, or alternatively
we could use the CoverageBrowser()
function in an interactive analysis:
idents.plot <- c("B naive", "B intermediate", "B memory", "CD14 Mono", "CD16 Mono", "CD8 TEM", "CD8 Naive") p1 <- CoveragePlot( object = pbmc, region = "MS4A1", features = "MS4A1", expression.assay = "SCT", idents = idents.plot, extend.upstream = 500, extend.downstream = 10000 ) p2 <- CoveragePlot( object = pbmc, region = "LYZ", features = "LYZ", expression.assay = "SCT", idents = idents.plot, extend.upstream = 8000, extend.downstream = 5000 ) patchwork::wrap_plots(p1, p2, ncol = 1)
saveRDS(object = pbmc, file = "../vignette_data/pbmc_multiomic.rds")
Session Info
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.