knitr::opts_chunk$set(echo = TRUE) options(bitmapType="cairo")
This is a quick walkthrough demonstrating how to generate SWNE plots alongside the Seurat pipeline using a 3k PBMC dataset as an example.
To save time we will be using the pre-computed Seurat object pbmc3k_seurat.Robj
, which can be downloaded here.
First let's load the required libraries
library(Seurat) library(swne)
Next let's load the Seurat object
obj <- readRDS("~/swne/Data/pbmc3k_final.RObj")
Most scRNA-seq pipelines only use a subset of highly overdispersed genes for analysis. We'll pull out those variable genes here, as well as the cluster labels.
## Pull out overdispersed genes as defined by Seurat var.genes <- VariableFeatures(obj) length(var.genes) ## Pull out cell clusters as defined by Seurat cell.clusters <- Idents(obj) levels(cell.clusters)
The easiest way to generate an SWNE embedding is to use the wrapper function RunSWNE
## Run SWNE genes.embed <- c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP", "CD8A") swne.embedding <- RunSWNE(obj, k = 16, genes.embed = genes.embed, sample.groups = cell.clusters) ## Plot SWNE PlotSWNE(swne.embedding, alpha.plot = 0.4, sample.groups = cell.clusters, do.label = T, label.size = 3.5, pt.size = 1, show.legend = F, seed = 42)
RunSWNE
can also return a Seurat object with SWNE set as a dimensional reduction
## Run SWNE obj <- RunSWNE(obj, k = 16, genes.embed = genes.embed, return.format = "seurat") DimPlot(obj, reduction = "swne")
Now we'll go through the SWNE embedding process step by step to get a sense of how everything works
First, let's pull out the counts, scale and adjust gene variance, while keeping the scaled matrix nonnegative.
norm.counts <- ExtractNormCounts(obj, obj.type = "seurat", rescale = T, rescale.method = "log") dim(norm.counts)
We use the FindNumFactors
function to identify the optimal number of factors to use. This function can be slow for large datasets, since it iterates over different values of k
, so a simple "hack" is to just set k
equal to the number of significant principal components.
k.range <- seq(2,16,2) ## Range of factors to iterate over k.err <- FindNumFactors(norm.counts[var.genes,], k.range = k.range, n.cores = 8, do.plot = T)
We then run the NMF decomposition. We can initialize the NMF using either Independent Component Analysis (ica), Nonnegative SVD (nnsvd), or a completely random initialization. ICA is recommended for most datasets. The output of RunNMF
is a list of the gene loadings (W) and NMF embedding (H).
k <- 16 nmf.res <- RunNMF(norm.counts[var.genes,], k = k)
We can either use the pre-computed Shared Nearest Neighbors (SNN) matrix from Seurat or re-compute it ourselves.
# pc.scores <- t(GetCellEmbeddings(se.obj, reduction.type = "pca", dims.use = 1:k)) # snn <- CalcSNN(pc.scores) obj <- FindNeighbors(obj, k = 10, prune.SNN = 1/15) snn <- as(obj@graphs$RNA_snn, "dgCMatrix")
We can prune the SNN matrix by removing edges between cells in clusters that don't share a statistically significant amount of edges using a PAGA graph (https://github.com/theislab/paga)
knn <- as(obj@graphs$RNA_nn, "dgCMatrix") ## Extract kNN matrix snn <- PruneSNN(snn, knn, clusters = cell.clusters, qval.cutoff = 1e-3)
Run the SWNE embedding. The three key parameters are alpha.exp
, snn.exp
, and n_pull
, which control how the factors and neighboring cells affect the cell coordinates.
alpha.exp <- 1.25 # Increase this > 1.0 to move the cells closer to the factors. Values > 2 start to distort the data. snn.exp <- 0.25 # Lower this < 1.0 to move similar cells closer to each other n_pull <- 3 # The number of factors pulling on each cell. Must be at least 3. swne.embedding <- EmbedSWNE(nmf.res$H, SNN = snn, alpha.exp = alpha.exp, snn.exp = snn.exp, n_pull = n_pull)
For now, let's hide the factors by setting their names to the empty string ""
. We'll interpret them later
swne.embedding$H.coords$name <- ""
To help with interpreting these cell clusters, let's pick some key PBMC genes to embed.
genes.embed <- c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP", "CD8A")
Since we only ran NMF on the overdispersed genes, we need to project the rest of the genes onto the NMF projection to get gene loadings for all genes.
nmf.res$W <- ProjectFeatures(norm.counts, nmf.res$H, n.cores = 8)
Now we can embed the key PBMC genes onto the visualization and remake the plot
swne.embedding <- EmbedFeatures(swne.embedding, nmf.res$W, genes.embed, n_pull = n_pull)
Let's make the SWNE plot with the key genes embedded. The closer a cell or a cluster is to a gene, the higher the expression level. We set a seed for reproducible cluster colors, so that every plot will use the same colors to label the clusters.
color.seed <- 42 PlotSWNE(swne.embedding, alpha.plot = 0.4, sample.groups = cell.clusters, do.label = T, label.size = 3.5, pt.size = 1, show.legend = F, seed = color.seed)
We can validate the embedded genes by overlaying the expression of one of these key genes onto the plot.
gene.use <- "CD8A" gene.expr <- norm.counts[gene.use,] FeaturePlotSWNE(swne.embedding, gene.expr, gene.use, alpha.plot = 0.4, label.size = 3.5, pt.size = 1.25)
We can also make a t-SNE plot for comparison.
obj <- RunTSNE(obj) tsne.scores <- Embeddings(obj, "tsne") PlotDims(tsne.scores, sample.groups = cell.clusters, pt.size = 1, label.size = 3.5, alpha = 0.4, show.legend = F, seed = color.seed, show.axes = F)
We can also interpret the factors by using the gene loadings matrix. Here, we extract the top 3 genes for each factor by gene loading. Since NMF tends to create a parts-based representation of the data, the factors often correspond to key biological processes or gene modules that explain the data.
gene.loadings <- nmf.res$W top.factor.genes.df <- SummarizeAssocFeatures(gene.loadings, features.return = 3) head(top.factor.genes.df)
And finally, we can make a heatmap to visualize the top factors for each gene
gene.loadings.heat <- gene.loadings[unique(top.factor.genes.df$feature),] ggHeat(gene.loadings.heat, clustering = "col")
Finally, we can extract cluster colors for compatibility with other plotting methods (i.e. Monocle)
color.mapping <- ExtractSWNEColors(swne.embedding, sample.groups = cell.clusters, seed = color.seed) color.mapping
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.