In BlishLab/scriabin: Single-cell resolved interaction analysis through binning

Here we will illustrate the third Scriabin workflow:

Load libraries

library(Seurat)
library(SeuratData)
library(scriabin)
library(tidyverse)
library(ComplexHeatmap)
library(cowplot)
library(magrittr)

We will find To install the panc8 dataset:

if (!requireNamespace("panc8.SeuratData", quietly = TRUE))
  install.packages("https://seurat.nygenome.org/src/contrib/panc8.SeuratData_3.0.2.tar.gz", repos = NULL, type = "source") 
library(panc8.SeuratData)
panc8 <- LoadData("panc8")

panc_id <- subset(panc8, cells = colnames(panc8)[panc8$tech=="indrop"])
panc_id <- SCTransform(panc_id, verbose = F) %>%
  RunPCA(verbose = F) %>%
  RunUMAP(dims = 1:30, verbose = F)
DimPlot(panc_id, group.by = "celltype", label = T, repel = T) + NoLegend()

We have shown that the technical noise and sparsity characteristic of many scRNA-seq platforms, especially common droplet-based platforms like 10X, are propagated into the CCIM that is the basis of the interaction program discovery workflow. We have found that denoising algorithms, like ALRA, can dramatically improve this sparsity in a way that better recapitulates ground-truth communication states. We thus highly recommend using ALRA to denoise datasets captured with sparser platforms.

panc_id <- SeuratWrappers::RunALRA(panc_id)

Now we find interaction programs, score them for statistical significance, and then score all single cells in the dataset on the expression of these programs. Before we do this, a couple notes on this process.

SoftPower parameter selection

One important parameter in the interaction program discovery workflow is the soft thresholding power (softPower) used in generating the adjacency matrix for interaction program discovery. In traditional gene correlation network analyses, soft thresholding increases the degree of gene-gene connectivity required to form a module, thereby lowering the influence of spurious correlations within the similarity matrix. The authors of WGCNA have recommended utilizing the lowest softPower that results in a scale-free topology fitting index (R2) of greater than 0.8.

Because a single ligand can interact with multiple receptors, and vice versa, the variables of a CCIM that are used to calculate the similarity matrix for WGCNA are not independent. We hypothesized that using the same soft thresholding guidelines as recommended for WGCNA may result in highly connected interaction programs where only a single ligand or receptor is represented. We thus evaluated the impact of the R2 threshold on interaction program size, composition, and statistical significance.

We found that, at the standard R2 = 0.8 threshold, the mean recommended softPower was 3, and this decreased as the R2 threshold decreased (Supplementary Figure 11D). Higher R2 thresholds were associated with smaller program sizes, particularly when R2 > 0.5, and there was a moderate increase in the percentage of programs composed of only 1 ligand or receptor with increasing R2 thresholds. Additionally, we observed that decreasing the R2 threshold led to a moderate increase in the percentage of non-statistically significant programs, indicative of spurious correlations. These data indicate that an optimal R2 threshold to avoid both spurious programs as well as programs composed of only a single ligand or receptor may lie between 0.5 and 0.75.

The default R2 for interaction program discovery is now set at 0.6, which is used by the functions InteractionPrograms and FindAllInteractionPrograms.

Cell type proportional downsampling for iterative TOM generation

When working with large datasets where it is impractical to generate a CCIM for the entire dataset, Scriabin will iteratively approximate the ligand-receptor pair TOM by generating CCIM on downsampled sequences of the dataset. So that rare cell types that could be uniquely contributing to CCC are not lost in this iterative subsampling, we recommend setting the group.by parameter: this parameter corresponds to the meta.data column where cell type or clustering annotations are present. Downsampling will occur proportionally to the present cell types, ensuring that there is some representation of every cell type in the dataset.

#find interaction programs
panc_ip <- FindAllInteractionPrograms(panc_id, iterate.threshold = 300, group.by = "celltype", assay = "alra", sim_threshold = 0.4)

#test for interaction program significance
panc_ip_sig <- InteractionProgramSignificance(panc_ip, n.replicate = 500)

#keep IP that are significant in at least one cell type
#in this example, all programs are significant in at least one cell type
ip_pvals <- panc_ip_sig %>% as_tibble() %>%
  dplyr::select(name,ends_with("pval")) %>% unique() %>%
  pivot_longer(!name, names_to = "celltype", values_to = "pval") %>%
  group_by(name) %>% dplyr::mutate(min_p = min(pval)) %>%
  dplyr::select(name,min_p) %>% unique() %>% 
  dplyr::filter(min_p < 0.05) %>% pull(name)
panc_ip_sig %<>% dplyr::filter(name %in% ip_pvals)

#score cells by expression of interaction program
panc_id <- ScoreInteractionPrograms(panc_id, panc_ip_sig)

Let's visualize average expression of these interaction programs per cell type

panc_id_ip_lig <- as.matrix(panc_id[["IPligands"]]@data %>% t() %>%
  as.data.frame() %>% add_column(celltype = panc_id$celltype) %>%
  group_by(celltype) %>%
  summarise_if(is.numeric, mean) %>% column_to_rownames("celltype"))
Heatmap(panc_id_ip_lig, show_column_names = F, name = "Ligands")

panc_id_ip_rec <- as.matrix(panc_id[["IPreceptors"]]@data %>% t() %>%
  as.data.frame() %>% add_column(celltype = panc_id$celltype) %>%
  group_by(celltype) %>%
  summarise_if(is.numeric, mean) %>% column_to_rownames("celltype"))
Heatmap(panc_id_ip_rec, show_column_names = F, name = "Receptors")

We find several modules with shared expression patterns in stellate cells, but higher expression in activated vs. quiescent stellate cells. Who do they communicate with?

act_stellate_ip <- panc_id_ip_lig["activated_stellate",]
poi <- gsub("ligands_","",names(which(act_stellate_ip==max(act_stellate_ip))))

#Seurat's FeaturePlot has a nice option to blend expression of two features together on the same plot
IPFeaturePlot(panc_id, ip = poi)
DimPlot(panc_id, group.by = "celltype", label = T, repel = T) + NoLegend()

In this module we see highly specific ligand expression by activated stellate cells which send to endothelial cells. Let's take a look at the genes within this module.

moi <- reshape2::melt(panc_ip_sig %>% dplyr::filter(name==poi) %>%
  select("lr_pair",contains("connectivity"))) %>% arrange(-value)
moi$lr_pair <- factor(moi$lr_pair, levels = unique(moi$lr_pair))
ggplot(moi, aes(x = lr_pair, y = value, color = variable)) + 
  geom_point() + theme_cowplot() + ggpubr::rotate_x_text() + labs(x = NULL, y = "Intramodular\nconnectivity")

Identify high scoring modules paired across cell types by summing average ligand and receptor program score across all cell types

ip_by_celltype <- IPCellTypeSummary(panc_id, group.by = "celltype")
ip_by_celltype %>% group_by(sender) %>% top_n(n = 1, wt = additive.score)

To perform a comparative analysis, let's use the stimulated/unstimulated PBMC dataset available through SeuratData

ifnb <- UpdateSeuratObject(LoadData("ifnb"))

ifnb <- PercentageFeatureSet(ifnb, pattern = "MT-", col.name = "percent.mt") %>%
  SCTransform(vars.to.regress = "percent.mt", verbose = F) %>%
  RunPCA(verbose = F) %>% 
  FindNeighbors(dims = 1:30, verbose = F) %>%
  FindClusters(verbose = F) %>%
  RunUMAP(dims = 1:30, verbose = F)

DimPlot(ifnb, label = T, repel = T) + NoLegend()
DimPlot(ifnb, label = T, repel = T, group.by = "seurat_annotations") + NoLegend()
DimPlot(ifnb, group.by = "stim")

As above, we will performed ALRA-based denoising on this dataset

ifnb <- SeuratWrappers::RunALRA(ifnb)

ifnb_ip <- FindAllInteractionPrograms(ifnb, group.by = "stim", 
                                      cell_types = "seurat_annotations", assay = "alra")
ifnb_ip_sig <- InteractionProgramSignificance(ifnb_ip)
ifnb <- ScoreInteractionPrograms(ifnb, mods = ifnb_ip_sig)

What interaction programs have differential expression between the conditions?

stim_ip_ligands <- FindMarkers(ifnb, group.by = "stim", ident.1 = "STIM", assay = "IPligands", test.use = "wilcox_limma")
stim_ip_receptors <- FindMarkers(ifnb, group.by = "stim", ident.1 = "STIM", assay = "IPreceptors", test.use = "wilcox_limma")

#what's the program whose ligands are increasing the most in the STIM condition?
poi <- stim_ip_ligands %>% rownames_to_column("IP") %>% 
  top_n(n = 1, wt = avg_log2FC) %>% pull(IP)

features <- ifnb_ip_sig %>% dplyr::filter(name==poi) %>%
  separate(lr_pair, into = c("ligand","receptor"), sep = "=") %>% pull(ligand) %>% unique()

DotPlot(ifnb, features = features, group.by = "stim")
DotPlot(ifnb, features = features, group.by = "seurat_annotations")