In stevexniu/scMatchmaker: Single Cell Matchmaker

options(width = 150)
knitr::opts_chunk$set(
  collapse = TRUE,
  tidy = FALSE,
  message = FALSE,
  warning = FALSE
)

Setup the Matchmaker Object

For this tutorial, we will be analyzing the human decidua single cell RNA-seq dataset from Vento-tormo et al. 2018 Nature. There are 64,734 single cells that were generated using 10X Genomics. The raw data can be found here. We will be using the CellPhoneDB ligands-receptors interaction repository.

A fraction of this data containing 947 ligand and receptor genes and 3000 cells (726 EVT, 953 dS2, 788 dNK1, and 533 dM1) is included in the scMatchmaker package called decidua and the corresponding metadata containing cell type annotations called decidua.annotation. This data has been normalized in logTPM (log-transformed transcripts per million), and it is recommended to start with normalized data. The Normalization function can be used to normalize raw scRNA-seq counts. It provides widely used logTPM normalization, as well as cosine normalization used in Haghverdi et al. 2018 Nature Biotechnology.

Then we load a CellPhoneDB database included in the package called cellphonedb. Alternatively, we can use LoadCellPhoneDB function to load the CellPhoneDB database either locally or from their website.

You can check the Matchmaker object structure and slot information by typing ?'Matchmaker-class'.

# Install scMatchmaker from CRAN.
if (!requireNamespace("scMatchmaker", quietly = TRUE))
    install.packages("scMatchmaker")

# Load scMatchmaker.
library(scMatchmaker)

# Load the CellPhoneDB interaction database from URLs. 
# This step is optional when using scMatchmaker's preloaded database.
interaction.url = "https://raw.githubusercontent.com/Teichlab/cellphonedb-data/master/data/interaction_input.csv"
gene.url = "https://raw.githubusercontent.com/Teichlab/cellphonedb-data/master/data/gene_input.csv"
complex.url = "https://raw.githubusercontent.com/Teichlab/cellphonedb-data/master/data/complex_input.csv" 

interaction.data <- LoadCellPhoneDB(
  interaction_input = interaction.url,
  gene_input = gene.url,
  complex_input = complex.url,
  gene.symbol = "gene_name", url = TRUE)

# Initialize Matchmaker object with scMatchmaker preloaded cellphonedb database.
# Missing ligands and receptors will be added with zeros, this step is useful when calculating protein interaction complex later.
decidua.interaction <- Screening(data = decidua, annotation = decidua.annotation[,"annotation", drop = FALSE], interaction = cellphonedb, project_name = "Decidua")
decidua.interaction

What does the data look like?

# Let's look at the some example ligand and receptor genes in the first twenty cells.
decidua[c("EGFR","TGFB1","EGF"), 1:20]

The dots . in the decidua dataset count matrix represent 0's (no UMI). Because scRNA-seq data contains lots of 0's, Matchmaker uses this sparse matrix notation to store data, interaction strength and p values matrices to save memory. However, if there are not many zero entries, sparse matrix is no longer efficient to save memory. Matchmaker controls for this by the zero_percent parameter. The default is 0.7, that is, when the non-zero entries is above 70% the data will be saved as sparse matrix,

# Let's look at the first six rows of the corresponding metadata.
head(x = decidua.annotation)

# Let's look at the first six rows of the CellPhoneDB interactions loaded from URLs.
head(x = interaction.data)

# Let's look at the first six rows of the preloaded CellPhoneDB interaction list.
head(x = cellphonedb)

Identify Cell-Cell Interactions with Matchmaking

Before we search for cell-cell interactions, we can optionally downsample the data with Sketching function to reduce the number of cells which may speed up the computation time.

There are two methods to perform downsampling:
- Downsample proportionally to the cell type frequencies (downsampling = TRUE). The actual downsampling size may vary slightly due to rounding effects.
- Geometric Sketching by Hie et al. 2019 Cell Systems using the input data (downsampling = FALSE).

# Downsampling the number of cells to 1000.
decidua.interaction <- Sketching(object = decidua.interaction, size = 1000, downsampling = TRUE)
decidua.interaction

Which cells are being "sketched"?

# Show the frist six downsampled cell IDs.
head(x = decidua.interaction@misc$sketch_id)

The Matchmaking function is the workhorse that calculates the interaction strengths and p values. It first calculates the interaction strengths using either a base model or an Earth Mover's Distance (EMD) adjusted model (see below). Then it randomly permutes the cell type identities for a number of times (usually 100-1000), and calculates a null distribution for each interacting pairs similar to the CellPhoneDB Python Package. This null distribution will serve as a nonparametric tests to select the significant interactions.

Matchmaking will return two interaction matrices in Matchmaker object:
- @strength stores the relative interaction strength.
- @pvalue stores the p values for each interactions.

The rows represent cell-cell interacting pairs (dM1|DC1 stands for interaction between dM1 and DC1 cell types) and the columns represent ligand-receptor pairs (i.e. ACE2_GHRL stands for interaction between ACE2 and GHRL genes). Together it reads as dM1 cells express ACE2 molecules and interact with GHRL molecules on DC1 cells. Please note that ligand receptor interactions are direction sensitive, dM1|DC1 has opposite in meaning to DC1|dM1.

Matchmaking has two methods to calculate interaction strength:
- Default base model (emd = FALSE). Interaction strengths are calculated as average mean expressions between interacting ligand and receptor pairs.
- Earth Mover's Distance (EMD) adjusted (emd = TRUE). Interaction strengths are first calculated as average mean expressions the same as the base model, and then adjusted by EMD which takes into account the similarity between ligand and receptor distributions. If the two distributions are similar to each other, the EMD will be small, vice versa.
- The EMD model has two modes: weighted (weighted = TRUE) or unweighted (weighted = FALSE). The weighted model will assign inversely proportional to the expression level, thus, higher. The unweighted model on the other hand will assign equal weights to all the expression levels.

# Base model with 100 permutations.
ptm <- proc.time()
decidua.interaction <- Matchmaking(object = decidua.interaction, n_perm = 100)
proc.time() - ptm

decidua.interaction

We run the unweighted EMD model with 100 permutations.

# It will take longer time compared to the base model because of the additional EMD adjustment.
ptm <- proc.time()
decidua.interaction.emd.unweighted <- Matchmaking(object = decidua.interaction, emd = TRUE, n_perm = 100, weighted = FALSE)
proc.time() - ptm

We run the default weighted EMD model with 100 permutations.

# It will take longer time compared to the unweighted model.
ptm <- proc.time()
decidua.interaction.emd <- Matchmaking(object = decidua.interaction, emd = TRUE, n_perm = 100)
proc.time() - ptm

decidua.interaction.emd

Key arguments passed to Matchmaking function include:
- n_perm Number of random permutations. Default is 100.
- nbins Number of bins to use for Earth Mover's Distance calculation.
- p.adjust.method p value adjustment method. Default is "BH" for Benjamini & Hochberg method.
- emd Whether to run EMD model. Default is FALSE
- weighted Whether to run weighted EMD model. Default is TRUE.
- Additional arguments can be found by typing ?Matchmaking.

What do the interaction strength and p values matrices look like?

# Show the first five rows and the first five columns of the strength matrix.
decidua.interaction.emd@strength[1:5,1:5]

# Show the first five rows and the first five columns of the p value matrix.
decidua.interaction.emd@pvalue[1:5,1:5]

If interaction complexes information is provided in the @interaction slot, for example, in the preloaded cellphonedb database, the subunits are listed with column names subunit_a_1, subunit_a_2, subunit_a_3 and subunit_b_1, subunit_b_2, subunit_b_3 respectively.

We can calculate the complex-complex interactions by using Complexing function. It calculates the complex interactions from its subunits with the following options:

strength_comb Method to combine subunits strengths.
- min, minimum strength of its subunits (Default).
- average, average strength of its subunits.
- max, maximum strength of its subunits.
pval_comb Method to combine subunits p values.
- max, maximum p value of its subunits (Default).
- average, average p value of its subunits.
- min, minimum p value of its subunits.

# Please note that if Selecting function is called before Complexing, you will need to re-run the Selecting step.
decidua.interaction.emd <- Complexing(decidua.interaction.emd)

We can also merge directed (one-way) interactions into undirected (two-way) interactionsc using Merging function. For example, before merging, DC1|dM1 and dM1|DC1 represents two different cell-cell interactions: DC1 expresses ligands and dM1 expresses receptors, versus, dM1 expresses ligands and DC1 expresses receptors. After merging, they will be combined as one undirected cell-cell interaction between DC1 and dM1.

The Merging function takes following arguments similar to Complexing:

strength_merge Method to merge directed interaction strengths.
- max, maximum strength of directed interactions (Default).
- average, average strength of directed interactions.
- min, minimum strength of directed interaction.
pval_merge Method to directed interactions p values.
- min, minimum p value of directed interactions (Default).
- average, average p value of directed interactions.
- max, maximum p value of directed interactions.

# Merge the directed one-way interactions into the undirected two-way interactions.
decidua.interaction.emd <- Merging(object = decidua.interaction.emd)

If we want to revert the Complexing or Merging operation, we can use the Resetting function by specifying the by argument with either complex (default) or merge to revert it.

# Reset the Merging operation. 
decidua.interaction.emd <- Resetting(object = decidua.interaction.emd, by = "merge")

# Reset the Complexing operation. 
decidua.interaction.emd <- Resetting(object = decidua.interaction.emd, by = "complex")

Select Top Interactions

After calculating the interaction strengths and p values. We use Selecting function to filter and select the top significant interactions.

Aguments passed to Selecting include:
- The strength.pct argument defines the quantile of interaction strength to be selected. The default is 0.1 (top 10% will be chosen).
- The pval.cutoff argument defines the p value cutoff, The default is 0.05 (interaction with p value less than 0.05 will be selected).

# Select the top 10% interactions with p values less than 0.05 in the base model.
decidua.interaction <- Selecting(object = decidua.interaction, strength.pct = 0.1, pval.cutoff = 0.05)

# Select the top 10% interactions with p values less than 0.05 in the EMD model.
decidua.interaction.emd <- Selecting(object = decidua.interaction.emd, strength.pct = 0.1, pval.cutoff = 0.05)

The Converting function converts the strength and p value matrices into a long ranked list of interaction candidates. By setting selected = TRUE, it will convert the selected data from the Selecting step.

# Convert the selected interactions in base model.
converted.data <- Converting(object = decidua.interaction, selected = TRUE)

# Convert the selected interactions in EMD model.
converted.data.emd <- Converting(object = decidua.interaction.emd, selected = TRUE)

We can subset the cell-cell interactions or ligand-receptor pairs using Subseting function.

# Subset interactions between EVT and dM1.
dc1.m1.interactions <- Subsetting(object = decidua.interaction.emd, ident1 = "EVT", ident2 = "dM1")

# Subset cell-cell interactions involving PGF-FLT1.
pgf.flt1.interactions <- Subsetting(object = decidua.interaction.emd, partner_a = "FLT1", partner_b = "PGF")

Finally, we can save the significant (selected) interaction strengths and p-values into csv files using Saving function.

# Save the results.
Saving(decidua.interaction.emd, file_name = "decidua_emd", selected = TRUE)

We can also save the analyzed Matchmaker R object.

# Save the Matchmaker object.
saveRDS(object = decidua.interaction.emd, file = "decidua.interaction.emd.rds")

What are the top selected interactions?

The top ten interactions in the base model:

# Show the top ten interactions in the base model.
head(x = converted.data, n = 10)

The top ten interactions in the EMD model:

# Show the top ten interactions in the EMD model.
head(x = converted.data.emd, n = 10)

Visualize Cell-Cell Interactions

scMatchmaker provides three different ways to visualize the interactions:
PlotSpot function generates a spot plot either with bars or dots.
PlotHeatmap function generates a heatmap plot. PlotNetwork function generates a network plot. PlotScatter function generates a scatter plot for a specific cell-cell interaction pair.
PlotHistogram function generates a histogram for a specific ligand and receptor pair.

PlotSpot generates a spot plot for cell-cell interactions. The size or height of the dot or bar represents the interaction strength. The color of the dot or bar represents the p values. Some useful arguments are listed below:
- which.cell and which.gene cell-cell or gene-gene interaction name or index to plot.
- type option "dot" for dot plot and option "bar" for bar plot.
- order order the data.
- low.cutoff and high.cutoff Cutoff for values to show.
- plot.y.axis, fix.y.axis etc. See ?PlotSpot for details.

We use PlotSpot to visualize the first twenty rows and the first ten columns of the interaction @strength and @pvalues.

# Spot plot with bars 
plot.bar <- PlotSpot(object = decidua.interaction.emd, which.cell = 1:10, which.gene = 1:10, type = "bar", order = TRUE)

# Spot plot with dots
plot.dot <- PlotSpot(object = decidua.interaction.emd, which.cell = 1:10, which.gene = 1:10, type = "dot", order = TRUE)

cowplot::plot_grid(plot.bar, plot.dot)

PlotHeatmap generates a heatmap. It takes arguments similar to PlotSpot. In addition, it can provide a summarized view by aggregating interaction data:
- selected use @selected data if calculated.
- aggregate type of aggregation to use. Options include cell and gene.
- counted aggregate interaction counts instead of strengths.
- scale scale and center the data to plot.
- Additional arguments see ?PlotHeatmap for details.

We use PlotHeatmap to visualize the most interactive cell types by setting aggregate = "cell".

PlotHeatmap(object = decidua.interaction.emd, aggregate = "cell", order = TRUE, selected = TRUE, mar = c(10,10))

PlotNetwork generates a gene-gene or cell-cell interaction network plot. It takes arguments similar to PlotHeatmap:
- selected use @selected data if calculated.
- aggregate type of aggregation to use. Options include cell and gene.
- node.size, node.color and edge.color size and color of the nodes and edges.
- layout layout used to plot network (i.e tsne or umap layout).
- directed plots directed network.
- louvain performs louvain clustering.
- Additional arguments see ?PlotNetwork for details.

We use PlotNetwork to visualize the cell-cell interaction network by setting aggregate = "cell".

PlotNetwork(object = decidua.interaction.emd, aggregate = "cell", selected = TRUE, legend.posit = "topleft", legend.breaks = 3)

We can also use PlotNetwork to visualize the gene-gene interaction network by setting aggregate = "gene".

PlotNetwork(object = decidua.interaction.emd, aggregate = "gene", louvain = TRUE, selected = TRUE, filter = TRUE, low.cutoff = 5, node.size = 4, legend.posit = "bottomright")

The PlotScatter function generates visualization of expression profiles for specific gene-gene interactions among all other interactions in a given pair of cell-cell interaction.

# Visualize ITGB1-PLAUR interactions between EVT and dM1 cells.
PlotScatter(object = decidua.interaction.emd, ident1 = "dM1", ident2 = "EVT", ligands = "ITGB1", receptors = "PLAUR", add.lines = TRUE, point.cex = 2)

PlotHistogram generates a histogram to visualize the ligand and receptor expressions in interacting cell types. Possible arguments include:
- ligand and receptor ligand and receptor gene names.
- ligand.ident and receptor.ident Cell types to plot for ligand and receptor.
- nbins Number of bins to use for histogram.
- Additional arguments see ?PlotHistogram for details.

We use PlotHistogram to visualize the top interacting ligand-receptor pairs in the base model: CD74_MIF and CD74_APP in dM1|EVT and dM1|dS2 cell-cell interactions respectively. We can see that the base model prioritizes higher mean expression values of ligand and receptor pairs, regardless of their variable distributions.

par(mfrow = c(1,2))

# Histogram for CD74 and MIF interaction between dM1 and EVT cells.
PlotHistogram(object = decidua.interaction, ligand = "MIF", receptor = "CD74", ligand.ident = "EVT", receptor.ident = "dM1", nbins = 25)

# Histogram for CD74 and APP interaction between dM1 and dS2 cells.
PlotHistogram(object = decidua.interaction, ligand = "APP", receptor = "CD74", ligand.ident = "dS2", receptor.ident = "dM1", nbins = 25)

We then use PlotHistogram to visualize the top interacting ligand-receptor pairs in the EMD model: FLT1_PGF and KLRC1_HLA-E in EVT|EVT and dNK1|dM1 cell-cell interactions respectively. The results show that the EMD model takes into account the variability in the distributions between ligand and receptor pair. It is not biased towards higher mean expression values in either ligand or receptor. It selects the best "matching" pairs.

par(mfrow = c(1,2))

# Histogram for KLRC1 and HLA-E interaction between dNK1 and dM1 cells.
PlotHistogram(object = decidua.interaction.emd, ligand = "KLRC1", receptor = "HLA-E", ligand.ident = "dNK1", receptor.ident = "dM1", nbins = 25)

# Histogram for FLT1 and PGF interaction between EVT and EVT cells.
PlotHistogram(object = decidua.interaction.emd, ligand = "PGF", receptor = "FLT1", ligand.ident = "EVT", receptor.ident = "EVT", nbins = 25)

Lastly, we compare the base model (x axis) against the EMD model (y axis). We see that the baseline model creates quite a few false-positives (dark red dots in lower right).

plot(x = decidua.interaction@strength, 
     y = decidua.interaction.emd@strength, 
     xlab = "Base Model", ylab = "EMD Adjusted", 
     col = RColorBrewer::brewer.pal(5,"Reds")[cut(decidua.interaction@strength-decidua.interaction.emd@strength,5)],
     pch = 16)
abline(-0.2, 1, col = "red", lwd = 2, lty = 2)
abline(0.1, 1, col = "red", lwd = 2,lty = 2)