options(width = 150)
knitr::opts_chunk$set(
  collapse = TRUE,
  tidy = FALSE,
  message = FALSE,
  warning = FALSE
)

Setup the Matchmaker Object

For this tutorial, we will be analyzing the human decidua single cell RNA-seq dataset from Vento-tormo et al. 2018 Nature. There are 64,734 single cells that were generated using 10X Genomics. The raw data can be found here. We will be using the CellPhoneDB ligands-receptors interaction repository.

A fraction of this data containing 947 ligand and receptor genes and 3000 cells (726 EVT, 953 dS2, 788 dNK1, and 533 dM1) is included in the scMatchmaker package called decidua and the corresponding metadata containing cell type annotations called decidua.annotation. This data has been normalized in logTPM (log-transformed transcripts per million), and it is recommended to start with normalized data. The Normalization function can be used to normalize raw scRNA-seq counts. It provides widely used logTPM normalization, as well as cosine normalization used in Haghverdi et al. 2018 Nature Biotechnology.

Then we load a CellPhoneDB database included in the package called cellphonedb. Alternatively, we can use LoadCellPhoneDB function to load the CellPhoneDB database either locally or from their website.

You can check the Matchmaker object structure and slot information by typing ?'Matchmaker-class'.

# Install scMatchmaker from CRAN.
if (!requireNamespace("scMatchmaker", quietly = TRUE))
    install.packages("scMatchmaker")

# Load scMatchmaker.
library(scMatchmaker)

# Load the CellPhoneDB interaction database from URLs. 
# This step is optional when using scMatchmaker's preloaded database.
interaction.url = "https://raw.githubusercontent.com/Teichlab/cellphonedb-data/master/data/interaction_input.csv"
gene.url = "https://raw.githubusercontent.com/Teichlab/cellphonedb-data/master/data/gene_input.csv"
complex.url = "https://raw.githubusercontent.com/Teichlab/cellphonedb-data/master/data/complex_input.csv" 

interaction.data <- LoadCellPhoneDB(
  interaction_input = interaction.url,
  gene_input = gene.url,
  complex_input = complex.url,
  gene.symbol = "gene_name", url = TRUE)

# Initialize Matchmaker object with scMatchmaker preloaded cellphonedb database.
# Missing ligands and receptors will be added with zeros, this step is useful when calculating protein interaction complex later.
decidua.interaction <- Screening(data = decidua, annotation = decidua.annotation[,"annotation", drop = FALSE], interaction = cellphonedb, project_name = "Decidua")
decidua.interaction

What does the data look like?

# Let's look at the some example ligand and receptor genes in the first twenty cells.
decidua[c("EGFR","TGFB1","EGF"), 1:20]

The dots . in the decidua dataset count matrix represent 0's (no UMI). Because scRNA-seq data contains lots of 0's, Matchmaker uses this sparse matrix notation to store data, interaction strength and p values matrices to save memory. However, if there are not many zero entries, sparse matrix is no longer efficient to save memory. Matchmaker controls for this by the zero_percent parameter. The default is 0.7, that is, when the non-zero entries is above 70% the data will be saved as sparse matrix,

# Let's look at the first six rows of the corresponding metadata.
head(x = decidua.annotation)
# Let's look at the first six rows of the CellPhoneDB interactions loaded from URLs.
head(x = interaction.data)
# Let's look at the first six rows of the preloaded CellPhoneDB interaction list.
head(x = cellphonedb)

\

Identify Cell-Cell Interactions with Matchmaking

Before we search for cell-cell interactions, we can optionally downsample the data with Sketching function to reduce the number of cells which may speed up the computation time.

# Downsampling the number of cells to 1000.
decidua.interaction <- Sketching(object = decidua.interaction, size = 1000, downsampling = TRUE)
decidua.interaction

Which cells are being "sketched"?

# Show the frist six downsampled cell IDs.
head(x = decidua.interaction@misc$sketch_id)

\

The Matchmaking function is the workhorse that calculates the interaction strengths and p values. It first calculates the interaction strengths using either a base model or an Earth Mover's Distance (EMD) adjusted model (see below). Then it randomly permutes the cell type identities for a number of times (usually 100-1000), and calculates a null distribution for each interacting pairs similar to the CellPhoneDB Python Package. This null distribution will serve as a nonparametric tests to select the significant interactions.

The rows represent cell-cell interacting pairs (dM1|DC1 stands for interaction between dM1 and DC1 cell types) and the columns represent ligand-receptor pairs (i.e. ACE2_GHRL stands for interaction between ACE2 and GHRL genes). Together it reads as dM1 cells express ACE2 molecules and interact with GHRL molecules on DC1 cells. Please note that ligand receptor interactions are direction sensitive, dM1|DC1 has opposite in meaning to DC1|dM1.

# Base model with 100 permutations.
ptm <- proc.time()
decidua.interaction <- Matchmaking(object = decidua.interaction, n_perm = 100)
proc.time() - ptm
decidua.interaction

We run the unweighted EMD model with 100 permutations.

# It will take longer time compared to the base model because of the additional EMD adjustment.
ptm <- proc.time()
decidua.interaction.emd.unweighted <- Matchmaking(object = decidua.interaction, emd = TRUE, n_perm = 100, weighted = FALSE)
proc.time() - ptm

We run the default weighted EMD model with 100 permutations.

# It will take longer time compared to the unweighted model.
ptm <- proc.time()
decidua.interaction.emd <- Matchmaking(object = decidua.interaction, emd = TRUE, n_perm = 100)
proc.time() - ptm
decidua.interaction.emd

What do the interaction strength and p values matrices look like?

# Show the first five rows and the first five columns of the strength matrix.
decidua.interaction.emd@strength[1:5,1:5]
# Show the first five rows and the first five columns of the p value matrix.
decidua.interaction.emd@pvalue[1:5,1:5]

\

If interaction complexes information is provided in the @interaction slot, for example, in the preloaded cellphonedb database, the subunits are listed with column names subunit_a_1, subunit_a_2, subunit_a_3 and subunit_b_1, subunit_b_2, subunit_b_3 respectively.

We can calculate the complex-complex interactions by using Complexing function. It calculates the complex interactions from its subunits with the following options:

# Please note that if Selecting function is called before Complexing, you will need to re-run the Selecting step.
decidua.interaction.emd <- Complexing(decidua.interaction.emd)

We can also merge directed (one-way) interactions into undirected (two-way) interactionsc using Merging function. For example, before merging, DC1|dM1 and dM1|DC1 represents two different cell-cell interactions: DC1 expresses ligands and dM1 expresses receptors, versus, dM1 expresses ligands and DC1 expresses receptors. After merging, they will be combined as one undirected cell-cell interaction between DC1 and dM1.

The Merging function takes following arguments similar to Complexing:

# Merge the directed one-way interactions into the undirected two-way interactions.
decidua.interaction.emd <- Merging(object = decidua.interaction.emd)

If we want to revert the Complexing or Merging operation, we can use the Resetting function by specifying the by argument with either complex (default) or merge to revert it.

# Reset the Merging operation. 
decidua.interaction.emd <- Resetting(object = decidua.interaction.emd, by = "merge")

# Reset the Complexing operation. 
decidua.interaction.emd <- Resetting(object = decidua.interaction.emd, by = "complex")

Select Top Interactions

After calculating the interaction strengths and p values. We use Selecting function to filter and select the top significant interactions.

# Select the top 10% interactions with p values less than 0.05 in the base model.
decidua.interaction <- Selecting(object = decidua.interaction, strength.pct = 0.1, pval.cutoff = 0.05)

# Select the top 10% interactions with p values less than 0.05 in the EMD model.
decidua.interaction.emd <- Selecting(object = decidua.interaction.emd, strength.pct = 0.1, pval.cutoff = 0.05)

The Converting function converts the strength and p value matrices into a long ranked list of interaction candidates. By setting selected = TRUE, it will convert the selected data from the Selecting step.

# Convert the selected interactions in base model.
converted.data <- Converting(object = decidua.interaction, selected = TRUE)

# Convert the selected interactions in EMD model.
converted.data.emd <- Converting(object = decidua.interaction.emd, selected = TRUE)

We can subset the cell-cell interactions or ligand-receptor pairs using Subseting function.

# Subset interactions between EVT and dM1.
dc1.m1.interactions <- Subsetting(object = decidua.interaction.emd, ident1 = "EVT", ident2 = "dM1")

# Subset cell-cell interactions involving PGF-FLT1.
pgf.flt1.interactions <- Subsetting(object = decidua.interaction.emd, partner_a = "FLT1", partner_b = "PGF")

Finally, we can save the significant (selected) interaction strengths and p-values into csv files using Saving function.

# Save the results.
Saving(decidua.interaction.emd, file_name = "decidua_emd", selected = TRUE)

We can also save the analyzed Matchmaker R object.

# Save the Matchmaker object.
saveRDS(object = decidua.interaction.emd, file = "decidua.interaction.emd.rds")

What are the top selected interactions?

The top ten interactions in the base model:

# Show the top ten interactions in the base model.
head(x = converted.data, n = 10)

The top ten interactions in the EMD model:

# Show the top ten interactions in the EMD model.
head(x = converted.data.emd, n = 10)

\

Visualize Cell-Cell Interactions

scMatchmaker provides three different ways to visualize the interactions:
PlotSpot function generates a spot plot either with bars or dots.
PlotHeatmap function generates a heatmap plot. PlotNetwork function generates a network plot. PlotScatter function generates a scatter plot for a specific cell-cell interaction pair.
PlotHistogram function generates a histogram for a specific ligand and receptor pair.

We use PlotSpot to visualize the first twenty rows and the first ten columns of the interaction @strength and @pvalues.

# Spot plot with bars 
plot.bar <- PlotSpot(object = decidua.interaction.emd, which.cell = 1:10, which.gene = 1:10, type = "bar", order = TRUE)

# Spot plot with dots
plot.dot <- PlotSpot(object = decidua.interaction.emd, which.cell = 1:10, which.gene = 1:10, type = "dot", order = TRUE)

cowplot::plot_grid(plot.bar, plot.dot)

We use PlotHeatmap to visualize the most interactive cell types by setting aggregate = "cell".

PlotHeatmap(object = decidua.interaction.emd, aggregate = "cell", order = TRUE, selected = TRUE, mar = c(10,10))

We use PlotNetwork to visualize the cell-cell interaction network by setting aggregate = "cell".

PlotNetwork(object = decidua.interaction.emd, aggregate = "cell", selected = TRUE, legend.posit = "topleft", legend.breaks = 3)

We can also use PlotNetwork to visualize the gene-gene interaction network by setting aggregate = "gene".

PlotNetwork(object = decidua.interaction.emd, aggregate = "gene", louvain = TRUE, selected = TRUE, filter = TRUE, low.cutoff = 5, node.size = 4, legend.posit = "bottomright")
# Visualize ITGB1-PLAUR interactions between EVT and dM1 cells.
PlotScatter(object = decidua.interaction.emd, ident1 = "dM1", ident2 = "EVT", ligands = "ITGB1", receptors = "PLAUR", add.lines = TRUE, point.cex = 2)

We use PlotHistogram to visualize the top interacting ligand-receptor pairs in the base model: CD74_MIF and CD74_APP in dM1|EVT and dM1|dS2 cell-cell interactions respectively. We can see that the base model prioritizes higher mean expression values of ligand and receptor pairs, regardless of their variable distributions.

par(mfrow = c(1,2))

# Histogram for CD74 and MIF interaction between dM1 and EVT cells.
PlotHistogram(object = decidua.interaction, ligand = "MIF", receptor = "CD74", ligand.ident = "EVT", receptor.ident = "dM1", nbins = 25)

# Histogram for CD74 and APP interaction between dM1 and dS2 cells.
PlotHistogram(object = decidua.interaction, ligand = "APP", receptor = "CD74", ligand.ident = "dS2", receptor.ident = "dM1", nbins = 25)

We then use PlotHistogram to visualize the top interacting ligand-receptor pairs in the EMD model: FLT1_PGF and KLRC1_HLA-E in EVT|EVT and dNK1|dM1 cell-cell interactions respectively. The results show that the EMD model takes into account the variability in the distributions between ligand and receptor pair. It is not biased towards higher mean expression values in either ligand or receptor. It selects the best "matching" pairs.

par(mfrow = c(1,2))

# Histogram for KLRC1 and HLA-E interaction between dNK1 and dM1 cells.
PlotHistogram(object = decidua.interaction.emd, ligand = "KLRC1", receptor = "HLA-E", ligand.ident = "dNK1", receptor.ident = "dM1", nbins = 25)

# Histogram for FLT1 and PGF interaction between EVT and EVT cells.
PlotHistogram(object = decidua.interaction.emd, ligand = "PGF", receptor = "FLT1", ligand.ident = "EVT", receptor.ident = "EVT", nbins = 25)

Lastly, we compare the base model (x axis) against the EMD model (y axis). We see that the baseline model creates quite a few false-positives (dark red dots in lower right).

plot(x = decidua.interaction@strength, 
     y = decidua.interaction.emd@strength, 
     xlab = "Base Model", ylab = "EMD Adjusted", 
     col = RColorBrewer::brewer.pal(5,"Reds")[cut(decidua.interaction@strength-decidua.interaction.emd@strength,5)],
     pch = 16)
abline(-0.2, 1, col = "red", lwd = 2, lty = 2)
abline(0.1, 1, col = "red", lwd = 2,lty = 2)

# R session information.
sessionInfo()


stevexniu/scMatchmaker documentation built on June 2, 2022, 12:35 p.m.