knitr::opts_chunk$set( collapse = TRUE, # comment = "#>", warning = FALSE, message = FALSE ) library(BiocStyle)
In this vignette, you can learn how to perform a sample-agnostic/cell-level MultiNicheNet analysis on data with complex multifactorial experimental designs. More specifically, we will cover in depth how to study differential dynamics of cell-cell communication between conditions. We will assume the following type of design that is quite common: we have 2 or more conditions/groups of interest, and for each condition we also have 2 or more time points (eg before and after a certain treatment). Research questions are: how does cell-cell communication change over time in each condition? And, importantly, how are these time-related changes different between the conditions? Certainly the latter is a non-trivial question.
Compared to the vignette multifactorial_analysis_BreastCancer.knit.md, this vignette demonstrates how to perform such a multifactorial analysis on data where you have less than 3 samples in each of the groups/conditions. In this workflow, cells from the same condition will be pooled together similar to regular differential cell-cell communication workflows. We only recommend running this pipeline if you have less than 3 samples in each of the groups/conditions you want to compare. Do not run this workflow if you have more samples per condition.
This vignette is quite advanced, so if you are new to the sample-agnostic/cell-level MultiNicheNet, we recommend reading and running this vignette: basis_analysis_steps_MISC_SACL.knit.md to get acquainted with the methodology and simple applications.
As example expression data of interacting cells for this vignette, we will here use scRNAseq data from breast cancer biopsies of patients receiving anti-PD1 immune-checkpoint blockade therapy. Bassez et al. collected from each patient one tumor biopsy before anti-PD1 therapy (“pre-treatment”) and one during subsequent surgery (“on-treatment”) A single-cell map of intratumoral changes during anti-PD1 treatment of patients with breast cancer. Based on additional scTCR-seq results, they identified one group of patients with clonotype expansion as response to the therapy (“E”) and one group with only limited or no clonotype expansion (“NE”). To make this vignette more easily adaptable to the user, we will rename the groups and time points as follows: E --> group1, NE --> group2, Pre-treatment: timepoint1, On-treatment: timepoint2.
Now, how can we define which CCC patterns are changing during anti-PD1 therapy and which of these changes are specific to E compared to NE patients (and vice versa)? First, we will run a MultiNicheNet, focusing on anti-PD1 therapy related differences in group1. Secondly, we will do the same for group2. Then we will compare the prioritized interactions between both groups in a basic way. Finally, we will run a MultiNicheNet analysis with a complex contrast to focus specifically on group-specific differences in anti-PD1 therapy associated CCC changes.
In this vignette, we will first prepare the common elements of all MultiNicheNet core analyses, then run, interpret and compare the different analyses.
library(SingleCellExperiment) library(dplyr) library(ggplot2) library(nichenetr) library(multinichenetr)
MultiNicheNet builds upon the NicheNet framework and uses the same prior knowledge networks (ligand-receptor network and ligand-target matrix, currently v2 version).
The Nichenet v2 networks and matrices for both mouse and human can be downloaded from Zenodo .
We will read these object in for human because our expression data is of human patients.
Gene names are here made syntactically valid via make.names()
to avoid the loss of genes (eg H2-M3) in downstream visualizations.
organism = "human"
options(timeout = 250) if(organism == "human"){ lr_network_all = readRDS(url( "https://zenodo.org/record/10229222/files/lr_network_human_allInfo_30112033.rds" )) %>% mutate( ligand = convert_alias_to_symbols(ligand, organism = organism), receptor = convert_alias_to_symbols(receptor, organism = organism)) lr_network_all = lr_network_all %>% mutate(ligand = make.names(ligand), receptor = make.names(receptor)) lr_network = lr_network_all %>% distinct(ligand, receptor) ligand_target_matrix = readRDS(url( "https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final.rds" )) colnames(ligand_target_matrix) = colnames(ligand_target_matrix) %>% convert_alias_to_symbols(organism = organism) %>% make.names() rownames(ligand_target_matrix) = rownames(ligand_target_matrix) %>% convert_alias_to_symbols(organism = organism) %>% make.names() lr_network = lr_network %>% filter(ligand %in% colnames(ligand_target_matrix)) ligand_target_matrix = ligand_target_matrix[, lr_network$ligand %>% unique()] } else if(organism == "mouse"){ lr_network_all = readRDS(url( "https://zenodo.org/record/10229222/files/lr_network_mouse_allInfo_30112033.rds" )) %>% mutate( ligand = convert_alias_to_symbols(ligand, organism = organism), receptor = convert_alias_to_symbols(receptor, organism = organism)) lr_network_all = lr_network_all %>% mutate(ligand = make.names(ligand), receptor = make.names(receptor)) lr_network = lr_network_all %>% distinct(ligand, receptor) ligand_target_matrix = readRDS(url( "https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final_mouse.rds" )) colnames(ligand_target_matrix) = colnames(ligand_target_matrix) %>% convert_alias_to_symbols(organism = organism) %>% make.names() rownames(ligand_target_matrix) = rownames(ligand_target_matrix) %>% convert_alias_to_symbols(organism = organism) %>% make.names() lr_network = lr_network %>% filter(ligand %in% colnames(ligand_target_matrix)) ligand_target_matrix = ligand_target_matrix[, lr_network$ligand %>% unique()] }
In this vignette, sender and receiver cell types are in the same SingleCellExperiment object, which we will load here. In this vignette, we will load in a subset of the breast cancer scRNAseq data . For the sake of demonstration, this subset only contains 3 cell types.
If you start from a Seurat object, you can convert it easily to a SingleCellExperiment object via sce = Seurat::as.SingleCellExperiment(seurat_obj, assay = "RNA")
.
Because the NicheNet 2.0. networks are in the most recent version of the official gene symbols, we will make sure that the gene symbols used in the expression data are also updated (= converted from their "aliases" to official gene symbols). Afterwards, we will make them again syntactically valid.
sce = readRDS(url( "https://zenodo.org/record/8010790/files/sce_subset_breastcancer.rds" )) sce = alias_to_symbol_SCE(sce, "human") %>% makenames_SCE()
Make sure your dataset also contains normalized counts. If this is not the case, you can calculate this as shown in the next code chunk:
sce = scuttle::logNormCounts(sce)
In this step, we will formalize our research question into MultiNicheNet input arguments.
In this case study, we want to study differences in therapy-induced cell-cell communication changes (On-vs-Pre therapy) between two patient groups (E vs NE: patients with clonotype expansion versus patients without clonotype expansion). Both therapy-timepoint and patient group are indicated in the following meta data column: expansion_timepoint
, which has 4 different values: PreE, PreNE, OnE, OnNE.
Cell type annotations are indicated in the subType
column, and the sample is indicated by the sample_id
column.
group_id = "expansion_timepoint" # combination timepoint/KO-vs-WT sample_id = "sample_id" celltype_id = "subType"
In the sammple-agnostic / cell-level worklow, it is not possible to correct for batch effects or covariates. Therefore, you here have to use the following NA settings:
covariates = NA batches = NA
Important: It is required that each sample-id is uniquely assigned to only one condition/group of interest. Therefore, our sample_id
here does not only indicate the patient, but also the timepoint of sampling.
Important: The column names of group, sample, and cell type should be syntactically valid (make.names
)
Important: All group, sample, and cell type names should be syntactically valid as well (make.names
) (eg through SummarizedExperiment::colData(sce)$ShortID = SummarizedExperiment::colData(sce)$ShortID %>% make.names()
)
If you want to focus the analysis on specific cell types (e.g. because you know which cell types reside in the same microenvironments based on spatial data), you can define this here. If you have sufficient computational resources and no specific idea of cell-type colocalzations, we recommend to consider all cell types as potential senders and receivers. Later on during analysis of the output it is still possible to zoom in on the cell types that interest you most, but your analysis is not biased to them.
Here we will consider all cell types in the data:
senders_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique() receivers_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique() sce = sce[, SummarizedExperiment::colData(sce)[,celltype_id] %in% c(senders_oi, receivers_oi) ]
In case you would have samples in your data that do not belong to one of the groups/conditions of interest, we recommend removing them and only keeping conditions of interest.
Before we do this, we will here rename our group/timepoint combinations. This is not necessary for you to do this, but then you will need to replace "G1.T1" with the equivalent of group1-timepoint1 in your data. Etc.
SummarizedExperiment::colData(sce)[,group_id][SummarizedExperiment::colData(sce)[,group_id] == "PreE"] = "G1.T1" SummarizedExperiment::colData(sce)[,group_id][SummarizedExperiment::colData(sce)[,group_id] == "PreNE"] = "G2.T1" SummarizedExperiment::colData(sce)[,group_id][SummarizedExperiment::colData(sce)[,group_id] == "OnE"] = "G1.T2" SummarizedExperiment::colData(sce)[,group_id][SummarizedExperiment::colData(sce)[,group_id] == "OnNE"] = "G2.T2"
conditions_keep = c("G1.T1", "G2.T1", "G1.T2", "G2.T2") sce = sce[, SummarizedExperiment::colData(sce)[,group_id] %in% conditions_keep ]
In MultiNicheNet, following steps are executed:
For each of these steps, some parameters need to be defined. This is what we will do now.
Parameters for step 1: Cell-type filtering:
In a MultiNicheNet analysis we will set a required minimum number of cells per cell type per condition. By default, we set this at 25 here for the sample-agnostic analyses. Conditions that have less than min_cells
cells will be excluded from the analysis for that specific cell type.
min_cells = 25
Parameters for step 2: Gene filtering
For each cell type, we will consider genes expressed if they are expressed in at least one condition. To do this, we need to set min_sample_prop = 1
.
min_sample_prop = 1
But how do we define which genes are expressed in a condition? For this we will consider genes as expressed if they have non-zero expression values in a fraction_cutoff
fraction of cells of that cell type in that condition By default, we set fraction_cutoff = 0.05
, which means that genes should show non-zero expression values in at least 5% of cells in a condition
fraction_cutoff = 0.05
We recommend using these default values unless there is specific interest in prioritizing (very) weakly expressed interactions. In that case, you could lower the value of fraction_cutoff
. We explicitly recommend against using fraction_cutoff > 0.10
.
Parameters for step 5: Ligand activity prediction
One of the prioritization criteria is the predicted activity of ligands in receiver cell types. Similarly to base NicheNet (https://github.com/saeyslab/nichenetr), we use the DE output to create a "geneset of interest": here we assume that DE genes within a cell type may be DE because of differential cell-cell communication processes. To determine the genesets of interest based on DE output, we need to define which logFC and/or p-value thresholds we will use.
We recommend following values for the sample-agnostic FindMarkers-way of DE analysis:
logFC_threshold = 0.25 # lower here for FindMarkers than for Pseudobulk-EdgeR p_val_threshold = 0.05
p_val_adj = TRUE
After the ligand activity prediction, we will also infer the predicted target genes of these ligands in each contrast. For this ligand-target inference procedure, we also need to select which top n of the predicted target genes will be considered (here: top 250 targets per ligand). This parameter will not affect the ligand activity predictions. It will only affect ligand-target visualizations and construction of the intercellular regulatory network during the downstream analysis. We recommend users to test other settings in case they would be interested in exploring fewer, but more confident target genes, or vice versa.
top_n_target = 250
The NicheNet ligand activity analysis can be run in parallel for each receiver cell type, by changing the number of cores as defined here. Using more cores will speed up the analysis at the cost of needing more memory. This is only recommended if you have many receiver cell types of interest. You can define here the maximum number of cores you want to be used.
n.cores = 8
Parameters for step 6: Prioritization
We will use the following criteria to prioritize ligand-receptor interactions:
We will combine these prioritization criteria in a single aggregated prioritization score. For the analysis on this type of data (with no or limited samples per condition), we only recommend using scenario = "no_frac_LR_expr"
.
scenario = "no_frac_LR_expr"
Finally, we still need to make one choice. For NicheNet ligand activity we can choose to prioritize ligands that only induce upregulation of target genes (ligand_activity_down = FALSE
) or can lead potentially lead to both up- and downregulation (ligand_activity_down = TRUE
). The benefit of ligand_activity_down = FALSE
is ease of interpretability: prioritized ligand-receptor pairs will be upregulated in the condition of interest, just like their target genes. ligand_activity_down = TRUE
can be harder to interpret because target genes of some interactions may be upregulated in the other conditions compared to the condition of interest. This is harder to interpret, but may help to pick up interactions that can also repress gene expression.
Here we will choose for setting ligand_activity_down = FALSE
and focus specifically on upregulating ligands.
ligand_activity_down = FALSE
Set the required contrast of interest.
Here, we want to compare on-treatment samples with pre-treatment samples for group E. Therefore we can set this contrast as:
contrasts_oi = c("'G1.T2-G1.T1','G1.T1-G1.T2'")
Very Important Note the format to indicate the contrasts! This formatting should be adhered to very strictly, and white spaces are not allowed! Check ?get_DE_info
for explanation about how to define this well. The most important points are that:
each contrast is surrounded by single quotation marks
contrasts are separated by a comma without any white space
*all contrasts together are surrounded by double quotation marks.
For downstream visualizations and linking contrasts to their main condition, we also need to run the following: This is necessary because we will also calculate cell-type+condition specificity of ligands and receptors.
contrast_tbl = tibble( contrast = c("G1.T2-G1.T1", "G1.T1-G1.T2"), group = c("G1.T2","G1.T1") )
Cell-type filtering: determine which cell types are sufficiently present
abundance_info = get_abundance_info( sce = sce, sample_id = group_id, group_id = group_id, celltype_id = celltype_id, min_cells = min_cells, senders_oi = senders_oi, receivers_oi = receivers_oi, batches = batches )
You see here that we set sample_id = group_id
. This is not a mistake. Why do we do this: we use the regular MultiNicheNet code to get cell type abundances per samples and groups. Because we will ignore sample-level information in this vignette, we will set this parameter to the condition/group ID and not the sample ID.
First, we will check the cell type abundance diagnostic plots.
The first plot visualizes the number of cells per celltype-condition combination, and indicates which combinations are removed during the DE analysis because there are less than min_cells
in the celltype-condition combination.
abundance_info$abund_plot_sample
The red dotted line indicates the required minimum of cells as defined above in min_cells
. We can see here that all cell types are present in all conditions.
In case this plot would indicate that not all cell types are present in all conditions: running the following block of code can help you determine which cell types are condition-specific and which cell types are absent.
sample_group_celltype_df = abundance_info$abundance_data %>% filter(n > min_cells) %>% ungroup() %>% distinct(sample_id, group_id) %>% cross_join( abundance_info$abundance_data %>% ungroup() %>% distinct(celltype_id) ) %>% arrange(sample_id) abundance_df = sample_group_celltype_df %>% left_join( abundance_info$abundance_data %>% ungroup() ) abundance_df$n[is.na(abundance_df$n)] = 0 abundance_df$keep[is.na(abundance_df$keep)] = FALSE abundance_df_summarized = abundance_df %>% mutate(keep = as.logical(keep)) %>% group_by(group_id, celltype_id) %>% summarise(samples_present = sum((keep))) celltypes_absent_one_condition = abundance_df_summarized %>% filter(samples_present == 0) %>% pull(celltype_id) %>% unique() # find truly condition-specific cell types by searching for cell types # truely absent in at least one condition celltypes_present_one_condition = abundance_df_summarized %>% filter(samples_present >= 1) %>% pull(celltype_id) %>% unique() # require presence in at least 1 samples of one group so # it is really present in at least one condition condition_specific_celltypes = intersect( celltypes_absent_one_condition, celltypes_present_one_condition) total_nr_conditions = SummarizedExperiment::colData(sce)[,group_id] %>% unique() %>% length() absent_celltypes = abundance_df_summarized %>% filter(samples_present < 1) %>% group_by(celltype_id) %>% count() %>% filter(n == total_nr_conditions) %>% pull(celltype_id) print("condition-specific celltypes:") print(condition_specific_celltypes) print("absent celltypes:") print(absent_celltypes)
Absent cell types and condition-specific cell types will be filtered out for this analysis.
senders_oi = senders_oi %>% setdiff(union(absent_celltypes, condition_specific_celltypes)) receivers_oi = receivers_oi %>% setdiff(union(absent_celltypes, condition_specific_celltypes)) sce = sce[, SummarizedExperiment::colData(sce)[,celltype_id] %in% c(senders_oi, receivers_oi) ]
Before running the DE analysis, we will determine which genes are not sufficiently expressed and should be filtered out.
We will perform gene filtering based on a similar procedure as used in edgeR::filterByExpr
. However, we adapted this procedure to be more interpretable for single-cell datasets.
frq_list = get_frac_exprs_sampleAgnostic( sce = sce, sample_id = sample_id, celltype_id = celltype_id, group_id = group_id, batches = batches, min_cells = min_cells, fraction_cutoff = fraction_cutoff, min_sample_prop = min_sample_prop)
Now only keep genes that are expressed by at least one cell type:
genes_oi = frq_list$expressed_df %>% filter(expressed == TRUE) %>% pull(gene) %>% unique() sce = sce[genes_oi, ]
After filtering out absent cell types and genes, we will continue the analysis by calculating the different prioritization criteria that we will use to prioritize cell-cell communication patterns.
First, we will determine and normalize per-condition pseudobulk expression levels for each expressed gene in each present cell type. The function process_abundance_expression_info
will link this expression information for ligands of the sender cell types to the corresponding receptors of the receiver cell types. This will later on allow us to define the cell-type specicificy criteria for ligands and receptors.
abundance_expression_info = process_abundance_expression_info( sce = sce, sample_id = group_id, group_id = group_id, celltype_id = celltype_id, min_cells = min_cells, senders_oi = senders_oi, receivers_oi = receivers_oi, lr_network = lr_network, batches = batches, frq_list = frq_list, abundance_info = abundance_info)
You see here that we again set sample_id = group_id
. This is not a mistake. Why do we do this: we use the regular MultiNicheNet code to get cell type abundances per samples and groups. Because we will ignore sample-level information in this vignette, we will set this parameter to the condition/group ID and not the sample ID.
Differential expression (DE) analysis: determine which genes are differentially expressed
In this step, we will perform genome-wide differential expression analysis of receiver and sender cell types to define DE genes between the conditions of interest (as formalized by the contrasts_oi
). Based on this analysis, we later can define the levels of differential expression of ligands in senders and receptors in receivers, and define the set of affected target genes in the receiver cell types (which will be used for the ligand activity analysis).
Because we don't have several samples per condition, we cannot apply pseudobulking followed by EdgeR as done in the regular MultiNicheNet workflow. Instead, we will here perform a classic FindMarkers approach.
DE_info_group1 = get_DE_info_sampleAgnostic( sce = sce, group_id = group_id, celltype_id = celltype_id, contrasts_oi = contrasts_oi, min_cells = min_cells, expressed_df = frq_list$expressed_df, contrast_tbl = contrast_tbl)
Check DE output information in table with logFC and p-values for each gene-celltype-contrast:
DE_info_group1$celltype_de_findmarkers %>% head()
Evaluate the distributions of p-values:
DE_info_group1$hist_pvals_findmarkers
celltype_de_group1 = DE_info_group1$celltype_de_findmarkers
Combine DE information for ligand-senders and receptors-receivers
sender_receiver_de = combine_sender_receiver_de( sender_de = celltype_de_group1, receiver_de = celltype_de_group1, senders_oi = senders_oi, receivers_oi = receivers_oi, lr_network = lr_network )
sender_receiver_de %>% head(20)
Ligand activity prediction: use the DE analysis output to predict the activity of ligands in receiver cell types and infer their potential target genes
We will first inspect the geneset_oi-vs-background ratios for the default tresholds:
geneset_assessment = contrast_tbl$contrast %>% lapply( process_geneset_data, celltype_de_group1, logFC_threshold, p_val_adj, p_val_threshold ) %>% bind_rows() geneset_assessment
We can see here that for most cell type / contrast combinations, most geneset/background ratio's are NOT within the recommended range (in_range_up
and in_range_down
columns). Therefore, it could be useful to use more lenient thresholds (for these or all cell types).
logFC_threshold = 0.125 p_val_threshold = 0.05
p_val_adj = TRUE
geneset_assessment = contrast_tbl$contrast %>% lapply( process_geneset_data, celltype_de_group1, logFC_threshold, p_val_adj, p_val_threshold ) %>% bind_rows() geneset_assessment
This looks better.
Now we can run the ligand activity analysis: (this will take some time (the more cell types and contrasts, the more time))
ligand_activities_targets_DEgenes = suppressMessages(suppressWarnings( get_ligand_activities_targets_DEgenes( receiver_de = celltype_de_group1, receivers_oi = intersect(receivers_oi, celltype_de_group1$cluster_id %>% unique()), ligand_target_matrix = ligand_target_matrix, logFC_threshold = logFC_threshold, p_val_threshold = p_val_threshold, p_val_adj = p_val_adj, top_n_target = top_n_target, verbose = TRUE, n.cores = n.cores ) ))
Prioritization: rank cell-cell communication patterns through multi-criteria prioritization
sender_receiver_tbl = sender_receiver_de %>% distinct(sender, receiver) metadata_combined = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() if(!is.na(batches)){ grouping_tbl = metadata_combined[,c(group_id, batches)] %>% tibble::as_tibble() %>% dplyr::distinct() colnames(grouping_tbl) = c("group",batches) grouping_tbl = grouping_tbl %>% mutate(sample = group) grouping_tbl = grouping_tbl %>% tibble::as_tibble() } else { grouping_tbl = metadata_combined[,c(group_id)] %>% tibble::as_tibble() %>% dplyr::distinct() colnames(grouping_tbl) = c("group") grouping_tbl = grouping_tbl %>% mutate(sample = group) %>% select(sample, group) } prioritization_tables = suppressMessages(generate_prioritization_tables( sender_receiver_info = abundance_expression_info$sender_receiver_info, sender_receiver_de = sender_receiver_de, ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes, contrast_tbl = contrast_tbl, sender_receiver_tbl = sender_receiver_tbl, grouping_tbl = grouping_tbl, scenario = scenario, fraction_cutoff = fraction_cutoff, abundance_data_receiver = abundance_expression_info$abundance_data_receiver, abundance_data_sender = abundance_expression_info$abundance_data_sender, ligand_activity_down = ligand_activity_down ))
Compile the MultiNicheNet output object
multinichenet_output_group1_t2vst1 = list( celltype_info = abundance_expression_info$celltype_info, celltype_de = celltype_de_group1, sender_receiver_info = abundance_expression_info$sender_receiver_info, sender_receiver_de = sender_receiver_de, ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes, prioritization_tables = prioritization_tables, grouping_tbl = grouping_tbl, lr_target_prior_cor = tibble() ) multinichenet_output_group1_t2vst1 = make_lite_output(multinichenet_output_group1_t2vst1)
Summarizing ChordDiagram circos plots
In a first instance, we will look at the broad overview of prioritized interactions via condition-specific Chordiagram circos plots.
We will look here at the top 50 predictions across all contrasts, senders, and receivers of interest.
prioritized_tbl_oi_all = get_top_n_lr_pairs( multinichenet_output_group1_t2vst1$prioritization_tables, top_n = 50, rank_per_group = FALSE )
prioritized_tbl_oi = multinichenet_output_group1_t2vst1$prioritization_tables$group_prioritization_tbl %>% filter(id %in% prioritized_tbl_oi_all$id) %>% distinct(id, sender, receiver, ligand, receptor, group) %>% left_join(prioritized_tbl_oi_all) prioritized_tbl_oi$prioritization_score[is.na(prioritized_tbl_oi$prioritization_score)] = 0 senders_receivers = union(prioritized_tbl_oi$sender %>% unique(), prioritized_tbl_oi$receiver %>% unique()) %>% sort() colors_sender = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers) colors_receiver = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers) circos_list = make_circos_group_comparison(prioritized_tbl_oi, colors_sender, colors_receiver)
Interpretable bubble plots
Whereas these ChordDiagrams show the most specific interactions per group, they don't give insights into the data behind these predictions. Therefore we will now look at visualizations that indicate the different prioritization criteria used in MultiNicheNet.
In the next type of plots, we will 1) visualize the per-sample scaled product of normalized ligand and receptor pseudobulk expression, 2) visualize the scaled ligand activities, and 3) cell-type specificity.
We will now check the top interactions specific for the OnE group (G1.T2) - so these are the interactions that increase during anti-PD1 therapy.
group_oi = "G1.T2"
prioritized_tbl_oi_50 = get_top_n_lr_pairs( multinichenet_output_group1_t2vst1$prioritization_tables, top_n = 50, rank_per_group = FALSE) %>% filter(group == group_oi)
prioritized_tbl_oi_50_omnipath = prioritized_tbl_oi_50 %>% inner_join(lr_network_all)
Now we add this to the bubble plot visualization:
plot_oi = make_sample_lr_prod_activity_plots_Omnipath( multinichenet_output_group1_t2vst1$prioritization_tables, prioritized_tbl_oi_50_omnipath) plot_oi
Interestingly, we can already see that some of these interactions are only increasing in the E group, whereas others change in the NE group as well!
In practice, we always recommend generating these plots for all conditions/contrasts. To avoid that this vignette becomes too long, we will not do this here.
Now let's do the same analysis for the NE group instead
Set the required contrast of interest.
Here, we want to compare on-treatment samples with pre-treatment samples for group NE. Therefore we can set this contrast as:
contrasts_oi = c("'G2.T2-G2.T1','G2.T1-G2.T2'")
contrast_tbl = tibble( contrast = c("G2.T2-G2.T1", "G2.T1-G2.T2"), group = c("G2.T2","G2.T1") )
In this case, the first 3 steps (cell-type filtering, gene filtering, and pseudobulk expression calculation) were run for the entire dataset and are exactly the same now as for the first MultiNicheNet analysis. Therefore, we will not redo these steps, and just start with step 4, the first unique step for this second analysis.
Differential expression (DE) analysis: determine which genes are differentially expressed
In this step, we will perform genome-wide differential expression analysis of receiver and sender cell types to define DE genes between the conditions of interest (as formalized by the contrasts_oi
). Based on this analysis, we later can define the levels of differential expression of ligands in senders and receptors in receivers, and define the set of affected target genes in the receiver cell types (which will be used for the ligand activity analysis).
Because we don't have several samples per condition, we cannot apply pseudobulking followed by EdgeR as done in the regular MultiNicheNet workflow. Instead, we will here perform a classic FindMarkers approach.
DE_info_group2 = get_DE_info_sampleAgnostic( sce = sce, group_id = group_id, celltype_id = celltype_id, contrasts_oi = contrasts_oi, min_cells = min_cells, expressed_df = frq_list$expressed_df, contrast_tbl = contrast_tbl)
Check DE output information in table with logFC and p-values for each gene-celltype-contrast:
DE_info_group2$celltype_de_findmarkers %>% head()
Evaluate the distributions of p-values:
DE_info_group2$hist_pvals_findmarkers
celltype_de_group2 = DE_info_group2$celltype_de_findmarkers
Combine DE information for ligand-senders and receptors-receivers
sender_receiver_de = combine_sender_receiver_de( sender_de = celltype_de_group2, receiver_de = celltype_de_group2, senders_oi = senders_oi, receivers_oi = receivers_oi, lr_network = lr_network )
sender_receiver_de %>% head(20)
Ligand activity prediction: use the DE analysis output to predict the activity of ligands in receiver cell types and infer their potential target genes
We will first inspect the geneset_oi-vs-background ratios for the default tresholds:
logFC_threshold = 0.25 p_val_threshold = 0.05
p_val_adj = TRUE
geneset_assessment = contrast_tbl$contrast %>% lapply( process_geneset_data, celltype_de_group2, logFC_threshold, p_val_adj, p_val_threshold ) %>% bind_rows() geneset_assessment
We can see here that for most cell type / contrast combinations, most geneset/background ratio's are NOT within the recommended range (in_range_up
and in_range_down
columns). Therefore, it could be useful to use more lenient thresholds (for these or all cell types).
logFC_threshold = 0.125 p_val_threshold = 0.05
p_val_adj = TRUE
geneset_assessment = contrast_tbl$contrast %>% lapply( process_geneset_data, celltype_de_group2, logFC_threshold, p_val_adj, p_val_threshold ) %>% bind_rows() geneset_assessment
This looks better.
Now we can run the ligand activity analysis: (this will take some time (the more cell types and contrasts, the more time))
ligand_activities_targets_DEgenes = suppressMessages(suppressWarnings( get_ligand_activities_targets_DEgenes( receiver_de = celltype_de_group2, receivers_oi = intersect(receivers_oi, celltype_de_group2$cluster_id %>% unique()), ligand_target_matrix = ligand_target_matrix, logFC_threshold = logFC_threshold, p_val_threshold = p_val_threshold, p_val_adj = p_val_adj, top_n_target = top_n_target, verbose = TRUE, n.cores = n.cores ) ))
Prioritization: rank cell-cell communication patterns through multi-criteria prioritization
sender_receiver_tbl = sender_receiver_de %>% distinct(sender, receiver) metadata_combined = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() if(!is.na(batches)){ grouping_tbl = metadata_combined[,c(group_id, batches)] %>% tibble::as_tibble() %>% dplyr::distinct() colnames(grouping_tbl) = c("group",batches) grouping_tbl = grouping_tbl %>% mutate(sample = group) grouping_tbl = grouping_tbl %>% tibble::as_tibble() } else { grouping_tbl = metadata_combined[,c(group_id)] %>% tibble::as_tibble() %>% dplyr::distinct() colnames(grouping_tbl) = c("group") grouping_tbl = grouping_tbl %>% mutate(sample = group) %>% select(sample, group) } prioritization_tables = suppressMessages(generate_prioritization_tables( sender_receiver_info = abundance_expression_info$sender_receiver_info, sender_receiver_de = sender_receiver_de, ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes, contrast_tbl = contrast_tbl, sender_receiver_tbl = sender_receiver_tbl, grouping_tbl = grouping_tbl, scenario = scenario, fraction_cutoff = fraction_cutoff, abundance_data_receiver = abundance_expression_info$abundance_data_receiver, abundance_data_sender = abundance_expression_info$abundance_data_sender, ligand_activity_down = ligand_activity_down ))
Compile the MultiNicheNet output object
multinichenet_output_group2_t2vst1 = list( celltype_info = abundance_expression_info$celltype_info, celltype_de = celltype_de_group2, sender_receiver_info = abundance_expression_info$sender_receiver_info, sender_receiver_de = sender_receiver_de, ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes, prioritization_tables = prioritization_tables, grouping_tbl = grouping_tbl, lr_target_prior_cor = tibble() ) multinichenet_output_group2_t2vst1 = make_lite_output(multinichenet_output_group2_t2vst1)
Summarizing ChordDiagram circos plots
In a first instance, we will look at the broad overview of prioritized interactions via condition-specific Chordiagram circos plots.
We will look here at the top 50 predictions across all contrasts, senders, and receivers of interest.
prioritized_tbl_oi_all = get_top_n_lr_pairs( multinichenet_output_group2_t2vst1$prioritization_tables, top_n = 50, rank_per_group = FALSE )
prioritized_tbl_oi = multinichenet_output_group2_t2vst1$prioritization_tables$group_prioritization_tbl %>% filter(id %in% prioritized_tbl_oi_all$id) %>% distinct(id, sender, receiver, ligand, receptor, group) %>% left_join(prioritized_tbl_oi_all) prioritized_tbl_oi$prioritization_score[is.na(prioritized_tbl_oi$prioritization_score)] = 0 senders_receivers = union(prioritized_tbl_oi$sender %>% unique(), prioritized_tbl_oi$receiver %>% unique()) %>% sort() colors_sender = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers) colors_receiver = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers) circos_list = make_circos_group_comparison(prioritized_tbl_oi, colors_sender, colors_receiver)
Interpretable bubble plots
Whereas these ChordDiagrams show the most specific interactions per group, they don't give insights into the data behind these predictions. Therefore we will now look at visualizations that indicate the different prioritization criteria used in MultiNicheNet.
In the next type of plots, we will 1) visualize the per-sample scaled product of normalized ligand and receptor pseudobulk expression, 2) visualize the scaled ligand activities, and 3) cell-type specificity.
We will now check the top interactions specific for the OnNE group (G2.T2) - so these are the interactions that increase during anti-PD1 therapy.
group_oi = "G2.T2"
prioritized_tbl_oi_50 = get_top_n_lr_pairs( multinichenet_output_group2_t2vst1$prioritization_tables, top_n = 50, rank_per_group = FALSE) %>% filter(group == group_oi)
prioritized_tbl_oi_50_omnipath = prioritized_tbl_oi_50 %>% inner_join(lr_network_all)
Now we add this to the bubble plot visualization:
plot_oi = make_sample_lr_prod_activity_plots_Omnipath( multinichenet_output_group2_t2vst1$prioritization_tables, prioritized_tbl_oi_50_omnipath) plot_oi
Interestingly, we can already see that some of these interactions are only increasing in the NE group, whereas others change in the E group as well (or even more strongly)!
In practice, we always recommend generating these plots for all conditions/contrasts. To avoid that this vignette becomes too long, we will not do this here.
So interesting observation: even though the NE group is characterized by a lack of T cell clonotype expansion after therapy, there are still clear therapy-induced differences on gene expression and cell-cell communication.
Interestingly, we can see here as well that some interactions are only increasing in the NE group, whereas others change in the E group as well!
This brings us to the next research question: do some interactions increase/decrease more in the E group vs the NE group or vice versa?
To address this question, we will first do a simple analysis. We will just compare the prioritization scores between the E and the NE group, to find CCC patterns that are most different between the E and NE group.
Set up the comparisons
comparison_table = multinichenet_output_group1_t2vst1$prioritization_tables$group_prioritization_tbl %>% select(group, id, prioritization_score) %>% rename(prioritization_score_group1 = prioritization_score) %>% mutate(group = case_when( group == "G1.T1" ~ "timepoint1", group == "G1.T2" ~ "timepoint2", TRUE ~ group) ) %>% inner_join( multinichenet_output_group2_t2vst1$prioritization_tables$group_prioritization_tbl %>% select(group, id, prioritization_score) %>% rename(prioritization_score_group2 = prioritization_score) %>% mutate(group = case_when( group == "G2.T1" ~ "timepoint1", group == "G2.T2" ~ "timepoint2", TRUE ~ group) ) ) %>% mutate(difference_group1_vs_group2 = prioritization_score_group1 - prioritization_score_group2)
Let's now inspect some of the top interactions that got much higher scores in the E group
comparison_table %>% arrange(-difference_group1_vs_group2) %>% filter(prioritization_score_group1 > 0.80)
Let's now inspect some of the top interactions that got much higher scores in the NE group
comparison_table %>% arrange(difference_group1_vs_group2) %>% filter(prioritization_score_group2 > 0.80)
Visualize now interactions that are in the top250 interactions for the contrast On-vs-Pre in the E group.
E-group specific
Of these, we will first zoom into the top25 interactions with biggest difference in the E-vs-NE group.
group_oi = "G1.T2"
prioritized_tbl_oi_250 = get_top_n_lr_pairs( multinichenet_output_group1_t2vst1$prioritization_tables, top_n = 250, rank_per_group = FALSE) %>% filter(group == group_oi) %>% inner_join(lr_network_all)
ids_group1_vs_group2 = comparison_table %>% filter(group == "timepoint2" & id %in% prioritized_tbl_oi_250$id) %>% top_n(25, difference_group1_vs_group2) %>% filter(difference_group1_vs_group2 > 0) %>% pull(id)
prioritized_tbl_oi_250_unique_E = prioritized_tbl_oi_250 %>% filter(id %in% ids_group1_vs_group2)
plot_oi = make_sample_lr_prod_activity_plots_Omnipath( multinichenet_output_group1_t2vst1$prioritization_tables, prioritized_tbl_oi_250_unique_E) plot_oi
shared between E-group and NE-group
Now we will look at the interactions with a very similar score between E and NE. So these interactions are interactions that increase after therapy, but shared between group E and NE.
ids_E_shared_NE = comparison_table %>% filter(group == "timepoint2" & id %in% prioritized_tbl_oi_250$id) %>% mutate(deviation_from_0 = abs(0-difference_group1_vs_group2)) %>% top_n(25, -deviation_from_0) %>% arrange(deviation_from_0) %>% pull(id)
prioritized_tbl_oi_250_shared_E_NE = prioritized_tbl_oi_250 %>% filter(id %in% ids_E_shared_NE)
plot_oi = make_sample_lr_prod_activity_plots_Omnipath( multinichenet_output_group1_t2vst1$prioritization_tables, prioritized_tbl_oi_250_shared_E_NE) plot_oi
Visualize now interactions that are in the top250 interactions for the contrast On-vs-Pre in the E group. We will focus on decreasing interactions here.
E-group specific
Of these, we will first zoom into the top10 interactions with biggest difference in the E-vs-NE group.
group_oi = "G1.T1"
prioritized_tbl_oi_250 = get_top_n_lr_pairs( multinichenet_output_group1_t2vst1$prioritization_tables, top_n = 250, rank_per_group = FALSE) %>% filter(group == group_oi) %>% inner_join(lr_network_all)
ids_group1_vs_group2 = comparison_table %>% filter(group == "timepoint1" & id %in% prioritized_tbl_oi_250$id) %>% top_n(10, difference_group1_vs_group2) %>% filter(difference_group1_vs_group2 > 0) %>% pull(id)
prioritized_tbl_oi_250_unique_E = prioritized_tbl_oi_250 %>% filter(id %in% ids_group1_vs_group2)
plot_oi = make_sample_lr_prod_activity_plots_Omnipath( multinichenet_output_group1_t2vst1$prioritization_tables, prioritized_tbl_oi_250_unique_E) plot_oi
shared between E-group and NE-group
Now we will look at the interactions with a very similar score between E and NE. So these interactions are interactions that decrease after therapy, but shared between group E and NE.
ids_E_shared_NE = comparison_table %>% filter(group == "timepoint1" & id %in% prioritized_tbl_oi_250$id) %>% mutate(deviation_from_0 = abs(0-difference_group1_vs_group2)) %>% top_n(10, -deviation_from_0) %>% arrange(deviation_from_0) %>% pull(id)
prioritized_tbl_oi_250_shared_E_NE = prioritized_tbl_oi_250 %>% filter(id %in% ids_E_shared_NE)
plot_oi = make_sample_lr_prod_activity_plots_Omnipath( multinichenet_output_group1_t2vst1$prioritization_tables, prioritized_tbl_oi_250_shared_E_NE) plot_oi
Visualize now interactions that are in the top250 interactions for the contrast On-vs-Pre in the E group.
To avoid making this vignette too long, we will not explicitly show this for the NE group since the code is the same as for the E group, you only need to change the group_oi
to "G2.T2"
.
Whereas these comparisons are quite intuitive, they are also relatively arbitrary (requiring cutoffs to compare interactions) and suboptimal. They are suboptimal because the prioritization scores of interactions are relative versus the other interactions within a group. If there is a difference in effect size of the therapy-induced changes in CCC, comparing the final prioritization scores may not be optimal. This is something we saw in these previous plots.
Therefore, we will now discuss another way to infer group-specific therapy-associated CCC patterns: namely looking explicitly at a multifactorial contrast.
Because the DE analysis is here done in a classic FindMarkers approach, we cannot perform DE on multifactorial experimental designs. You can only compare one group vs other group(s). Therefore, we propose here the following solution for a multifactorial design:
"Calculate the between group difference between the time-difference logFC values"
celltype_de_group1 %>% arrange(-logFC) %>% head() celltype_de_group2 %>% arrange(-logFC) %>% head()
Now step 2: Calculate the between group difference between the time-difference logFC values + save the minimal p-value (if a gene was significant for one contrast, keep track of that)
celltype_de_TimeDiff_On_vs_Pre = inner_join( celltype_de_group1 %>% filter(contrast == "G1.T2-G1.T1") %>% rename(logFC_G1 = logFC, p_val_G1 = p_val, p_adj_G1 = p_adj) %>% distinct(gene, cluster_id, logFC_G1, p_val_G1, p_adj_G1), celltype_de_group2 %>% filter(contrast == "G2.T2-G2.T1") %>% rename(logFC_G2 = logFC, p_val_G2 = p_val, p_adj_G2 = p_adj) %>% distinct(gene, cluster_id, logFC_G2, p_val_G2, p_adj_G2) ) celltype_de_TimeDiff_Pre_vs_On = inner_join( celltype_de_group1 %>% filter(contrast == "G1.T1-G1.T2") %>% rename(logFC_G1 = logFC, p_val_G1 = p_val, p_adj_G1 = p_adj) %>% distinct(gene, cluster_id, logFC_G1, p_val_G1, p_adj_G1), celltype_de_group2 %>% filter(contrast == "G2.T1-G2.T2") %>% rename(logFC_G2 = logFC, p_val_G2 = p_val, p_adj_G2 = p_adj) %>% distinct(gene, cluster_id, logFC_G2, p_val_G2, p_adj_G2) ) celltype_de = bind_rows( celltype_de_TimeDiff_On_vs_Pre %>% mutate(logFC = logFC_G1 - logFC_G2, p_val = pmin(p_val_G1, p_val_G2), p_adj = pmin(p_adj_G1, p_adj_G2)) %>% mutate(contrast = "(G1.T2-G1.T1)-(G2.T2-G2.T1)"), celltype_de_TimeDiff_On_vs_Pre %>% mutate(logFC = logFC_G2 - logFC_G1, p_val = pmin(p_val_G1, p_val_G2), p_adj = pmin(p_adj_G1, p_adj_G2)) %>% mutate(contrast = "(G2.T2-G2.T1)-(G1.T2-G1.T1)"), celltype_de_TimeDiff_Pre_vs_On %>% mutate(logFC = logFC_G1 - logFC_G2, p_val = pmin(p_val_G1, p_val_G2), p_adj = pmin(p_adj_G1, p_adj_G2)) %>% mutate(contrast = "(G1.T1-G1.T2)-(G2.T1-G2.T2)"), celltype_de_TimeDiff_Pre_vs_On %>% mutate(logFC = logFC_G2 - logFC_G1, p_val = pmin(p_val_G1, p_val_G2), p_adj = pmin(p_adj_G1, p_adj_G2)) %>% mutate(contrast = "(G2.T1-G2.T2)-(G1.T1-G1.T2)") )
Just know that by doing this, we may have some positive "logFC values" for genes that have a lower decrease in one group vs the other: eg:
celltype_de %>% filter(logFC_G1 < 0 & logFC > 0 & contrast == "(G1.T2-G1.T1)-(G2.T2-G2.T1)") %>% arrange(-logFC) %>% head(5)
So important to summarize the interpretation of logFC values (= group-difference between time-difference logFC values) per contrast: * (G1.T2-G1.T1)-(G2.T2-G2.T1): positive logFC if: 1) increase after treatment is bigger in group E than group NE 2) decrease after treatment is smaller in group E than group NE
Therefore these logFC values are the same as for this contrast: * (G2.T1-G2.T2)-(G1.T1-G1.T2): positive logFC if: 1) decrease after treatment is bigger in group NE than in group E 2) increase after treatment is smaller in group NE than group E
Ligand activity prediction: use the DE analysis output to predict the activity of ligands in receiver cell types and infer their potential target genes
Here we get to the most complex part of this vignette.
Let's look at the first contrasts of interest: (G1.T2-G1.T1)-(G2.T2-G2.T1)
. As written before: a positive logFC value here means that the timepoint2-timepoint1 difference in group1 is bigger than in group2. This can be because the increase after therapy (resulting in higher G1.T2 values) is higher in group1 than in the group2. This is the type of trend we are looking for. However, a positive logFC will also be returned in case nothing happens in group1, and there is a therapy-associated decrease in group2. Or when the therapy-associated decrease in group2 is bigger than in group1. This latter examples are trends whe are not looking for.
If we would just filter genes based on logFC to get a geneset of interest, we will confound this geneset with different patterns with different biological meanings.
To recap: we are interested in: ligand-receptor interactions that increase after therapy more strongly in group1 vs group2 target genes that follow the trend of their upstream ligand-receptor interactions and thus also increase after therapy more strongly in group1 vs group2.
How to get to these genesets:
1) Change the celltype_de
table of DE results:
for the contrast (G1.T2-G1.T1)-(G2.T2-G2.T1): give all genes with a pos logFC value a value of 0 if their logFC in the timepoint2-timepoint1 comparison (kept in celltype_de_group1
) is smaller than (logFC_threshold
). Give all genes with a negative logFC a value of 0. As a result, the geneset for ligand activity analysis will only contain genes that are increasing after therapy, and this increase in stronger in group1 than group2.
for the contrast (G2.T2-G2.T1)-(G1.T2-G1.T1): give all genes with a pos logFC value a value of 0 if their logFC in the timepoint2-timepoint1 comparison (kept in celltype_de_group2
) is smaller than (logFC_threshold
). Give all genes with a negative logFC a value of 0. As a result, the geneset for ligand activity analysis will only contain genes that are increasing after therapy, and this increase in stronger in group2 than group1.
for the contrast (G1.T1-G1.T2)-(G2.T1-G2.T2): give all genes with a pos logFC value a value of 0 if their logFC in the timepoint1-timepoint2 comparison (kept in celltype_de_group1
) is smaller than (logFC_threshold
). Give all genes with a negative logFC a value of 0. As a result, the geneset for ligand activity analysis will only contain genes that are decreasing after therapy, and this decrease in stronger in group1 than group1.
for the contrast (G2.T1-G2.T2)-(G1.T1-G1.T2): give all genes with a pos logFC value a value of 0 if their logFC in the timepoint1-timepoint2 comparison (kept in celltype_de_group2
) is smaller than (logFC_threshold
). Give all genes with a negative logFC a value of 0. As a result, the geneset for ligand activity analysis will only contain genes that are decreasing after therapy, and this decrease in stronger in group2 than group1.
Because we want to keep the same patterns for the ligands/receptors: we will do the same but in more permissive setting: just make sure the On/timepoint2-vs-Pre/timepoint1 difference is in the expected direction without needing to be significant.
Gene-celltype table with genes that will keep their pos logFC value for contrast (G1.T2-G1.T1)-(G2.T2-G2.T1) + modified celltype_de table
gene_celltype_tbl_increase_group1_permissive = celltype_de_group1 %>% filter(contrast == "G1.T2-G1.T1") %>% mutate(logFC_multiplier = ifelse(logFC > 0, 1, 0)) %>% distinct(gene, cluster_id, logFC_multiplier) gene_celltype_tbl_increase_group1_stringent = celltype_de_group1 %>% filter(contrast == "G1.T2-G1.T1") %>% mutate(logFC_multiplier = ifelse(logFC >= logFC_threshold & p_val <= p_val_threshold, 1, 0)) %>% distinct(gene, cluster_id, logFC_multiplier)
celltype_de_increase_group1_permissive = celltype_de %>% filter(contrast == "(G1.T2-G1.T1)-(G2.T2-G2.T1)") %>% inner_join(gene_celltype_tbl_increase_group1_permissive) celltype_de_increase_group1_permissive = bind_rows( celltype_de_increase_group1_permissive %>% filter(logFC > 0) %>% mutate(logFC = logFC*logFC_multiplier), celltype_de_increase_group1_permissive %>% filter(logFC <= 0) %>% mutate(logFC = logFC*0) ) %>% arrange(-logFC) celltype_de_increase_group1_stringent = celltype_de %>% filter(contrast == "(G1.T2-G1.T1)-(G2.T2-G2.T1)") %>% inner_join(gene_celltype_tbl_increase_group1_stringent) celltype_de_increase_group1_stringent = bind_rows( celltype_de_increase_group1_stringent %>% filter(logFC > 0) %>% mutate(logFC = logFC*logFC_multiplier), celltype_de_increase_group1_stringent %>% filter(logFC <= 0) %>% mutate(logFC = logFC*0) ) %>% arrange(-logFC) celltype_de_increase_group1_stringent %>% head()
Gene-celltype table with genes that will keep their pos logFC value for contrast (G2.T2-G2.T1)-(G1.T2-G1.T1):
gene_celltype_tbl_increase_group2_permissive = celltype_de_group2 %>% filter(contrast == "G2.T2-G2.T1") %>% mutate(logFC_multiplier = ifelse(logFC > 0, 1, 0)) %>% distinct(gene, cluster_id, logFC_multiplier) gene_celltype_tbl_increase_group2_stringent = celltype_de_group2 %>% filter(contrast == "G2.T2-G2.T1") %>% mutate(logFC_multiplier = ifelse(logFC >= logFC_threshold & p_val <= p_val_threshold, 1, 0)) %>% distinct(gene, cluster_id, logFC_multiplier)
celltype_de_increase_group2_permissive = celltype_de %>% filter(contrast == "(G2.T2-G2.T1)-(G1.T2-G1.T1)") %>% inner_join(gene_celltype_tbl_increase_group2_permissive) celltype_de_increase_group2_permissive = bind_rows( celltype_de_increase_group2_permissive %>% filter(logFC > 0) %>% mutate(logFC = logFC*logFC_multiplier), celltype_de_increase_group2_permissive %>% filter(logFC <= 0) %>% mutate(logFC = logFC*0) ) %>% arrange(-logFC) celltype_de_increase_group2_stringent = celltype_de %>% filter(contrast == "(G2.T2-G2.T1)-(G1.T2-G1.T1)") %>% inner_join(gene_celltype_tbl_increase_group2_stringent) celltype_de_increase_group2_stringent = bind_rows( celltype_de_increase_group2_stringent %>% filter(logFC > 0) %>% mutate(logFC = logFC*logFC_multiplier), celltype_de_increase_group2_stringent %>% filter(logFC <= 0) %>% mutate(logFC = logFC*0) ) %>% arrange(-logFC) celltype_de_increase_group2_stringent %>% head()
Gene-celltype table with genes that will keep their pos logFC value for contrast (G1.T1-G1.T2)-(G2.T1-G2.T2):
gene_celltype_tbl_decrease_group1_permissive = celltype_de_group1 %>% filter(contrast == "G1.T1-G1.T2") %>% mutate(logFC_multiplier = ifelse(logFC > 0, 1, 0)) %>% distinct(gene, cluster_id, logFC_multiplier) gene_celltype_tbl_decrease_group1_stringent = celltype_de_group1 %>% filter(contrast == "G1.T1-G1.T2") %>% mutate(logFC_multiplier = ifelse(logFC >= logFC_threshold & p_val <= p_val_threshold, 1, 0)) %>% distinct(gene, cluster_id, logFC_multiplier)
celltype_de_decrease_group1_permissive = celltype_de %>% filter(contrast == "(G1.T1-G1.T2)-(G2.T1-G2.T2)") %>% inner_join(gene_celltype_tbl_decrease_group1_permissive) celltype_de_decrease_group1_permissive = bind_rows( celltype_de_decrease_group1_permissive %>% filter(logFC > 0) %>% mutate(logFC = logFC*logFC_multiplier), celltype_de_decrease_group1_permissive %>% filter(logFC <= 0) %>% mutate(logFC = logFC*0) ) %>% arrange(-logFC) celltype_de_decrease_group1_stringent = celltype_de %>% filter(contrast == "(G1.T1-G1.T2)-(G2.T1-G2.T2)") %>% inner_join(gene_celltype_tbl_decrease_group1_stringent) celltype_de_decrease_group1_stringent = bind_rows( celltype_de_decrease_group1_stringent %>% filter(logFC > 0) %>% mutate(logFC = logFC*logFC_multiplier), celltype_de_decrease_group1_stringent %>% filter(logFC <= 0) %>% mutate(logFC = logFC*0) ) %>% arrange(-logFC) celltype_de_decrease_group1_stringent %>% head()
Gene-celltype table with genes that will keep their pos logFC value for contrast (G2.T1-G2.T2)-(G1.T1-G1.T2):
gene_celltype_tbl_decrease_group2_permissive = celltype_de_group2 %>% filter(contrast == "G2.T1-G2.T2") %>% mutate(logFC_multiplier = ifelse(logFC > 0, 1, 0)) %>% distinct(gene, cluster_id, logFC_multiplier) gene_celltype_tbl_decrease_group2_stringent = celltype_de_group2 %>% filter(contrast == "G2.T1-G2.T2") %>% mutate(logFC_multiplier = ifelse(logFC >= logFC_threshold & p_val <= p_val_threshold, 1, 0)) %>% distinct(gene, cluster_id, logFC_multiplier)
celltype_de_decrease_group2_permissive = celltype_de %>% filter(contrast == "(G2.T1-G2.T2)-(G1.T1-G1.T2)") %>% inner_join(gene_celltype_tbl_decrease_group2_permissive) celltype_de_decrease_group2_permissive = bind_rows( celltype_de_decrease_group2_permissive %>% filter(logFC > 0) %>% mutate(logFC = logFC*logFC_multiplier), celltype_de_decrease_group2_permissive %>% filter(logFC <= 0) %>% mutate(logFC = logFC*0) ) %>% arrange(-logFC) celltype_de_decrease_group2_stringent = celltype_de %>% filter(contrast == "(G2.T1-G2.T2)-(G1.T1-G1.T2)") %>% inner_join(gene_celltype_tbl_decrease_group2_stringent) celltype_de_decrease_group2_stringent = bind_rows( celltype_de_decrease_group2_stringent %>% filter(logFC > 0) %>% mutate(logFC = logFC*logFC_multiplier), celltype_de_decrease_group2_stringent %>% filter(logFC <= 0) %>% mutate(logFC = logFC*0) ) %>% arrange(-logFC) celltype_de_decrease_group2_stringent %>% head()
So let's now make the new cellype_de
tables:
The permissive table that will be used for the prioritization of ligand-receptor pairs based on DE values:
celltype_de_LR = list( celltype_de_increase_group1_permissive, celltype_de_increase_group2_permissive, celltype_de_decrease_group1_permissive, celltype_de_decrease_group2_permissive) %>% bind_rows()
The stringent table that will be used for the creation of genesets for the ligand activity analysis:
celltype_de_genesets = list( celltype_de_increase_group1_stringent, celltype_de_increase_group2_stringent, celltype_de_decrease_group1_stringent, celltype_de_decrease_group2_stringent) %>% bind_rows()
We will first inspect the geneset_oi-vs-background ratios for the tresholds appropriate for this "multifactorial-like analysis":
p_val_threshold = 1 # because the p-values in this setting do not make sense logFC_threshold = 0.125 # we can be a bit more lenient here
contrast_tbl = tibble(contrast = c("(G1.T2-G1.T1)-(G2.T2-G2.T1)", "(G2.T2-G2.T1)-(G1.T2-G1.T1)", "(G1.T1-G1.T2)-(G2.T1-G2.T2)", "(G2.T1-G2.T2)-(G1.T1-G1.T2)"), group = c("G1.T2","G2.T2","G1.T1","G2.T1"))
geneset_assessment = contrast_tbl$contrast %>% lapply( process_geneset_data, celltype_de_genesets, logFC_threshold, p_val_adj, p_val_threshold ) %>% bind_rows() geneset_assessment
We can see here that for most geneset/background ratio's we are within or close to the recommended range for the upregulated genes (in_range_up
and in_range_down
columns).
ligand_activities_targets_DEgenes = suppressMessages(suppressWarnings( get_ligand_activities_targets_DEgenes( receiver_de = celltype_de_genesets, receivers_oi = intersect(receivers_oi, celltype_de_genesets$cluster_id %>% unique()), ligand_target_matrix = ligand_target_matrix, logFC_threshold = logFC_threshold, p_val_threshold = p_val_threshold, p_val_adj = p_val_adj, top_n_target = top_n_target, verbose = TRUE, n.cores = n.cores ) ))
Combine DE information for ligand-senders and receptors-receivers
Use the adapted celltype_de_LR
table to keep the expression change patterns we want:
sender_receiver_de = combine_sender_receiver_de( sender_de = celltype_de_LR, receiver_de = celltype_de_LR, senders_oi = senders_oi, receivers_oi = receivers_oi, lr_network = lr_network )
sender_receiver_de %>% head(20)
Prioritization: rank cell-cell communication patterns through multi-criteria prioritization
sender_receiver_tbl = sender_receiver_de %>% distinct(sender, receiver) metadata_combined = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() if(!is.na(batches)){ grouping_tbl = metadata_combined[,c(group_id, batches)] %>% tibble::as_tibble() %>% dplyr::distinct() colnames(grouping_tbl) = c("group",batches) grouping_tbl = grouping_tbl %>% mutate(sample = group) grouping_tbl = grouping_tbl %>% tibble::as_tibble() } else { grouping_tbl = metadata_combined[,c(group_id)] %>% tibble::as_tibble() %>% dplyr::distinct() colnames(grouping_tbl) = c("group") grouping_tbl = grouping_tbl %>% mutate(sample = group) %>% select(sample, group) } prioritization_tables = suppressMessages(generate_prioritization_tables_sampleAgnostic_multifactorial( sender_receiver_info = abundance_expression_info$sender_receiver_info, sender_receiver_de = sender_receiver_de, ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes, contrast_tbl = contrast_tbl, sender_receiver_tbl = sender_receiver_tbl, grouping_tbl = grouping_tbl, scenario = "regular", fraction_cutoff = fraction_cutoff, abundance_data_receiver = abundance_expression_info$abundance_data_receiver, abundance_data_sender = abundance_expression_info$abundance_data_sender, ligand_activity_down = FALSE )) # this specific function to not include p-values for the DE assessment, since these are meaningless for this # customized multifactorial analysis
Compile the MultiNicheNet output object
multinichenet_output = list( celltype_info = abundance_expression_info$celltype_info, celltype_de = celltype_de_genesets, sender_receiver_info = abundance_expression_info$sender_receiver_info, sender_receiver_de = sender_receiver_de, ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes, prioritization_tables = prioritization_tables, grouping_tbl = grouping_tbl, lr_target_prior_cor = tibble() ) multinichenet_output = make_lite_output(multinichenet_output)
Summarizing ChordDiagram circos plots
We will look here at the top 50 predictions across all contrasts, senders, and receivers of interest.
prioritized_tbl_oi_all = get_top_n_lr_pairs( multinichenet_output$prioritization_tables, top_n = 50, rank_per_group = FALSE )
prioritized_tbl_oi = multinichenet_output$prioritization_tables$group_prioritization_tbl %>% filter(id %in% prioritized_tbl_oi_all$id) %>% distinct(id, sender, receiver, ligand, receptor, group) %>% left_join(prioritized_tbl_oi_all) prioritized_tbl_oi$prioritization_score[is.na(prioritized_tbl_oi$prioritization_score)] = 0 senders_receivers = union(prioritized_tbl_oi$sender %>% unique(), prioritized_tbl_oi$receiver %>% unique()) %>% sort() colors_sender = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers) colors_receiver = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers) circos_list = make_circos_group_comparison(prioritized_tbl_oi, colors_sender, colors_receiver)
In general, we see most specific interactions are the ones that increase in the E group (= patients with T cell clonotype expansion after aPD1 therapy)
Interpretable bubble plots
We will now check the top interactions specific for the OnE group - so these are the interactions that increase during anti-PD1 therapy and more strongly in the E than NE group.
group_oi = "G1.T2"
prioritized_tbl_oi_50 = get_top_n_lr_pairs( multinichenet_output$prioritization_tables, top_n = 50, rank_per_group = FALSE) %>% filter(group == group_oi)
prioritized_tbl_oi_50_omnipath = prioritized_tbl_oi_50 %>% inner_join(lr_network_all)
Now we add this to the bubble plot visualization:
plot_oi = make_sample_lr_prod_activity_plots_Omnipath( multinichenet_output$prioritization_tables, prioritized_tbl_oi_50_omnipath) plot_oi
Further note: Typically, there are way more than 50 differentially expressed and active ligand-receptor pairs per group across all sender-receiver combinations. Therefore it might be useful to zoom in on specific cell types as senders/receivers:
Eg macrophages as sender:
prioritized_tbl_oi_M_50 = get_top_n_lr_pairs( multinichenet_output$prioritization_tables, 50, groups_oi = group_oi, senders_oi = "macrophages")
plot_oi = make_sample_lr_prod_activity_plots_Omnipath( multinichenet_output$prioritization_tables, prioritized_tbl_oi_M_50 %>% inner_join(lr_network_all)) plot_oi
Now let's check the top interactions specific for the PreE group - so these are the interactions that decrease during anti-PD1 therapy, and more strongly in group1 than group2.
group_oi = "G1.T1"
prioritized_tbl_oi_50 = get_top_n_lr_pairs( multinichenet_output$prioritization_tables, top_n = 50, rank_per_group = FALSE) %>% filter(group == group_oi)
prioritized_tbl_oi_50_omnipath = prioritized_tbl_oi_50 %>% inner_join(lr_network_all)
plot_oi = make_sample_lr_prod_activity_plots_Omnipath( multinichenet_output$prioritization_tables, prioritized_tbl_oi_50_omnipath) plot_oi
We recommend generating these plots as well for the other conditions, but we will not do this here to reduce the length of the vignette.
One of the unique benefits of MultiNicheNet is the prediction of ligand activities and target genes downstream of ligand-receptor pairs. This may offer additional unique insights into the data at hand. We will showcase this here by inferring an intercellular regulatory network. Interestingly, some target genes can be ligands or receptors themselves. This illustrates that cells can send signals to other cells, who as a response to these signals produce signals themselves to feedback to the original sender cells, or who will effect other cell types. With MultiNicheNet, we can generate this 'systems' view of these intercellular feedback and cascade processes than can be occuring between the different cell populations involved. In this intercellular regulatory network, we will draw links between ligands of sender cell types their ligand/receptor-annotated target genes in receiver cell types. So links are ligand-target links (= gene regulatory links) and not ligand-receptor protein-protein interactions!
Important In the default MultiNicheNet workflow for datasets with multiple samples per condition, we further filter out target genes based on expression correlation before generating this plot. However, this would not be meaningful here since we don't have multiple samples. As a consequence, we will have many ligand-target links here in these plots, making the plots less readable if we would consider more than the top 50 or 100 interactions.
prioritized_tbl_oi = get_top_n_lr_pairs( multinichenet_output$prioritization_tables, 100, rank_per_group = FALSE) lr_target_prior = prioritized_tbl_oi %>% inner_join( multinichenet_output$ligand_activities_targets_DEgenes$ligand_activities %>% distinct(ligand, target, direction_regulation, contrast) %>% inner_join(contrast_tbl) %>% ungroup() ) lr_target_df = lr_target_prior %>% distinct(group, sender, receiver, ligand, receptor, id, target, direction_regulation)
network = infer_intercellular_regulatory_network(lr_target_df, prioritized_tbl_oi) network$links %>% head() network$nodes %>% head()
colors_sender["Fibroblast"] = "pink" # the original yellow background with white font is not very readable network_graph = visualize_network(network, colors_sender) network_graph$plot
Interestingly, we are only left with interactions that increase after therapy in the E group. This means that we don't find many intercellular regulatory connections for the other groups. In most cases, you will get this type of network for all conditions in your data.
Note: we can also use this network to further prioritize differential CCC interactions. Here we will assume that the most important LR interactions are the ones that are involved in this intercellular regulatory network. We can get these interactions as follows:
network$prioritized_lr_interactions
prioritized_tbl_oi_network = prioritized_tbl_oi %>% inner_join( network$prioritized_lr_interactions) prioritized_tbl_oi_network
Visualize now the expression and activity of these interactions for the OnE group
group_oi = "G1.T2"
prioritized_tbl_oi_M = prioritized_tbl_oi_network %>% filter(group == group_oi) plot_oi = make_sample_lr_prod_activity_plots_Omnipath( multinichenet_output$prioritization_tables, prioritized_tbl_oi_M %>% inner_join(lr_network_all) ) plot_oi
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.