knitr::opts_chunk$set( collapse = TRUE, # comment = "#>", warning = FALSE, message = FALSE )
This vignette assesses the ligands prioritized by NicheNet in their ability to predict a gene set of interest. We will first follow the steps of Perform NicheNet analysis starting from a Seurat object) to obtain ligands rankings. Make sure you understand the steps and output of a basic NicheNet analysis (more information in Perform NicheNet analysis starting from a Seurat object: step-by-step analysis. You can also apply this vignette to the NicheNet’s ligand activity analysis on a gene set of interest vignette.
library(nichenetr) # Please update to v2.0.4 library(Seurat) library(SeuratObject) library(tidyverse)
# Read in networks lr_network <- readRDS(url("https://zenodo.org/record/7074291/files/lr_network_mouse_21122021.rds")) ligand_target_matrix <- readRDS(url("https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final_mouse.rds")) weighted_networks <- readRDS(url("https://zenodo.org/record/7074291/files/weighted_networks_nsga2r_final_mouse.rds")) lr_network <- lr_network %>% distinct(from, to) # Read in expression data and update seuratObj <- readRDS(url("https://zenodo.org/record/3531889/files/seuratObj.rds")) seuratObj <- UpdateSeuratObject(seuratObj) seuratObj <- alias_to_symbol_seurat(seuratObj, "mouse") # Run NicheNet nichenet_output <- nichenet_seuratobj_aggregate( seurat_obj = seuratObj, sender = c("CD4 T","Treg", "Mono", "NK", "B", "DC"), receiver = "CD8 T", condition_colname = "aggregate", condition_oi = "LCMV", condition_reference = "SS", expression_pct = 0.05, ligand_target_matrix = ligand_target_matrix, lr_network = lr_network, weighted_networks = weighted_networks ) best_upstream_ligands <- nichenet_output$ligand_activities %>% top_n(30, aupr_corrected) %>% arrange(desc(aupr_corrected)) %>% pull(test_ligand)
For the top 30 ligands, we will now build a multi-ligand model that uses all top-ranked ligands to predict whether a gene belongs to the gene set of interest (differentially expressed genes in CD8 T cells after LCMV infection) or not. This classification model will be trained via cross-validation and returns a probability for every gene.
# change rounds and folds here, to two rounds to reduce time: normally: do multiple rounds k <- 3 # 3-fold n <- 2 # 2 rounds gene_predictions_top30_list <- lapply(1:n, assess_rf_class_probabilities, folds = k, geneset = nichenet_output$geneset_oi, background_expressed_genes = nichenet_output$background_expressed_genes, ligands_oi = best_upstream_ligands, ligand_target_matrix = ligand_target_matrix)
Evaluate now how well the target gene probabilities accord to the gene set assignments.
# get performance: auroc-aupr-pearson target_prediction_performances_cv <- gene_predictions_top30_list %>% lapply(classification_evaluation_continuous_pred_wrapper) %>% bind_rows() %>% mutate(round=seq(1:nrow(.)))
What is the AUROC, AUPR and PCC of this model (averaged over cross-validation rounds)?
target_prediction_performances_cv$auroc %>% mean() target_prediction_performances_cv$aupr %>% mean() target_prediction_performances_cv$pearson %>% mean()
Evaluate now whether genes belonging to the gene set are more likely to be top-predicted. We will look at the top 5% of predicted targets here.
# get performance: how many viral response genes and non-viral response-genes among top 5% predicted targets target_prediction_performances_discrete_cv <- gene_predictions_top30_list %>% lapply(calculate_fraction_top_predicted, quantile_cutoff = 0.95) %>% bind_rows(.id = "round")
What is the fraction of viral response genes that belongs to the top 5% predicted targets?
target_prediction_performances_discrete_cv %>% filter(true_target) %>% .$fraction_positive_predicted %>% mean()
What is the fraction of non-viral-response genes that belongs to the top 5% predicted targets?
target_prediction_performances_discrete_cv %>% filter(!true_target) %>% .$fraction_positive_predicted %>% mean()
We see that the viral response genes are enriched in the top-predicted target genes. To test this, we will now apply a Fisher's exact test for every cross-validation round and report the average p-value.
target_prediction_performances_discrete_fisher <- gene_predictions_top30_list %>% lapply(calculate_fraction_top_predicted_fisher, quantile_cutoff = 0.95) target_prediction_performances_discrete_fisher %>% unlist() %>% mean()
Finally, we will look at which p-EMT genes are well-predicted in every cross-validation round.
# get top predicted genes top_predicted_genes <- lapply(1:n, get_top_predicted_genes, gene_predictions_top30_list) %>% reduce(full_join, by = c("gene","true_target")) top_predicted_genes %>% filter(true_target)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.