In jinwooee415/queryPathcards: Query Pathcards Package

Introduction

In this tutorial we will query the website PathCards with a ligand-receptor list to find pathways shared by each pair, score each pathway, and create a figure for the scored pathways.

For this example, we will be using the pbmc seurat object provided by Seurat's official website. The download link is (https://satijalab.org/seurat/vignettes.html).

Also, we will be using a sample list of receptor and ligand pairs. I have included this list in the vignette folder of my package.

We load up the required packages:

knitr::opts_chunk$set(echo = TRUE)
library(queryPathcards)
#Loading up the required packages by queryPathcards:
library(dplyr)
library(Seurat)
library(httr)
library(rvest)
library(stringr)
library(tidyr)
library(ggplot2)
library(reshape2)

Filtering The Ligand-Receptor List

We utilize the receptor.ligand.annotation.from.object() function to only select the ligand-receptor pairs that are significant from our sample ligand-receptor list.

The receptor.ligand.annotation.from.object() function has 5 parameters: The Seurat object Variables to group by List of ligands and receptors Percent expression threshold * Mean expression threshold

#Loading in the Seurat object downloaded from the website above:
sample_seurat_object <- readRDS("~/Downloads/pbmc3k_final.rds")

#Creating the sample ligand-receptor list. Be sure to name your columns as Ligand and Receptor.
sample_list <- read.table(file = "~/Downloads/receptor_ligand_sample_list.txt", sep = "\t", header = TRUE)[1:500,]

sample_list_refined <- receptor.ligand.annotation.from.object(sample_seurat_object, "seurat_clusters", sample_list, 0.1, 0.2)

Querying

For this part, I'm going to create a sample ligand-receptor list and run it through the queryPathcards() function. For each ligand-receptor pair in the sample list, this function is going to query the PathCards website, find the pathways shared by the gene pair, and put it into a list.

The output will have two different data sets. The first data set, ligand_receptor_pathways, is going to be a dataframe that shows the pathways of each gene pair. The second data set, pathway_genes, is going to be a list that shows the list of genes in the pathway for each pathway.

Since queryPathcards() function only accepts matrix of column length 2, Ligand and Receptor, in that exact order, we have to change our sample_list_refined accordingly.

#Making sample_list_refined readable by the queryPathcards() function
sample_list_refined <- sample_list_refined[,c(2,4)]
sample_list_refined <- sample_list_refined[,c(2,1)]
colnames(sample_list_refined) <- c("Ligand", "Receptor")
sample_list_refined <- unique(sample_list_refined)

sample_result <- queryPathcards::queryPathcards(sample_list_refined)
#You can access each data set by utilizing:
#View(sample_result$ligand_receptor_pathways)
#View(sample_result$pathway_genes)

Scoring

Scoring the pathways created by the queryPathcards() function requires a Seurat object. The function scoreGenes() will require 3 variables: (1)the result by queryPathcards(), (2) Seurat object, and (3) variables to group by.

Running this function will output two data sets as well. The first is the raw data of the scores without any tidying up done. The second takes the average of each score within the categories set by the user (the 3rd variable in the function) and outputs the summarized data.

sample_scored <- scoreGenes(sample_result, sample_seurat_object, c("seurat_clusters"))
#Access these data sets by utilizing:
#View(sample_scored$full_data)
#View(sample_scored$summarized_data)

Creating figures

I created a function for creating a heatmap based on the scores. As of right now, this function will only accept the summarized data. To run the function, you need to put in (1) the scored data set, (2) the x-variable, and (3) the y-variable. The "fill=" is always Score_Average.

Tidying up the data will be different for each data set. For ours, we will arrange the data so that it is in descending order of the average scores of each pathway.

score_for_heatmap <- sample_scored$summarized_data %>% 
  select(Shared_Pathway, Score_Average) %>% 
  group_by(Shared_Pathway) %>% 
  summarise_all(funs(mean)) %>% 
  ungroup() %>% 
  arrange(desc(Score_Average))

score_for_heatmap <- score_for_heatmap %>%
  mutate(id = as.numeric(rownames(score_for_heatmap))) %>%
  select(Shared_Pathway, id) %>%
  merge(sample_scored$summarized_data) %>%
  arrange(desc(id))

score_for_heatmap$Shared_Pathway <- factor(score_for_heatmap$Shared_Pathway, levels = unique(score_for_heatmap$Shared_Pathway))

#After organizing the data set, utilize this function to create the heatmap:
createHeatMap(score_for_heatmap, "Shared_Pathway", "seurat_clusters")