Table of Contents

  1. Conduct GSEA using the GO or Reactome database
  2. Perform GSEA using customized genesets
  3. Find pathways in the GO/Reactome database or customized genesets
  4. Convert GO/Reactome pathway IDs to pathway names
  5. Filter the GO/Reactome pathway list based on certain criteria
  6. Create a GSEA plot emulating the Broad Institute analysis

Conduct GSEA using the GO or Reactome database {#conduct-gsea-using-the-go-or-reactome-database}

The SeuratExtend package integrates both the GO and Reactome databases, streamlining the GSEA analysis process. This is primarily facilitated through the GeneSetAnalysisGO and GeneSetAnalysisReactome functions, among other supplementary functions. In this section, we'll delve into the usage and features of these functions.

Gene Ontology (GO) Database

Performing GSEA using the GO database can be resource-intensive due to its extensive size. To make the analysis more feasible, you might consider evaluating pathways under specific categories. For instance, in the example below, only the pathways under the "immune_system_process" category are evaluated. The results from this analysis are saved in the location: seu@misc$AUCell$GO[[title]].

library(SeuratExtend)
library(dplyr)
options(max.print = 12, spe = "human")

pbmc <- GeneSetAnalysisGO(pbmc, parent = "immune_system_process", nCores = 4) # calculating with 4 cores
matr <- pbmc@misc$AUCell$GO$immune_system_process
matr <- RenameGO(matr)
head(matr, 2:3)

For the "parent" argument, you can input any term from the GO database, be it a GO ID or a pathway name. To get a glimpse of commonly used GO categories, you can run GeneSetAnalysisGO() without any arguments:

GeneSetAnalysisGO()

Here are some suggested visualization methods:

Heatmap(CalcStats(matr, f = pbmc$cluster, order = "p", n = 4), lab_fill = "zscore")
VlnPlot2(matr[1:3,], f = pbmc$cluster, ncol = 1)
WaterfallPlot(matr, f = pbmc$cluster, ident.1 = "B cell", ident.2 = "CD8 T cell", top.n = 20)

Reactome Database

For GSEA using the Reactome database, consider assessing pathways under certain categories to make the process more manageable. The example below evaluates pathways under the "Immune System" category. Results from this analysis are saved under: seu@misc$AUCell$Reactome[[title]].

pbmc <- GeneSetAnalysisReactome(pbmc, parent = "Immune System")
matr <- pbmc@misc$AUCell$Reactome$`Immune System`
matr <- RenameReactome(matr)
Heatmap(CalcStats(matr, f = pbmc$cluster, order = "p", n = 4), lab_fill = "zscore")

Similar to the GO database, running GeneSetAnalysisReactome() without any arguments lets you view commonly used categories in the Reactome database:

GeneSetAnalysisReactome()

Perform GSEA using customized genesets {#perform-gsea-using-customized-genesets}

To conduct a Gene Set Enrichment Analysis (GSEA) with custom gene sets, the GeneSetAnalysis function is the tool of choice. For instance, one might consider utilizing the Hallmark 50 gene set, commonly employed for general screening. This set can be accessed via the hall50 object. Upon execution, the resulting AUCell matrix will be stored under the path: seu@misc$AUCell[[title]].

pbmc <- GeneSetAnalysis(pbmc, genesets = hall50$human)
matr <- pbmc@misc$AUCell$genesets
Heatmap(CalcStats(matr, f = pbmc$cluster), lab_fill = "zscore")

For those seeking a plethora of other gene sets, the SeuratExtendData::Genesets_data offers an expansive collection sourced from the GSEA MSigDB website. Here's how you can view the available collections:

names(SeuratExtendData::Genesets_data$human$GSEA)

Furthermore, for cluster annotations, the SeuratExtend::PanglaoDB_data contains a valuable resource: marker lists for 178 distinct cell types, curated from PanglaoDB. To explore these marker lists:

names(SeuratExtend::PanglaoDB_data$marker_list_human)

Find pathways in the GO/Reactome database or customized genesets {#find-pathways-in-the-goreactome-database-or-customized-genesets}

Navigating the plethora of pathways in databases like GO and Reactome can be overwhelming. The SearchDatabase function simplifies this process by offering a wide array of customizable search parameters.

General Search

The 'item' parameter is highly versatile, allowing you to search by gene name, pathway ID, or even keywords within pathway names. The following example demonstrates how to find pathways containing the gene "CD3D" or pathways with names including "metabolic."

result <- SearchDatabase(c("CD3D", "metabolic"))
names(result)
glimpse(head(result, 3))

Type-Specific Search

If you wish to limit your search to specific types of items such as gene names, you can utilize the 'type' parameter as shown below.

result <- SearchDatabase("CD3D", type = "gene")
names(result)

Database-Specific Search

To focus your search within a particular database, specify the database name using the 'database' parameter.

result <- SearchDatabase("CD3D", database = "Reactome")
names(result)

Species-Specific Search

You can specify either 'human' or 'mouse' using the 'spe' parameter.

result <- SearchDatabase("Cd3d", spe = "mouse")
glimpse(head(result, 3))

Customizing Return Types

The function also offers flexibility in output types. For example, if you require a list of pathway IDs for downstream analysis, you can use the 'return' parameter as follows.

result <- SearchDatabase("CD3D", return = "ID")
result

Alternatively, if you need the output as a gene list formatted for GeneSetAnalysis, adjust the 'return' parameter like so:

result <- SearchDatabase("CD3D", return = "genelist")
glimpse(head(result, 5))

To export the result as a data frame, suitable for formats like Excel or CSV, set the 'export.to.data.frame' parameter to TRUE.

result <- SearchDatabase("CD3D", export.to.data.frame = TRUE)
glimpse(result)

Filtering a Customized Gene Set

Lastly, you can also filter a given gene set list with the SearchPathways function. For instance, within the "Hallmark 50" database, you can find pathways that include the gene "CD3D" or have names that contain "interferon."

SearchPathways(genesets = hall50$human, item = c("CD3D", "interferon"))

Convert GO/Reactome pathway IDs to pathway names {#convert-goreactome-pathway-ids-to-pathway-names}

During the course of analyses, researchers often encounter pathway IDs from databases such as GO and Reactome. While these IDs are great for computational tasks, they can be cryptic when it comes to interpretability. RenameGO and RenameReactome functions provide a convenient means to transform these IDs into their more descriptive pathway names.

The primary parameter these functions require is "item". This can either be:

  1. A character vector of GO or Reactome IDs. For example, this could be the output from functions like FilterGOTerms or FilterReactomeTerms.
  2. A matrix where the IDs are stored in the rownames, such as the output of GeneSetAnalysisGO or GeneSetAnalysisReactome.

Convert GO IDs to their respective pathway names for human:

RenameGO(c("GO:0002376","GO:0050896"), spe = "human")

Similarly, for Reactome IDs:

RenameReactome(c("R-HSA-109582","R-HSA-112316"), spe = "human")

Filter the GO/Reactome pathway list based on certain criteria {#filter-the-goreactome-pathway-list-based-on-certain-criteria}

Both GO and Reactome databases contain thousands of pathways, but not all of which may be relevant to your study. To streamline the analysis, you can use the FilterGOTerms and FilterReactomeTerms functions to subset and refine the list of GO or Reactome pathways based on specific criteria.

Filtering GO Pathways

Let's start by looking at how you can filter GO pathways:

terms <- FilterGOTerms(parent = "GO:0002376")
RenameGO(terms)
terms2 <- FilterGOTerms(term = terms, n.min = 10, n.max = 1000)
RenameGO(terms2)
terms3 <- FilterGOTerms(term = terms, only.end.terms = TRUE)
RenameGO(terms3)

Filtering Reactome Pathways

The process for Reactome pathways is analogous. For instance, to select pathways related to the Immune System:

terms <- FilterReactomeTerms(parent = "R-HSA-168256")
RenameReactome(terms)

Create a GSEA plot emulating the Broad Institute analysis {#create-a-gsea-plot-emulating-the-broad-institute-analysis}

The GSEAplot function is designed to generate plots that emulate the Gene Set Enrichment Analysis (GSEA) as developed by the Broad Institute. This function provides a way to visualize the enrichment of specific gene sets within different biological states or conditions.

Here's how you can create a GSEA plot for the "INTERFERON_GAMMA_RESPONSE" gene set within the "Naive CD4 T" cell population of the pbmc dataset:

GSEAplot(
  pbmc, 
  ident.1 = "CD4 T Naive", 
  title = "INTERFERON_GAMMA_RESPONSE",
  geneset = hall50$human$HALLMARK_INTERFERON_GAMMA_RESPONSE
)
sessionInfo()


huayc09/SeuratExtend documentation built on July 15, 2024, 6:22 p.m.