The SeuratExtend package integrates both the GO and Reactome databases, streamlining the GSEA process. This is primarily handled by the GeneSetAnalysisGO and GeneSetAnalysisReactome functions, along with several supplementary functions. This section covers the usage and features of these functions.
Performing GSEA using the GO database can be resource-intensive due to its extensive size. To make the analysis more feasible, you might consider evaluating pathways under specific categories. For instance, in the example below, only the pathways under the "immune_system_process" category are evaluated. The results from this analysis are saved in the location: seu@misc$AUCell$GO[[title]].
library(SeuratExtend)
library(dplyr)
options(max.print = 12, spe = "human")
pbmc <- GeneSetAnalysisGO(pbmc, parent = "immune_system_process", nCores = 4) # calculating with 4 cores
matr <- pbmc@misc$AUCell$GO$immune_system_process
matr <- RenameGO(matr)
head(matr, 2:3)
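The results above are stored under an AUCell slot, which suggests a rank-based, per-cell score. The following is a simplified base R sketch of that idea for a single toy cell. It is an assumption-laden illustration, not the actual AUCell or SeuratExtend computation: the expression values, the gene set, and the max_rank cutoff are all hypothetical.

```r
# One toy cell: gene expression values (hypothetical numbers).
expr <- c(CD3D = 9, CD3E = 7, LYZ = 5, MS4A1 = 3, CD79A = 1, GNLY = 0)
gene_set <- c("CD3D", "MS4A1")  # hypothetical gene set
max_rank <- 4                   # only the top-ranked genes contribute (assumed cutoff)

# Rank genes from highest to lowest expression within the cell.
ranked_genes <- names(sort(expr, decreasing = TRUE))

# Recovery curve: cumulative number of gene-set members found among the top ranks.
hits <- ranked_genes[seq_len(max_rank)] %in% gene_set
recovery <- cumsum(hits)

# Best possible curve: every gene-set member sits at the very top of the ranking.
k <- min(length(gene_set), max_rank)
best <- cumsum(c(rep(TRUE, k), rep(FALSE, max_rank - k)))

# Normalised area under the recovery curve, between 0 and 1.
auc <- sum(recovery) / sum(best)
print(auc)
```

A cell in which the gene-set members rank near the top gets a score close to 1, so the score depends only on within-cell ranks rather than on absolute expression levels.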
For the "parent" argument, you can input any term from the GO database, be it a GO ID or a pathway name. To get a glimpse of commonly used GO categories, you can run GeneSetAnalysisGO() without any arguments:
GeneSetAnalysisGO()
Here are some suggested visualization methods:
Heatmap(CalcStats(matr, f = pbmc$cluster, order = "p", n = 4), lab_fill = "zscore")
VlnPlot2(matr[1:3,], f = pbmc$cluster, ncol = 1)
WaterfallPlot(matr, f = pbmc$cluster, ident.1 = "B cell", ident.2 = "CD8 T cell", top.n = 20)
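The heatmap above is filled with z-scores. As a rough illustration of what such a value can mean, the base R sketch below averages toy pathway scores within each cluster and then standardizes each pathway across clusters. This is an assumption about the general idea, not the implementation of CalcStats; the matrix, cluster labels, and variable names are hypothetical.

```r
# Toy pathway-by-cell score matrix (rows = pathways, columns = cells).
set.seed(1)
scores <- matrix(
  runif(3 * 6), nrow = 3,
  dimnames = list(paste0("pathway_", 1:3), paste0("cell_", 1:6))
)
cluster <- factor(c("B cell", "B cell", "CD8 T cell", "CD8 T cell", "Mono", "Mono"))

# Mean score of each pathway within each cluster (pathways x clusters).
cluster_means <- t(apply(scores, 1, function(x) tapply(x, cluster, mean)))

# Z-score each pathway across clusters: subtract the row mean, divide by the row SD.
zscores <- t(scale(t(cluster_means)))
print(round(zscores, 2))
```

Because each row is standardized separately, the values show in which clusters a pathway is relatively high or low, not how pathways compare with one another.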
For GSEA using the Reactome database, consider assessing pathways under certain categories to make the process more manageable. The example below evaluates pathways under the "Immune System" category. Results from this analysis are saved under: seu@misc$AUCell$Reactome[[title]].
pbmc <- GeneSetAnalysisReactome(pbmc, parent = "Immune System")
matr <- pbmc@misc$AUCell$Reactome$`Immune System`
matr <- RenameReactome(matr)
Heatmap(CalcStats(matr, f = pbmc$cluster, order = "p", n = 4), lab_fill = "zscore")
Similar to the GO database, running GeneSetAnalysisReactome() without any arguments lets you view commonly used categories in the Reactome database:
GeneSetAnalysisReactome()
To conduct a Gene Set Enrichment Analysis (GSEA) with custom gene sets, use the GeneSetAnalysis function. For instance, you might use the Hallmark 50 gene set, commonly employed for general screening, which can be accessed via the hall50 object. The resulting AUCell matrix is stored under the path: seu@misc$AUCell[[title]].
pbmc <- GeneSetAnalysis(pbmc, genesets = hall50$human)
matr <- pbmc@misc$AUCell$genesets
Heatmap(CalcStats(matr, f = pbmc$cluster), lab_fill = "zscore")
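The call above passes hall50$human, whose elements are accessed by name elsewhere in this document (for example hall50$human$HALLMARK_INTERFERON_GAMMA_RESPONSE). Assuming that any named list of gene-symbol vectors has the same shape, a custom collection could be prepared as in the base R sketch below. The set names, the gene choices, and the measured_genes vector are hypothetical stand-ins.

```r
# Hypothetical custom gene sets: a named list of character vectors of gene symbols.
custom_sets <- list(
  T_cell_markers = c("CD3D", "CD3E", "CD2"),
  B_cell_markers = c("CD79A", "MS4A1", "CD19")
)

# Stand-in for the genes measured in a dataset (in practice, the object's gene names).
measured_genes <- c("CD3D", "CD3E", "CD79A", "MS4A1", "LYZ")

# Fraction of each gene set that is actually present in the data.
coverage <- sapply(custom_sets, function(genes) mean(genes %in% measured_genes))
print(coverage)

# Keep only the genes that can contribute to a score.
usable_sets <- lapply(custom_sets, intersect, measured_genes)
print(usable_sets)
```

Checking coverage first is a cheap way to catch symbol mismatches, such as human versus mouse capitalization, before running a longer analysis.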
For many other gene sets, SeuratExtendData::Genesets_data offers an extensive collection sourced from the GSEA MSigDB website. Here's how you can view the available collections:
names(SeuratExtendData::Genesets_data$human$GSEA)
Furthermore, for cluster annotation, SeuratExtend::PanglaoDB_data provides marker lists for 178 distinct cell types, curated from PanglaoDB. To explore these marker lists:
names(SeuratExtend::PanglaoDB_data$marker_list_human)
Navigating the large number of pathways in databases like GO and Reactome can be overwhelming. The SearchDatabase function simplifies this process by offering a range of customizable search parameters.
The 'item' parameter is highly versatile, allowing you to search by gene name, pathway ID, or even keywords within pathway names. The following example demonstrates how to find pathways containing the gene "CD3D" or pathways with names including "metabolic."
result <- SearchDatabase(c("CD3D", "metabolic"))
names(result)
glimpse(head(result, 3))
If you wish to limit your search to specific types of items such as gene names, you can utilize the 'type' parameter as shown below.
result <- SearchDatabase("CD3D", type = "gene")
names(result)
To focus your search within a particular database, specify the database name using the 'database' parameter.
result <- SearchDatabase("CD3D", database = "Reactome")
names(result)
You can specify either 'human' or 'mouse' using the 'spe' parameter.
result <- SearchDatabase("Cd3d", spe = "mouse")
glimpse(head(result, 3))
The function also offers flexibility in output types. For example, if you require a list of pathway IDs for downstream analysis, you can use the 'return' parameter as follows.
result <- SearchDatabase("CD3D", return = "ID")
result
Alternatively, if you need the output as a gene list formatted for GeneSetAnalysis, adjust the 'return' parameter like so:
result <- SearchDatabase("CD3D", return = "genelist")
glimpse(head(result, 5))
To export the result as a data frame, suitable for formats like Excel or CSV, set the 'export.to.data.frame' parameter to TRUE.
result <- SearchDatabase("CD3D", export.to.data.frame = TRUE)
glimpse(result)
Lastly, you can also filter a given gene set list with the SearchPathways function. For instance, within the "Hallmark 50" database, you can find pathways that include the gene "CD3D" or have names that contain "interferon."
SearchPathways(genesets = hall50$human, item = c("CD3D", "interferon"))
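The kind of filtering described above, matching either on gene membership or on a keyword in the set name, can be sketched in base R as follows. This is only an illustration of the matching logic, not how SearchPathways is implemented. The search_sets helper and the toy gene sets are hypothetical.

```r
# Toy gene sets with hypothetical names and memberships.
toy_sets <- list(
  TOY_T_CELL              = c("CD3D", "CD3E"),
  TOY_INTERFERON_RESPONSE = c("STAT1", "IRF1"),
  TOY_GLYCOLYSIS          = c("HK2", "PKM")
)

# Hypothetical helper: keep sets that contain any queried gene,
# or whose name contains any queried keyword (case-insensitive).
search_sets <- function(sets, item) {
  keep <- vapply(names(sets), function(set_name) {
    by_gene <- any(item %in% sets[[set_name]])
    by_name <- any(vapply(
      item,
      function(keyword) grepl(keyword, set_name, ignore.case = TRUE),
      logical(1)
    ))
    by_gene || by_name
  }, logical(1))
  sets[keep]
}

found <- search_sets(toy_sets, c("CD3D", "interferon"))
print(names(found))
```

Here "CD3D" matches one set by membership and "interferon" matches another by name, so two of the three toy sets are returned.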
During analysis, researchers often encounter pathway IDs from databases such as GO and Reactome. While these IDs are convenient for computational tasks, they are hard to interpret at a glance. The RenameGO and RenameReactome functions provide a convenient means to transform these IDs into their more descriptive pathway names.
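The primary parameter these functions require is "item". This can either be:
A vector of GO or Reactome IDs, such as the output of FilterGOTerms or FilterReactomeTerms.
A matrix with the IDs as row names, such as the output of GeneSetAnalysisGO or GeneSetAnalysisReactome.
Convert GO IDs to their respective pathway names for human: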
RenameGO(c("GO:0002376","GO:0050896"), spe = "human")
Similarly, for Reactome IDs:
RenameReactome(c("R-HSA-109582","R-HSA-112316"), spe = "human")
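Conceptually, renaming amounts to a lookup from ID to name, applied either to a vector or to the row names of a matrix. The base R sketch below illustrates this with a small hand-written lookup table. It does not reproduce the output format of RenameGO, and the toy matrix is hypothetical.

```r
# Hand-written lookup from GO ID to term name (two well-known GO terms).
id_to_name <- c(
  "GO:0002376" = "immune system process",
  "GO:0050896" = "response to stimulus"
)

# Toy score matrix with IDs as row names (hypothetical values).
m <- matrix(
  c(0.1, 0.2, 0.3, 0.4), nrow = 2,
  dimnames = list(c("GO:0002376", "GO:0050896"), c("cell_1", "cell_2"))
)

# Replace each known ID with its name; unknown IDs would be left untouched.
renamed <- m
known <- rownames(m) %in% names(id_to_name)
rownames(renamed)[known] <- unname(id_to_name[rownames(m)[known]])
print(renamed)
```

Leaving unknown IDs unchanged avoids introducing missing row names when a lookup table is incomplete.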
Both the GO and Reactome databases contain thousands of pathways, not all of which may be relevant to your study. To streamline the analysis, you can use the FilterGOTerms and FilterReactomeTerms functions to subset and refine the list of GO or Reactome pathways based on specific criteria.
Let's start by looking at how you can filter GO pathways:
To select pathways under a given category, use the parent parameter. For example, to get pathways related to the immune system process:
terms <- FilterGOTerms(parent = "GO:0002376")
RenameGO(terms)
To filter by gene count, use the n.min and n.max parameters. Building upon the pathways we selected under the "immune system process" (terms), to keep only those pathways that contain between 10 and 1000 genes:
terms2 <- FilterGOTerms(term = terms, n.min = 10, n.max = 1000)
RenameGO(terms2)
To keep only end-level pathways (those without child pathways), set the only.end.terms parameter to TRUE.
terms3 <- FilterGOTerms(term = terms, only.end.terms = TRUE)
RenameGO(terms3)
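The two filters described above, gene-count bounds and restriction to terms without children, can be illustrated on a toy hierarchy in base R. This is a sketch of the filtering logic under assumed data structures, not the internals of FilterGOTerms. The term names, gene lists, and parent_of mapping are hypothetical.

```r
# Toy terms with their genes (hypothetical).
term_genes <- list(
  term_A = paste0("gene", 1:12),
  term_B = paste0("gene", 1:5),
  term_C = paste0("gene", 3:14)
)

# Toy hierarchy: names are child terms, values are their parent term.
parent_of <- c(term_B = "term_A", term_C = "term_A")

# Filter by gene count, mirroring n.min = 10 and n.max = 1000.
sizes <- lengths(term_genes)
by_size <- names(sizes)[sizes >= 10 & sizes <= 1000]
print(by_size)

# End-level terms: terms that never appear as a parent of another term.
end_terms <- setdiff(names(term_genes), unique(parent_of))
print(end_terms)
```

Very small sets tend to give noisy scores and very large ones are often too generic, which is the usual motivation for bounding set size.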
The process for Reactome pathways is analogous. For instance, to select pathways related to the Immune System:
terms <- FilterReactomeTerms(parent = "R-HSA-168256")
RenameReactome(terms)
The GSEAplot function generates plots that emulate the Gene Set Enrichment Analysis (GSEA) plots developed by the Broad Institute. It provides a way to visualize the enrichment of specific gene sets within different biological states or conditions.
Here's how you can create a GSEA plot for the "INTERFERON_GAMMA_RESPONSE" gene set within the "CD4 T Naive" cell population of the pbmc dataset:
GSEAplot(
  pbmc,
  ident.1 = "CD4 T Naive",
  title = "INTERFERON_GAMMA_RESPONSE",
  geneset = hall50$human$HALLMARK_INTERFERON_GAMMA_RESPONSE
)
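The curve in a classic GSEA plot is a running sum over a ranked gene list: it steps up at each gene in the set and down at every other gene, and the enrichment score is the largest deviation from zero. The base R sketch below computes such a weighted running sum on toy data. It illustrates the general technique and is not the code behind GSEAplot; the ranking values and gene set are hypothetical.

```r
# Toy ranked gene list: values are a ranking metric (e.g. a log fold change).
ranked <- sort(
  c(g1 = 2.5, g2 = 1.8, g3 = 1.1, g4 = 0.4, g5 = -0.3, g6 = -0.9, g7 = -1.6, g8 = -2.2),
  decreasing = TRUE
)
gene_set <- c("g1", "g3", "g4")  # hypothetical gene set

in_set <- names(ranked) %in% gene_set

# Step up at gene-set members, weighted by the absolute ranking metric.
hit_weight <- abs(ranked) * in_set
step_up <- hit_weight / sum(hit_weight)

# Step down by a constant amount at every gene outside the set.
step_down <- (!in_set) / sum(!in_set)

# Running sum along the ranked list; it returns to zero at the end.
running <- cumsum(step_up - step_down)

# Enrichment score: the running-sum value with the largest absolute deviation.
es <- unname(running[which.max(abs(running))])
print(round(running, 3))
print(es)
```

A positive score means the set is concentrated near the top of the ranking, and a negative score means it is concentrated near the bottom.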
sessionInfo()