sig_gsea: sig_gsea - Perform Gene Set Enrichment Analysis

sig_gsea    R Documentation

Description

The sig_gsea function performs Gene Set Enrichment Analysis (GSEA) with the fgsea package to identify significantly enriched gene sets from differential gene expression data. Gene sets can be user-defined or taken from the Molecular Signatures Database (MSigDB). Parameters such as organism, gene symbol type, visualization color palette, and significance thresholds can be customized. Results are provided as tables and plots: tables can be saved in Excel format, and plots can be saved in various image formats.

Usage

sig_gsea(
  deg,
  genesets = NULL,
  path = NULL,
  gene_symbol = "symbol",
  logfc = "log2FoldChange",
  org = "hsa",
  msigdb = TRUE,
  category = "H",
  subcategory = NULL,
  palette_bar = "jama",
  palette_gsea = 2,
  show_bar = 10,
  show_col = FALSE,
  show_plot = FALSE,
  show_gsea = 8,
  show_path_n = 20,
  plot_single_sig = FALSE,
  project = "custom_sig",
  minGSSize = 10,
  maxGSSize = 500,
  verbose = TRUE,
  seed = FALSE,
  fig.type = "pdf",
  print_bar = TRUE
)

Arguments

deg

Differentially expressed genes object, typically a data frame that contains gene symbols, log fold changes, and other relevant information.

genesets

A custom collection of gene sets to be used in the enrichment analysis. If not provided, the function will use the gene sets available in the "msigdb" database for the selected organism.
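
As a rough illustration of what a custom collection could look like, the sketch below builds a named list of character vectors, one vector of gene symbols per gene set. The list format is an assumption (it mirrors how the signature_tme object is passed in the Examples section), and the set names and members are hypothetical.

```r
# Hypothetical gene sets: names and members are illustrative only.
my_genesets <- list(
  Toy_T_cell     = c("CD8A", "CD3E", "GZMB"),
  Toy_Fibroblast = c("COL1A1", "ACTA2", "FAP")
)

# Basic sanity checks: a named list whose elements are character vectors.
is_valid <- is.list(my_genesets) &&
  !is.null(names(my_genesets)) &&
  all(vapply(my_genesets, is.character, logical(1)))

# Assumed call, not run here:
# res <- sig_gsea(deg = deg, genesets = my_genesets)
```

Note that sets this small would fall below the default minGSSize of 10, so real gene sets need more members or a lower minGSSize.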

path

The path parameter represents the location where the enrichment analysis results will be stored. If not specified, a default path named "1-GSEA-result" will be created in the current working directory.

gene_symbol

This parameter specifies the column name in the deg data frame that contains the gene symbols. The default value is "symbol".

logfc

Specifies the column name in the deg data frame that contains the log fold change values. The default value is "log2FoldChange".
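
The column requirements described for deg, gene_symbol, and logfc can be sketched with a toy table. Everything here is hypothetical (the fold changes and padj values are made up), and the ranked vector at the end only shows the kind of input GSEA methods generally consume, not necessarily how sig_gsea builds it internally.

```r
# Hypothetical toy DEG table; values are made up for illustration.
deg_toy <- data.frame(
  symbol         = c("CD8A", "GZMB", "COL1A1", "ACTA2"),
  log2FoldChange = c(2.1, 1.4, -0.8, -1.9),
  padj           = c(0.001, 0.01, 0.20, 0.003),
  stringsAsFactors = FALSE
)

# Column names as they would be passed to gene_symbol and logfc.
gene_symbol <- "symbol"
logfc       <- "log2FoldChange"

# Both columns must exist in the table.
has_cols <- all(c(gene_symbol, logfc) %in% colnames(deg_toy))

# Named numeric vector of fold changes, sorted from most up- to most down-regulated.
ranks <- setNames(deg_toy[[logfc]], deg_toy[[gene_symbol]])
ranks <- sort(ranks, decreasing = TRUE)
```

If the table uses different column names, passing them through gene_symbol and logfc avoids having to rename the columns.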

org

This parameter is used to select the organism for which the enrichment analysis will be performed. The options are "hsa" for Homo sapiens and "mus" for Mus musculus.

msigdb

A logical parameter indicating whether to use the gene sets from the "msigdb" database. If set to TRUE, the function will retrieve gene sets from "msigdb" based on the selected organism and category.

category

Specifies the category of gene sets to be used from the "msigdb" database. The default category is "H", representing Hallmark gene sets.

subcategory

Allows you to specify a subcategory of gene sets from the "msigdb" database. If not provided, all gene sets within the selected category will be used.

palette_bar

Specifies the color palette for the barplot used to visualize the enriched gene sets. The default value is "jama".

palette_gsea

Specifies the color palette for the GSEA plots. The default value is 2.

show_bar

Specifies the number of enriched gene sets to show in the barplot. The default value is 10.

show_col

A logical parameter indicating whether to show the color names in the barplot. The default value is FALSE.

show_plot

A logical parameter indicating whether to display the GSEA plots. The default value is FALSE.

show_gsea

Specifies the number of most significant gene sets to show in the GSEA plots. The default value is 8.

show_path_n

Specifies the number of paths to show in the GSEA plots. The default value is 20.

plot_single_sig

A logical parameter indicating whether to plot each significant gene set separately. The default value is FALSE.

project

Specifies the name of the project or category for the analysis. If not provided, it will be set as "custom_sig".

minGSSize

Specifies the minimum gene set size to consider for enrichment analysis. Gene sets below this size will be excluded. The default value is 10.

maxGSSize

Specifies the maximum gene set size to consider for enrichment analysis. Gene sets above this size will be excluded. The default value is 500.
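
The effect of minGSSize and maxGSSize can be sketched in base R as a simple size filter. This is a simplified illustration with hypothetical gene sets: enrichment tools commonly count only the genes that also appear in the ranked input, which this sketch does not do.

```r
# Hypothetical gene sets of different sizes (gene IDs are placeholders).
genesets <- list(
  small  = paste0("G", 1:5),
  medium = paste0("G", 1:50),
  large  = paste0("G", 1:600)
)

minGSSize <- 10
maxGSSize <- 500

# Keep only sets whose size lies within [minGSSize, maxGSSize].
sizes <- lengths(genesets)
kept  <- genesets[sizes >= minGSSize & sizes <= maxGSSize]

names(kept)
```

Here only the set named "medium" survives: "small" is below the minimum and "large" is above the maximum.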

verbose

A logical parameter indicating whether to display additional information and messages during the analysis. The default value is TRUE.

seed

A logical parameter indicating whether to set a fixed random seed so that the analysis is reproducible. The default value is FALSE.

fig.type

Specifies the file type for saving the GSEA plots. The default value is "pdf".

print_bar

A logical parameter, presumably indicating whether the barplot is printed. The default value is TRUE.

Author(s)

Dongqiang Zeng

Examples

library(IOBR)
library(DESeq2)
data("eset_stad", package = "IOBR")
data("stad_group", package = "IOBR")
# Differential expression between the EBV and GS subtypes
deg <- iobr_deg(eset = eset_stad, pdata = stad_group, group_id = "subtype",
                pdata_id = "ID", array = FALSE, method = "DESeq2",
                contrast = c("EBV", "GS"), path = "STAD")
# GSEA using the signature_tme gene sets
res <- sig_gsea(deg = deg, genesets = signature_tme)

IOBR/IOBR documentation built on May 5, 2024, 2:34 p.m.