RunEnrichment

R Documentation

Perform the enrichment analysis (over-representation) on the genes

Description

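Perform over-representation enrichment analysis on a set of genes, taken either from the differential expression results stored in a Seurat object or from a user-supplied vector of gene IDs.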
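For readers unfamiliar with the term, over-representation analysis is commonly framed as a one-sided hypergeometric test on the overlap between the input genes and a gene set. The following is a minimal base R sketch of that idea, not the package's internal code; the gene universe, the term, and all sizes are invented for illustration.

```r
# Toy universe of 1000 genes; every size here is made up for illustration.
universe   <- paste0("gene", 1:1000)
term_genes <- universe[1:50]                        # genes annotated to one term
query      <- c(universe[1:10], universe[501:540])  # 50 input genes, 10 in the term

k <- length(intersect(query, term_genes))  # overlap between query and term
M <- length(term_genes)                    # term size within the universe
N <- length(universe)                      # universe size
n <- length(query)                         # query size

# P(X >= k) under the hypergeometric null (sampling without replacement)
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# The same one-sided test written as Fisher's exact test on a 2x2 table
tab <- matrix(c(k, M - k, n - k, N - M - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```

In a real analysis this test is repeated for every gene set in the chosen database, and the resulting p-values are adjusted for multiple testing.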

Usage

RunEnrichment(
  srt = NULL,
  group_by = NULL,
  test.use = "wilcox",
  DE_threshold = "avg_log2FC > 0 & p_val_adj < 0.05",
  geneID = NULL,
  geneID_groups = NULL,
  geneID_exclude = NULL,
  IDtype = "symbol",
  result_IDtype = "symbol",
  species = "Homo_sapiens",
  db = "GO_BP",
  db_update = FALSE,
  db_version = "latest",
  db_combine = FALSE,
  convert_species = TRUE,
  Ensembl_version = 103,
  mirror = NULL,
  TERM2GENE = NULL,
  TERM2NAME = NULL,
  minGSSize = 10,
  maxGSSize = 500,
  unlimited_db = c("Chromosome", "GeneType", "TF", "Enzyme", "CSPA"),
  GO_simplify = FALSE,
  GO_simplify_cutoff = "p.adjust < 0.05",
  simplify_method = "Wang",
  simplify_similarityCutoff = 0.7,
  BPPARAM = BiocParallel::bpparam(),
  seed = 11
)

Arguments

`srt`	A Seurat object containing the results of differential expression analysis (RunDEtest). If specified, the genes and groups will be extracted from the Seurat object automatically. If not specified, the `geneID` argument must be provided, optionally together with `geneID_groups`.
`group_by`	A character vector specifying the grouping variable in the Seurat object. This argument is only used if `srt` is specified.
`test.use`	A character vector specifying the test to be used in differential expression analysis. This argument is only used if `srt` is specified.
`DE_threshold`	A character vector specifying the filter condition for differential expression analysis. This argument is only used if `srt` is specified.
`geneID`	A character vector specifying the gene IDs.
`geneID_groups`	A factor vector specifying the group labels for each gene.
`geneID_exclude`	A character vector specifying the gene IDs to be excluded from the analysis.
`IDtype`	A character vector specifying the type of gene IDs in the `srt` object or `geneID` argument. Gene IDs are converted only if `IDtype` differs from `result_IDtype`.
`result_IDtype`	A character vector specifying the desired type of gene ID to be used in the output. This argument is used to convert the gene IDs from `IDtype` to `result_IDtype`.
`species`	A character vector specifying the species for which the analysis is performed.
`db`	A character vector specifying the name of the database to be used for enrichment analysis.
`db_update`	A logical value indicating whether the gene annotation databases should be forcefully updated. If set to FALSE, the function will attempt to load the cached databases instead. Default is FALSE.
`db_version`	A character vector specifying the version of the database to be used. This argument is ignored if `db_update` is `TRUE`. Default is "latest".
`db_combine`	A logical value indicating whether to combine multiple databases into one. If TRUE, all databases specified by `db` will be combined into a single database named "Combined".
`convert_species`	A logical value indicating whether to use a species-converted database when the annotation is missing for the specified species. The default value is TRUE.
`Ensembl_version`	Ensembl database version. If NULL, use the current release version.
`mirror`	A character string specifying the Ensembl mirror to connect to. Valid options are 'www', 'uswest', 'useast', 'asia'.
`TERM2GENE`	A data frame specifying the gene-term mapping for a custom database. The first column should contain the term IDs, and the second column should contain the gene IDs.
`TERM2NAME`	A data frame specifying the term-name mapping for a custom database. The first column should contain the term IDs, and the second column should contain the corresponding term names.
`minGSSize`	A numeric value specifying the minimum size of a gene set to be considered in the enrichment analysis.
`maxGSSize`	A numeric value specifying the maximum size of a gene set to be considered in the enrichment analysis.
`unlimited_db`	A character vector specifying the names of databases that do not have size restrictions.
`GO_simplify`	A logical value indicating whether to simplify the GO terms. If `TRUE`, additional results with simplified GO terms will be returned.
`GO_simplify_cutoff`	A character vector specifying the filter condition for simplification of GO terms. This argument is only used if `GO_simplify` is `TRUE`.
`simplify_method`	A character vector specifying the method to be used for simplification of GO terms. This argument is only used if `GO_simplify` is `TRUE`.
`simplify_similarityCutoff`	A numeric value specifying the similarity cutoff for simplification of GO terms. This argument is only used if `GO_simplify` is `TRUE`.
`BPPARAM`	A BiocParallelParam object specifying the parallel back-end to be used for parallel computation. Defaults to BiocParallel::bpparam().
`seed`	The random seed for reproducibility. Defaults to 11.
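The two-column layout described for `TERM2GENE` and `TERM2NAME` can be sketched in base R as below. The term IDs, gene symbols, term names, and column names are placeholders chosen for illustration; the argument descriptions only fix the column order, so the column names here are an assumption.

```r
# Hypothetical custom gene sets; term IDs and gene symbols are placeholders.
gene_sets <- list(
  SET_A = c("Ins1", "Ins2", "Gcg"),
  SET_B = c("Sox9", "Neurog3")
)

# TERM2GENE: first column term ID, second column gene ID (one row per pair)
TERM2GENE <- data.frame(
  Term = rep(names(gene_sets), lengths(gene_sets)),
  Gene = unlist(gene_sets, use.names = FALSE),
  stringsAsFactors = FALSE
)

# TERM2NAME: first column term ID, second column readable term name
TERM2NAME <- data.frame(
  Term = names(gene_sets),
  Name = c("Example set A", "Example set B"),
  stringsAsFactors = FALSE
)

# Gene-set sizes, the quantity that minGSSize and maxGSSize bound
set_sizes <- table(TERM2GENE$Term)
```

Sets this small would presumably fall below the default `minGSSize = 10`, so a real custom database would need larger sets or a lower `minGSSize`.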

Value

If input is a Seurat object, returns the modified Seurat object with the enrichment result stored in the tools slot.

If input is a geneID vector, with or without geneID_groups, returns the enrichment result directly.

The enrichment result is a list with the following components:

enrichment: A data.frame containing all enrichment results.
results: A list of enrichResult objects from the DOSE package.
geneMap: A data.frame containing the ID mapping table for input gene IDs.
input: A data.frame containing the input gene IDs and gene ID groups.
DE_threshold: The threshold used to filter the differential expression results (only returned if input is a Seurat object).
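As a rough illustration of working with a list of this shape, the sketch below builds a mock object using the component names listed above and filters its `enrichment` table. The contents are invented, and the column names `ID`, `Groups`, and `p.adjust` are assumptions about the table layout rather than something this page specifies.

```r
# Mock object with the documented component names; the contents and the
# column names of `enrichment` are invented for illustration only.
enrich_out <- list(
  enrichment = data.frame(
    ID = c("TERM_1", "TERM_2", "TERM_3"),
    Groups = c("Alpha", "Alpha", "Beta"),
    p.adjust = c(0.001, 0.20, 0.03),
    stringsAsFactors = FALSE
  ),
  results = list(),       # would hold the enrichResult objects
  geneMap = data.frame(), # would hold the ID mapping table
  input = data.frame(
    geneID = c("Gcg", "Ins1"),
    geneID_groups = c("Alpha", "Beta"),
    stringsAsFactors = FALSE
  )
)

# Keep significant terms and count them per group
sig <- enrich_out$enrichment[enrich_out$enrichment$p.adjust < 0.05, ]
n_sig_by_group <- table(sig$Groups)
```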

Examples

data("pancreas_sub")
pancreas_sub <- RunDEtest(pancreas_sub, group_by = "CellType")
pancreas_sub <- RunEnrichment(
  srt = pancreas_sub, group_by = "CellType", DE_threshold = "p_val_adj < 0.05",
  db = "GO_BP", species = "Mus_musculus"
)
EnrichmentPlot(pancreas_sub, db = "GO_BP", group_by = "CellType", plot_type = "comparison")

pancreas_sub <- RunEnrichment(
  srt = pancreas_sub, group_by = "CellType", DE_threshold = "p_val_adj < 0.05",
  db = c("MSigDB", "MSigDB_MH"), species = "Mus_musculus"
)
EnrichmentPlot(pancreas_sub, db = "MSigDB", group_by = "CellType", plot_type = "comparison")
EnrichmentPlot(pancreas_sub, db = "MSigDB_MH", group_by = "CellType", plot_type = "comparison")

# Remove redundant GO terms
pancreas_sub <- RunEnrichment(
  srt = pancreas_sub, group_by = "CellType",
  db = "GO_BP", GO_simplify = TRUE,
  species = "Mus_musculus"
)
EnrichmentPlot(pancreas_sub, db = "GO_BP_sim", group_by = "CellType", plot_type = "comparison")

# Use a combined database
pancreas_sub <- RunEnrichment(
  srt = pancreas_sub, group_by = "CellType",
  db = c("KEGG", "WikiPathway", "Reactome", "PFAM", "MP"),
  db_combine = TRUE,
  species = "Mus_musculus"
)
EnrichmentPlot(pancreas_sub, db = "Combined", group_by = "CellType", plot_type = "comparison")

# Or use "geneID" and "geneID_groups" as input to run enrichment
de_df <- dplyr::filter(pancreas_sub@tools$DEtest_CellType$AllMarkers_wilcox, p_val_adj < 0.05)
enrich_out <- RunEnrichment(geneID = de_df[["gene"]], geneID_groups = de_df[["group1"]], db = "GO_BP", species = "Mus_musculus")
EnrichmentPlot(res = enrich_out, db = "GO_BP", plot_type = "comparison")
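The `DE_threshold` argument is a filter condition written as a string. One way such a string can be applied to a differential expression table is sketched below in base R; this is an assumption about the general mechanism, not a copy of the package's implementation, and the toy table is invented (its column names follow those used in the examples above and in the default `DE_threshold`).

```r
# Toy DE table; values are invented for illustration.
de_df <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  group1 = c("A", "A", "B", "B"),
  avg_log2FC = c(1.2, -0.8, 0.5, 2.0),
  p_val_adj = c(0.01, 0.001, 0.20, 0.04),
  stringsAsFactors = FALSE
)

DE_threshold <- "avg_log2FC > 0 & p_val_adj < 0.05"

# Parse the string into an expression and evaluate it with the data frame's
# columns in scope, giving one logical value per row.
keep <- eval(parse(text = DE_threshold), envir = de_df)
de_filtered <- de_df[keep, ]
```

The filtered `gene` and `group1` columns are then in the shape expected by the `geneID` and `geneID_groups` arguments.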

zh542370159/SCP documentation built on Nov. 22, 2023, 2:34 a.m.