getTERM2GENE: Obtains TERM2GENE object for corGSEA


getTERM2GENE R Documentation

Obtains TERM2GENE object for corGSEA

Description

Wrapper for the msigdbr::msigdbr() function.

Usage

getTERM2GENE(
  GSEA_Type = c("simple"),
  Species = c("hsapiens", "mmusculus"),
  sampler = FALSE,
  listReturn = FALSE
)

Arguments

GSEA_Type

Which pathway annotations should be considered? Options listed in correlationAnalyzeR::pathwayCategories – See details below for more info.

Species

Species to obtain gene names for. Either 'hsapiens' or 'mmusculus'

sampler

If TRUE, returns only 100,000 random gene sets from either the simple or the complex TERM2GENE. Useful for reducing the computational burden of GSEA.

listReturn

If TRUE, will return annotations as a list object.
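
The effect of sampler and listReturn can be pictured on a toy table. The following is a minimal base R sketch, not the package's implementation: the gene sets are made up, the sample size is shrunk from 100,000 to 2, and sampling whole gene sets (rather than rows) is an assumed reading of the sampler description.

```r
# Toy stand-in for a TERM2GENE table (hypothetical gene sets, not MSigDB content)
term2gene <- data.frame(
  gs_name = c("SET_A", "SET_A", "SET_B", "SET_C", "SET_C", "SET_C"),
  gene_symbol = c("TP53", "BRCA1", "MYC", "EGFR", "KRAS", "TP53"),
  stringsAsFactors = FALSE
)

# sampler-style reduction: keep a random subset of gene sets (assumed interpretation)
set.seed(1)
n_keep <- 2
kept_sets <- sample(unique(term2gene$gs_name), n_keep)
sampled <- term2gene[term2gene$gs_name %in% kept_sets, ]

# listReturn-style conversion: one character vector of gene symbols per gene set
term_list <- split(term2gene$gene_symbol, term2gene$gs_name)
```

Here term_list is a named list keyed by gene set name, which is one plausible shape for the "list object" mentioned above.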

Details

GSEA_Type category names and their MSIGDB description:

Hallmark (a.k.a. "H" in MSIGDB): "Hallmark gene sets summarize and represent specific well-defined biological states or processes and display coherent expression. These gene sets were generated by a computational methodology based on identifying overlaps between gene sets in other MSigDB collections and retaining genes that display coordinate expression."

Cytogenic bands (a.k.a. "C1" in MSIGDB): "Gene sets corresponding to each human chromosome and each cytogenetic band that has at least one gene."

Perturbations (a.k.a. "C2:CGP" in MSIGDB): "Gene sets represent expression signatures of genetic and chemical perturbations. A number of these gene sets come in pairs: xxx_UP (and xxx_DN) gene set representing genes induced (and repressed) by the perturbation."

Canonical pathways (a.k.a. "C2:CP" in MSIGDB): "Gene sets from pathway databases. Usually, these gene sets are canonical representations of a biological process compiled by domain experts."

BioCarta (a.k.a. "C2:CP:BIOCARTA" in MSIGDB): "Gene sets derived from the BioCarta pathway database."

KEGG (a.k.a. "C2:CP:KEGG" in MSIGDB): "Gene sets derived from the KEGG pathway database."

PID (a.k.a. "C2:CP:PID" in MSIGDB): "Gene sets derived from the PID pathway database."

Reactome (a.k.a. "C2:CP:REACTOME" in MSIGDB): "Gene sets derived from the Reactome pathway database."

miRNA targets (a.k.a. "C3:MIR" in MSIGDB): "Gene sets that contain genes sharing putative target sites (seed matches) of human mature miRNA in their 3'-UTRs."

TF targets (a.k.a. "C3:TFT" in MSIGDB): "Gene sets that share upstream cis-regulatory motifs which can function as potential transcription factor binding sites. Based on work by Xie et al. 2005"

Cancer gene neighborhoods (a.k.a. "C4:CGN" in MSIGDB): "Gene sets defined by expression neighborhoods centered on 380 cancer-associated genes. This collection is described in Subramanian, Tamayo et al. 2005"

Cancer modules (a.k.a. "C4:CM" in MSIGDB): "Gene sets defined by Segal et al. 2004. Briefly, the authors compiled gene sets ('modules') from a variety of resources such as KEGG, GO, and others. By mining a large compendium of cancer-related microarray data, they identified 456 such modules as significantly changed in a variety of cancer conditions."

GO:BP (a.k.a. "C5:BP" in MSIGDB): "Gene sets derived from the GO Biological Process Ontology."

GO:CC (a.k.a. "C5:CC" in MSIGDB): "Gene sets derived from the GO Cellular Component Ontology."

GO:MF (a.k.a. "C5:MF" in MSIGDB): "Gene sets derived from the GO Molecular Function Ontology."

Oncogenic signatures (a.k.a. "C6" in MSIGDB): "Gene sets that represent signatures of cellular pathways which are often dis-regulated in cancer. The majority of signatures were generated directly from microarray data from NCBI GEO or from internal unpublished profiling experiments involving perturbation of known cancer genes."

Immunological signatures (a.k.a. "C7" in MSIGDB): "Gene sets that represent cell states and perturbations within the immune system. The signatures were generated by manual curation of published studies in human and mouse immunology."

Cell Type signatures (a.k.a. "C8" in MSIGDB): "Gene sets that contain curated cluster markers for cell types identified in single-cell sequencing studies of human tissue."

simple: This is the combination of "Hallmark", "Perturbations", "BioCarta", "GO:BP", "GO:CC", "GO:MF", "KEGG", "Canonical pathways", "PID", and "Reactome"

complex: This includes all possible gene sets.
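
The relationship between category names, MSIGDB codes, and the "simple" shortcut can be sketched as a lookup. This is an illustrative base R example only: msigdb_codes and expand_gsea_type() are hypothetical names, not part of correlationAnalyzeR, and the lookup covers only the ten categories that make up "simple".

```r
# Hypothetical lookup: category name -> MSIGDB code, for the "simple" members only
msigdb_codes <- c(
  "Hallmark" = "H",
  "Perturbations" = "C2:CGP",
  "BioCarta" = "C2:CP:BIOCARTA",
  "GO:BP" = "C5:BP",
  "GO:CC" = "C5:CC",
  "GO:MF" = "C5:MF",
  "KEGG" = "C2:CP:KEGG",
  "Canonical pathways" = "C2:CP",
  "PID" = "C2:CP:PID",
  "Reactome" = "C2:CP:REACTOME"
)

# Hypothetical helper: expand "simple" into its members, then map names to codes
expand_gsea_type <- function(gsea_type) {
  if ("simple" %in% gsea_type) {
    gsea_type <- union(setdiff(gsea_type, "simple"), names(msigdb_codes))
  }
  unknown <- setdiff(gsea_type, names(msigdb_codes))
  if (length(unknown) > 0) {
    stop("Category not in this sketch's lookup: ", paste(unknown, collapse = ", "))
  }
  unname(msigdb_codes[gsea_type])
}

expand_gsea_type(c("Hallmark", "KEGG"))  # "H" "C2:CP:KEGG"
```

Categories outside "simple" (for example "C6" or "C7" collections) are deliberately left out of this lookup, so the helper stops with an error for them.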

Value

A tbl object with columns "gs_name" and "gene_symbol"
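
As a rough illustration of that shape, the sketch below builds a toy two-column table and checks it. The data and the is_term2gene() helper are hypothetical and only assume the column names stated above; a real tbl is also a data frame, so the same check would apply to it.

```r
# Toy table with the documented columns (made-up gene sets)
toy <- data.frame(
  gs_name = c("SET_A", "SET_A", "SET_B"),
  gene_symbol = c("TP53", "BRCA1", "MYC"),
  stringsAsFactors = FALSE
)

# Hypothetical check: a data frame carrying both expected columns
is_term2gene <- function(x) {
  is.data.frame(x) && all(c("gs_name", "gene_symbol") %in% colnames(x))
}

# Number of genes annotated to each gene set
set_sizes <- table(toy$gs_name)
```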

Examples

TERM2GENE <- correlationAnalyzeR::getTERM2GENE(GSEA_Type = "simple")
TERM2GENE <- correlationAnalyzeR::getTERM2GENE(GSEA_Type = c("Hallmark", "KEGG"))


Bishop-Laboratory/correlationAnalyzeR documentation built on June 28, 2022, 8:31 p.m.