gost: Gene list functional enrichment.

View source: R/gprofiler2.R

gost    R Documentation



Interface to the g:Profiler tool g:GOSt (https://biit.cs.ut.ee/gprofiler/gost) for functional enrichment analysis of gene lists. If the input 'query' is a list of gene vectors, results for multiple queries are returned in the same data frame, with the column 'query' indicating the corresponding query name. If 'multi_query' is selected, the result is a data frame for comparing multiple input lists, just as in the web tool.


gost(
  query,
  organism = "hsapiens",
  ordered_query = FALSE,
  multi_query = FALSE,
  significant = TRUE,
  exclude_iea = FALSE,
  measure_underrepresentation = FALSE,
  evcodes = FALSE,
  user_threshold = 0.05,
  correction_method = c("g_SCS", "bonferroni", "fdr", "false_discovery_rate", "gSCS",
    "analytical"),
  domain_scope = c("annotated", "known", "custom", "custom_annotated"),
  custom_bg = NULL,
  numeric_ns = "",
  sources = NULL,
  as_short_link = FALSE,
  highlight = FALSE
)



query: a character vector, or a (named) list of character vectors for multiple queries, that can consist of mixed types of gene IDs (proteins, transcripts, microarray IDs, etc.), SNP IDs, chromosomal intervals or term IDs.


organism: organism name. Organism names are constructed by concatenating the first letter of the genus name and the species name. Example: human (Homo sapiens) - 'hsapiens', mouse (Mus musculus) - 'mmusculus'.
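
The naming rule can be sketched in base R. The helper organism_id() below is hypothetical and not part of gprofiler2; it assumes a two-word binomial name and does not check whether g:Profiler actually supports the organism.

```r
# Hypothetical helper (not part of gprofiler2): build a g:Profiler-style
# organism ID from a binomial name such as "Homo sapiens".
organism_id <- function(binomial) {
  parts <- strsplit(trimws(binomial), "\\s+")[[1]]
  stopifnot(length(parts) >= 2)
  # First letter of the genus followed by the species name, in lower case
  tolower(paste0(substr(parts[1], 1, 1), parts[2]))
}

organism_id("Homo sapiens")  # "hsapiens"
organism_id("Mus musculus")  # "mmusculus"
```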


ordered_query: if the input gene lists are ranked, this option may be used to get GSEA-style p-values.


multi_query: in case of multiple gene lists, returns a comparison table of these lists. If enabled, the result data frame has columns named 'p_values', 'query_sizes', 'intersection_sizes' with vectors showing values in the order of the input queries. Set 'multi_query' to FALSE and simply input the query as a list of multiple gene vectors to get the results in a long format.
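
As a rough sketch of the two output formats, the same named list can be passed with 'multi_query' switched off and on. The list names list_a and list_b are arbitrary, the IDs are borrowed from the example on this page, and significant = FALSE is used only because such tiny lists may yield no significant terms.

```r
# Two named gene lists; the list names identify the queries in the output.
queries <- list(
  list_a = c("NLRP1", "ENSG00000156103"),
  list_b = c("GO:0005005", "rs17396340")
)

# gost() contacts the g:Profiler server, so it is only run interactively here.
if (interactive()) {
  library(gprofiler2)

  # Long format: results of both lists stacked, labelled by the 'query' column
  long_res <- gost(query = queries, multi_query = FALSE, significant = FALSE)

  # Comparison format: per-query values stored as vectors within each row
  multi_res <- gost(query = queries, multi_query = TRUE, significant = FALSE)
}
```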


significant: if TRUE (the default), only statistically significant results are returned; if FALSE, all results are returned.


exclude_iea: exclude GO electronic annotations (IEA).


measure_underrepresentation: measure underrepresentation of terms rather than overrepresentation.


evcodes: include evidence codes in the results. Note that this can decrease performance and make the query slower. In addition, a column 'intersection' is created that contains the gene IDs that intersect between the query and the term. This parameter does not work if 'multi_query' is set to TRUE.
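
A minimal sketch of requesting evidence codes for a single query follows. The column name 'term_id' is assumed from the usual gost() result layout, and the NULL check is a defensive guard in case the call returns nothing.

```r
# IDs taken from the example on this page
genes <- c("NLRP1", "ENSG00000156103")

# The call needs internet access, so it is only run interactively here.
if (interactive()) {
  library(gprofiler2)

  # evcodes is combined with multi_query = FALSE, as the two do not work together
  res <- gost(query = genes, evcodes = TRUE, multi_query = FALSE,
              significant = FALSE)

  if (!is.null(res)) {
    # 'evidence_codes' and 'intersection' are the columns added by evcodes = TRUE
    print(head(res$result[, c("term_id", "evidence_codes", "intersection")]))
  }
}
```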


user_threshold: custom p-value threshold for significance; results with a smaller p-value are tagged as significant. Setting it higher than 0.05 is not recommended.


correction_method: the algorithm used for multiple testing correction, one of "gSCS" (synonyms: "analytical", "g_SCS"), "fdr" (synonym: "false_discovery_rate"), "bonferroni".
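
To give a feel for how the choice of correction interacts with 'user_threshold', the sketch below applies base R's p.adjust() to made-up p-values. It is an illustration only: g:Profiler performs its corrections on the server, "gSCS" is g:Profiler's own method with no base R counterpart, and treating "fdr" as Benjamini-Hochberg is an assumption here.

```r
# Made-up raw p-values, for illustration only
p_raw <- c(0.001, 0.008, 0.02, 0.03, 0.2)
user_threshold <- 0.05

p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_fdr  <- p.adjust(p_raw, method = "fdr")  # alias of "BH" in p.adjust()

# Compare which tests stay below the threshold under each correction
data.frame(
  p_raw,
  p_bonf,
  p_fdr,
  sig_bonf = p_bonf < user_threshold,
  sig_fdr  = p_fdr < user_threshold
)
```

With these values, Bonferroni keeps two tests below the threshold and the FDR correction keeps four, which shows why the same list can give different numbers of significant terms under different methods.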


domain_scope: how to define the statistical domain, one of "annotated", "known", "custom" or "custom_annotated".


custom_bg: vector of gene names to use as a statistical background. If given, domain_scope is set to "custom" by default; if domain_scope is set to "custom_annotated", that setting is used instead.
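
A sketch of supplying a custom background is given below. The background vector is an arbitrary illustration (the query IDs plus a few well-known gene symbols); in practice it would typically hold all genes measured in the experiment.

```r
# Query IDs taken from the example on this page
query_genes <- c("NLRP1", "ENSG00000156103")

# Arbitrary illustrative background that includes the query genes
background <- c(query_genes, "TP53", "BRCA1", "EGFR", "MYC")

# The call needs internet access, so it is only run interactively here.
if (interactive()) {
  res <- gprofiler2::gost(
    query = query_genes,
    custom_bg = background,
    domain_scope = "custom",
    significant = FALSE
  )
}
```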


numeric_ns: namespace to use for fully numeric IDs (list of available namespaces).


sources: a vector of data sources to use. Currently, these include GO (GO:BP, GO:MF, GO:CC to select a particular GO branch), KEGG, REAC, TF, MIRNA, CORUM, HP, HPA, WP. Please see the g:GOSt web tool for the comprehensive list and details on incorporated data sources.
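
Restricting the analysis to a few data sources might look like the sketch below. The particular selection is arbitrary, and the variable name wanted_sources is only for illustration.

```r
# Arbitrary selection: one GO branch plus two pathway databases
wanted_sources <- c("GO:BP", "KEGG", "REAC")

# The call needs internet access, so it is only run interactively here.
if (interactive()) {
  res <- gprofiler2::gost(
    query = c("NLRP1", "ENSG00000156103"),  # IDs from this page's example
    sources = wanted_sources,
    significant = FALSE
  )
}
```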


as_short_link: indicator to return the results as a short link to the g:Profiler web tool. If set to TRUE, the function returns the results URL as a character string instead of the data.frame.


highlight: indicator to return a TRUE/FALSE column called 'highlighted' that marks driver terms in GO.


The input gene lists are not stored in g:Profiler unless the option 'as_short_link' is set to TRUE.


A named list where 'result' contains a data.frame with the enrichment analysis results and 'meta' contains the metadata needed for the Manhattan plot. If the input consisted of several lists, the corresponding list is indicated by the variable 'query'. The 'result' data.frame is ordered first by the query name, then by data source (such as GO:BP, GO:CC, GO:MF, REAC, etc.), and then by the adjusted p-value. The columns of the resulting data frame differ depending on whether 'multi_query' is TRUE or FALSE. If 'evcodes' is set, the return value includes the columns 'evidence_codes' and 'intersection'. The latter lists the genes shared by the corresponding query and term.
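
One way to inspect the returned list is sketched below. The helper top_terms() is hypothetical, and the column name 'p_value' is assumed from the single-query result layout.

```r
# Hypothetical helper: the n rows with the smallest adjusted p-values
top_terms <- function(result_df, n = 5) {
  ordered <- result_df[order(result_df$p_value), , drop = FALSE]
  head(ordered, n)
}

# The call needs internet access, so it is only run interactively here.
if (interactive()) {
  res <- gprofiler2::gost(query = c("NLRP1", "ENSG00000156103"),
                          significant = FALSE)

  if (!is.null(res)) {       # defensive guard in case nothing is returned
    print(names(res))        # the list elements, 'result' and 'meta'
    print(top_terms(res$result))
  }
}
```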

The result fields are further described in the vignette.

If 'as_short_link' is set to TRUE, then the result is a character short-link to see and share corresponding results via the g:Profiler web tool. In this case, the input gene lists will be stored in a database.


Liis Kolberg <liis.kolberg@ut.ee>, Uku Raudvere <uku.raudvere@ut.ee>


gostres <- gost(c("X:1000:1000000", "rs17396340", "GO:0005005", "ENSG00000156103", "NLRP1"))

gprofiler2 documentation built on July 9, 2023, 6:06 p.m.