get_topgo: Convenience wrapper for the topGO package (Rahnenführer et al.)
In m-jahn/R-tools: Utility and wrapper functions for bioinformatics work

Source: R/get_topgo.R


Description

This function carries out a topGO Gene Ontology (GO) enrichment analysis on a data set with custom protein/gene IDs and GO terms. The main input is a data frame with three specific columns: cluster numbers, gene IDs, and GO terms. Alternatively, these can be supplied as three individual vectors of the same length and order.
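topGO accepts custom annotations through its `annFUN.gene2GO` function, which takes a named list mapping each gene ID to a character vector of GO IDs (passed as `gene2GO`). The sketch below shows, in base R, how the three inputs described above could be turned into such a list. It assumes GO terms are stored as one semicolon-separated string per gene, as in the Examples; `make_gene2go()` is a hypothetical helper for illustration, not necessarily what `get_topgo` does internally.

```r
# Hypothetical helper (not part of R-tools): build the gene-to-GO list that
# topGO's annFUN.gene2GO accepts, from a vector of gene IDs and a parallel
# vector of GO term strings (assumed to be separated by ';').
make_gene2go <- function(GeneID, Gene.ontology.IDs, sep = ";") {
  stopifnot(length(GeneID) == length(Gene.ontology.IDs))
  # split each string into single GO IDs and strip surrounding whitespace
  terms <- lapply(strsplit(Gene.ontology.IDs, sep, fixed = TRUE), trimws)
  # drop empty entries, e.g. caused by a trailing separator
  terms <- lapply(terms, function(x) x[nzchar(x)])
  names(terms) <- GeneID
  terms
}

# toy input: the same three columns that get_topgo expects in 'df'
toy_df <- data.frame(
  GeneID = c("A", "B", "C"),
  cluster = c(1, 1, 2),
  Gene.ontology.IDs = c("GO:0006412;GO:0015979", "GO:0046148", "GO:0006412; GO:0006614"),
  stringsAsFactors = FALSE
)
gene2go <- make_gene2go(toy_df$GeneID, toy_df$Gene.ontology.IDs)
str(gene2go)
```

Passing the columns of a data frame or three standalone vectors gives the same list, which is why the two input modes are interchangeable.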

Usage

get_topgo(
  df = NULL,
  GeneID = NULL,
  Gene.ontology.IDs = NULL,
  cluster = NULL,
  selected.cluster,
  topNodes = 50
)

Arguments

`df`	an (optional) data.frame with the three columns named as specified below ('GeneID', 'Gene.ontology.IDs', 'cluster')
`GeneID`	(character) the column containing gene IDs, alternatively a vector
`Gene.ontology.IDs`	(character) the column containing the GO terms of each gene (one string per gene, see Examples for the format), alternatively a vector with the same order and length as 'GeneID'
`cluster`	(numeric, factor, character) the column containing a grouping variable, alternatively a vector with the same order and length as 'GeneID'
`selected.cluster`	(character) the name of the group to be compared against the background. Must be one of the values in 'cluster'. If not specified, the first factor level of 'cluster' is used (alphabetical order).
`topNodes`	(numeric) the maximum number of GO terms (nodes) returned by the function.
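In topGO, the genes of interest are usually marked with a named factor with levels 0 and 1 (passed as `allGenes` when creating a `topGOdata` object), while all annotated genes form the background. The sketch below shows how `selected.cluster` could be mapped onto such a factor, including the documented default of using the first factor level. `make_gene_selection()` is a hypothetical helper and an assumption about the approach, not the R-tools source.

```r
# Hypothetical sketch (not the R-tools implementation): flag genes of the
# selected cluster as 1 and all other genes as 0 (background only), returning
# a named factor of the kind topGO accepts as 'allGenes'.
make_gene_selection <- function(GeneID, cluster, selected.cluster = NULL) {
  stopifnot(length(GeneID) == length(cluster))
  # default: first factor level of 'cluster' (alphabetical for character input)
  if (is.null(selected.cluster)) {
    selected.cluster <- levels(factor(cluster))[1]
  }
  if (!selected.cluster %in% cluster) {
    stop("'selected.cluster' must be one of the values in 'cluster'")
  }
  # 1 = gene belongs to the selected cluster, 0 = background
  selection <- factor(as.integer(cluster == selected.cluster), levels = c(0, 1))
  names(selection) <- GeneID
  selection
}

# no group given, so the first level ("a") is selected: genes D, E, F
sel <- make_gene_selection(LETTERS[1:6], rep(c("b", "a"), each = 3))
table(sel)
```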

Value

a data.frame with topGO enrichment results for at most 'topNodes' GO terms

Examples


# The get_topgo function requires the topGO package
# as an additional dependency that is not automatically
# attached with this package.
library(topGO)

# a list of arbitrary GO terms
go_terms <- c(
  "GO:0006412", "GO:0015979", "GO:0046148", "GO:1901566", "GO:0042777", "GO:0006614",
  "GO:0016114", "GO:0006605", "GO:0090407", "GO:0031564", "GO:0032784", "GO:0052889",
  "GO:0032787", "GO:0043953", "GO:0046394", "GO:0042168", "GO:0009124", "GO:0006090",
  "GO:0016108", "GO:0016109", "GO:0016116", "GO:0016117", "GO:0065002", "GO:0006779",
  "GO:0072330", "GO:0046390", "GO:0006754", "GO:0018298", "GO:0006782", "GO:0022618",
  "GO:0042255", "GO:0046501", "GO:0070925", "GO:0071826", "GO:0006783", "GO:0009156"
)

# Construct a sample data set with 26 different genes in 2 different groups
# and test which (randomly sampled) GO terms might be enriched in either group.
# We randomly sample 1 to 3 GO terms per gene. They need to be formatted as one
# string of GO terms per gene, separated by ";".
df <- data.frame(
  GeneID = LETTERS,
  cluster = rep(c(1, 2), each = 13),
  Gene.ontology.IDs = sapply(1:26, 
    function(x) paste(sample(go_terms, sample(1:3, 1)), collapse = ";")
  ),
  stringsAsFactors = FALSE
)

# test if GO terms are enriched in group 1 against background
get_topgo(df, selected.cluster = 1, topNodes = 5)

m-jahn/R-tools documentation built on Feb. 5, 2023, 1:05 p.m.