getGenesets | R Documentation
Functionality for retrieving gene sets for an organism under investigation from databases such as GO and KEGG. Parsing and writing a list of gene sets from/to a flat text file in GMT format is also supported.
The GMT (Gene Matrix Transposed) file format is a tab-delimited file format that describes gene sets. In the GMT format, each row represents a gene set. Each gene set is described by a name, a description, and the genes in the gene set. See references.
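To make the GMT layout concrete, the following base-R sketch parses such a file into a named list of character vectors of gene IDs. `read_gmt_sketch` is a hypothetical helper written for illustration only; it is not part of EnrichmentBrowser and may differ from the package's own parser (for example, in how gene set names are constructed or how malformed rows are handled).

```r
# Hypothetical helper (not the package's parser): read a GMT file into a
# named list of gene sets. Assumes one gene set per row, laid out as
#   name <TAB> description <TAB> gene1 <TAB> gene2 ...
read_gmt_sketch <- function(gmt.file) {
  lines <- readLines(gmt.file, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]            # skip blank rows
  fields <- strsplit(lines, "\t", fixed = TRUE)    # split each row on tabs
  gs <- lapply(fields, function(f) f[-(1:2)])      # genes start in column 3
  names(gs) <- vapply(fields, function(f) f[1], character(1))  # column 1 = name
  gs
}

# Usage on a small toy file written to a temporary location
tmp <- tempfile(fileext = ".gmt")
writeLines(c("SET_A\tfirst toy set\t1\t2\t3",
             "SET_B\tsecond toy set\t4\t5"), tmp)
gs <- read_gmt_sketch(tmp)
str(gs)
```

The description column is read but discarded here, since the sketch only keeps what a plain list of gene sets needs.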
getGenesets(
org,
db = c("go", "kegg", "msigdb", "enrichr"),
gene.id.type = "ENTREZID",
cache = TRUE,
return.type = c("list", "GeneSetCollection"),
...
)
showAvailableSpecies(db = c("go", "kegg", "msigdb", "enrichr"), cache = TRUE)
showAvailableCollections(
org,
db = c("go", "kegg", "msigdb", "enrichr"),
cache = TRUE
)
writeGMT(gs, gmt.file)
org: An organism in KEGG three-letter code, e.g. ‘hsa’ for ‘Homo sapiens’. Alternatively, this can be a text file storing gene sets in GMT format. See details.
db: Database from which gene sets should be retrieved. Currently one of 'go' (default), 'kegg', 'msigdb', or 'enrichr'.
gene.id.type: Character. Gene ID type of the returned gene sets. Defaults to "ENTREZID".
cache: Logical. Should a locally cached version be used if available? Defaults to TRUE.
return.type: Character. Determines whether gene sets are returned as a simple list of gene sets (each being a character vector of gene IDs), or as an object of class GeneSetCollection.
...: Additional arguments for individual gene set databases.
For
For
For
gs: A list of gene sets (character vectors of gene IDs).
gmt.file: Gene set file in GMT format. See details.
For getGenesets: a list of gene sets (vectors of gene IDs), or a GeneSetCollection if return.type = "GeneSetCollection".
For writeGMT: none; the gene sets are written to gmt.file.
For showAvailableSpecies and showAvailableCollections: a DataFrame displaying the supported species and the available gene set collections, respectively, for a gene set database of choice.
Ludwig Geistlinger
GO evidence codes: http://geneontology.org/docs/guide-go-evidence-codes/
KEGG Organism code: http://www.genome.jp/kegg/catalog/org_list.html
MSigDB: http://software.broadinstitute.org/gsea/msigdb/collections.jsp
Enrichr: https://maayanlab.cloud/Enrichr/#stats
GMT file format: http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats
The GO.db package for the GO2gene mapping used in 'GO.db' mode, and the biomaRt package for general queries to BioMart.
keggList and keggLink for accessing the KEGG REST server.
msigdbr::msigdbr for obtaining gene sets from the MSigDB.
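KEGG gene sets are built from pathway-to-gene links obtained via the KEGG REST server. As a minimal sketch, assuming the link response arrives as tab-separated rows of the form `path:<pathway>` / `<org>:<gene>` (as returned, for instance, by a query such as https://rest.kegg.jp/link/hsa/pathway), such pairs can be grouped into a named list of gene sets. The input rows below are made-up toy records that only mimic that format, and `pairs_to_genesets` is a hypothetical helper, not the package's code.

```r
# Toy rows that only mimic the tab-separated output of the KEGG REST 'link'
# operation (pathway <TAB> gene); these are NOT real KEGG records.
link_txt <- c("path:hsa00001\thsa:101",
              "path:hsa00001\thsa:102",
              "path:hsa00002\thsa:102",
              "path:hsa00002\thsa:103")

# Hypothetical helper: group pathway-gene pairs into a named list of gene sets
pairs_to_genesets <- function(lines) {
  m <- do.call(rbind, strsplit(lines, "\t", fixed = TRUE))  # 2-column matrix
  pathway <- sub("^path:", "", m[, 1])   # strip the 'path:' prefix
  gene <- sub("^[a-z]+:", "", m[, 2])    # strip the organism prefix, e.g. 'hsa:'
  split(gene, pathway)                   # one character vector per pathway
}

kegg.gs <- pairs_to_genesets(link_txt)
str(kegg.gs)
```

A gene may appear in several pathways (here "102"), which is why grouping by pathway rather than by gene is the natural direction for gene set analysis.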
# (1) Typical usage for gene set enrichment analysis with GO:
# Biological process terms based on BioC annotation (for human)
go.gs <- getGenesets(org = "hsa", db = "go")
# equivalent to:
# go.gs <- getGenesets(org = "hsa", db = "go", onto = "BP", mode = "GO.db")
# Alternatively:
# downloading from BioMart
# this may take a few minutes ...
go.gs <- getGenesets(org = "hsa", db = "go", mode = "biomart")
# list supported species for obtaining gene sets from GO
showAvailableSpecies(db = "go")
# (2) Defining gene sets according to KEGG
kegg.gs <- getGenesets(org = "hsa", db = "kegg")
# list supported species for obtaining gene sets from KEGG
showAvailableSpecies(db = "kegg")
# (3) Obtaining *H*allmark gene sets from MSigDB
hall.gs <- getGenesets(org = "hsa", db = "msigdb", cat = "H")
# list supported species for obtaining gene sets from MSigDB
showAvailableSpecies(db = "msigdb")
# list available gene set collections in the MSigDB
showAvailableCollections(db = "msigdb")
# (4) Obtaining gene sets from Enrichr
tfppi.gs <- getGenesets(org = "hsa", db = "enrichr",
lib = "Transcription_Factor_PPIs")
# list supported species for obtaining gene sets from Enrichr
showAvailableSpecies(db = "enrichr")
# list available Enrichr gene set libraries
showAvailableCollections(org = "hsa", db = "enrichr")
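# (5) parsing gene sets from GMT
gmt.file <- system.file("extdata/hsa_kegg_gs.gmt",
package = "EnrichmentBrowser")
gs <- getGenesets(gmt.file)
# (6) writing gene sets to file (a temporary file, so that the
# example GMT file shipped with the package is not overwritten)
writeGMT(gs, gmt.file = file.path(tempdir(), "hsa_kegg_gs.gmt"))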