obtainEdgeList: Obtain edgelist from graphite databases. To be used within...


obtainEdgeList    R Documentation

Obtain edgelist from graphite databases. To be used within prepareAdjMat

Description

Find all edges between genes in the specified graphite databases.

Usage

obtainEdgeList(genes, databases)

Arguments

genes

Character vector of genes, each given as a gene ID type and a gene value separated by a colon, e.g. "ENTREZID:127550". The colon is required because obtainEdgeList uses regular expressions to split each entry into gene ID and gene value.
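The colon-separated format can be checked before calling obtainEdgeList. The following is a minimal sketch in base R, assuming a split at the first colon; it is not the regular expression used inside obtainEdgeList, and the input values and variable names are hypothetical.

```r
# Hypothetical input in the "IDTYPE:value" format described above.
genes <- c("ENTREZID:127550", "ENTREZID:10000", "ENSEMBL:ENSG00000117020")

# Split each entry at the first colon only (an assumption for illustration).
id_type  <- sub(":.*$", "", genes)     # text before the first colon
id_value <- sub("^[^:]*:", "", genes)  # text after the first colon

parsed <- data.frame(id_type = id_type, id_value = id_value,
                     stringsAsFactors = FALSE)

# Entries without a colon cannot be split and are worth catching early.
malformed <- genes[!grepl(":", genes, fixed = TRUE)]
```

Here parsed holds one row per gene, and malformed is empty when every entry contains a colon.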

databases

Character vector of graphite databases you wish to search for edges. Options are: biocarta, kegg, nci, panther, pathbank, pharmgkb, reactome, smpdb, ndex. Note that NDEx is recommended only for expert users and is available only in the development version of netgsa (https://github.com/mikehellstern/netgsa); see Details.

Details

obtainEdgeList searches the specified databases for edges between the genes in the genes argument. Because multiple databases with different identifiers can be searched, genes are converted using AnnotationDbi::select and metabolites are converted using graphite:::metabolites(). Databases are also used to specify non-edges.

In addition to the graphite databases, this function can optionally search NDEx (public databases only). However, since NDEx is open-source and does not contain curated edge information like graphite, the NDEx search is a beta feature and is recommended only for expert users. When searching NDEx, gene identifiers are not converted: only the identifiers passed to the genes argument are used. NDEx contains some very large networks with millions of edges, and extracting the edges of interest can be slow.

This function is particularly useful if the user wants to create an edgelist outside of prepareAdjMat. graphite and its databases are constantly updated, so creating and storing an edgelist outside of prepareAdjMat may help reproducibility, as it guarantees that the same external information is used. It can also speed up computation: if only a character vector of databases is passed to prepareAdjMat, it calls obtainEdgeList each time, and each call can take several minutes. The edges from obtainEdgeList are used to create the 0-1 adjacency matrices used in netEst.undir and netEst.dir.
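One way to store an edgelist for later reuse is with saveRDS and readRDS from base R. The sketch below uses a stand-in object rather than a real call to obtainEdgeList, so that it runs without network access; the column names, gene labels, and file name are placeholders, not the actual structure returned by the function.

```r
# Stand-in for the result of obtainEdgeList(); per the Value section the real
# object is a list of class "obtainedEdgeList" with components edgelist and
# genes_not_in_dbs. Column names and gene labels here are placeholders.
edge_info <- structure(
  list(
    edgelist = data.frame(src = "GENE_A", dest = "GENE_B",
                          stringsAsFactors = FALSE),
    genes_not_in_dbs = character(0)
  ),
  class = "obtainedEdgeList"
)

# Store the object once ...
path <- file.path(tempdir(), "edge_info.rds")
saveRDS(edge_info, path)

# ... and reload it in later sessions instead of searching the databases again.
edge_info_reloaded <- readRDS(path)
```

In practice the file would be written to a project directory rather than tempdir(), so that the same edge information is available across sessions.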

Using obtainEdgeList to generate edge information is highly recommended, as it performs all the searching and the conversion of genes to common identifiers. Additional edges, removal of edges, or other user modifications to edgelists should be specified through the file_e and file_ne arguments in prepareAdjMat.

Value

A list of class obtainedEdgeList with components

edgelist

A data.table listing the edges, one row per edge. Edges are assumed to be directed, so an undirected edge appears as two rows, one for each direction.
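The row convention can be illustrated with a small hand-made edge list. This is a sketch in base R with hypothetical node labels and column names (src and dest are assumptions, not the columns returned by obtainEdgeList); it also shows how such rows map onto a 0-1 adjacency matrix.

```r
# Hypothetical edge list: A-B is undirected, so it appears as A->B and B->A;
# A->C is directed, so it appears once.
edges <- data.frame(
  src  = c("A", "B", "A"),
  dest = c("B", "A", "C"),
  stringsAsFactors = FALSE
)

# Build a 0-1 adjacency matrix with one row and column per node.
nodes <- sort(unique(c(edges$src, edges$dest)))
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))

# Index by (source, destination) name pairs and mark each edge with a 1.
adj[cbind(edges$src, edges$dest)] <- 1
```

The undirected edge yields symmetric entries in adj, while the directed edge fills only the entry in row A, column C.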

genes_not_in_dbs

A vector of the genes that were specified but not found in the databases searched.

Author(s)

Michael Hellstern

See Also

prepareAdjMat, netEst.dir, netEst.undir

Examples


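library(netgsa)

# Build "ENTREZID:<value>" identifiers in the format expected by the genes argument
genes <- paste0("ENTREZID:", c("10000", "10298", "106821730",
                               "10718", "1398", "1399", "145957",
                               "1839", "1950", "1956"))

# Search KEGG and Reactome for edges between these genes (may take several minutes)
out <- obtainEdgeList(genes, c("kegg", "reactome"))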


netgsa documentation built on Nov. 14, 2023, 5:09 p.m.