03-main: Converting GPML Files to Igraph Objects

XMLtoIgraph R Documentation

Converting GPML Files to Igraph Objects

Description

Takes an XML file, either GPML from WikiPathways, or KGML from KEGG, extracts the entities therein, and makes minor adjustments necessary to convert it into an igraph object. Along the way, it assigns a consistent set of colors, line types, and shapes.

Usage

GPMLtoIgraph(xmldoc, returnLists = FALSE, debug = FALSE)
KGMLtoIgraph(xmldoc, returnLists = FALSE, debug = FALSE)
nodeLegend(x, graph)
edgeLegend(x, graph)

Arguments

xmldoc

Either the name of an XML file meeting the specifications of the appropriate markup language (GPML or KGML), or an object of class XMLInternalDocument obtained by running such a file through the xmlParseDoc function of the XML package.

returnLists

A logical value; should the return value include the node list and edge list matrices?

debug

A logical value; should debugging progress information be printed? Probably best to leave it equal to FALSE.

x

A character string, such as "topleft", indicating where to place the legend.

graph

An igraph object as produced by the function GPMLtoIgraph.

Details

GPMLtoIgraph and KGMLtoIgraph are the main functions of the WayFindR package. They achieve the primary goal of converting pathways from one of the biological graph XML file formats into a mathematical graph, in the format defined by the igraph package. At that point, we can apply a wide variety of graph algorithms from computer science in order to "compute on biological pathways".

The implementation of these functions relies on the utility functions described in gpml-utility or kgml-utility.

Briefly, the first algorithm starts by collecting all nodes (DataNodes in GPML) and edges (Interactions in GPML) from the GPML input file. However, GPML includes two other structures with (semantic) biological meaning. First, the GPML description includes the idea of an (invisible) "Anchor" that allows one edge to point to another edge. We expand those invisible target locations into full-fledged nodes in the final graph. Second, GPML includes "Groups" that represent protein complexes or sets of closely related genes. In WayFindR, we represent such groups as their own nodes in the final graph, and add "contained" edges linking in the group members. The transformations of Anchors and Groups do not change the fundamental topology (in particular, the existence of cycles) of the resulting graph.
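The Anchor and Group transformations described above can be illustrated with a small base R sketch. This is not the WayFindR implementation: the helper names expand_anchors and add_group, the toy identifiers, the choice to split the host edge at the anchor, and the direction of the "contained" edges are all assumptions made for illustration.

```r
# Toy pathway tables; every identifier below is hypothetical.
nodes <- data.frame(id = c("A", "B", "C", "D"),
                    type = "GeneProduct",
                    stringsAsFactors = FALSE)
# Edge e1 runs A -> B and carries an anchor "a1"; edge e2 points from C at that anchor.
edges <- data.frame(id = c("e1", "e2"),
                    source = c("A", "C"),
                    target = c("B", "a1"),
                    type = c("Arrow", "Inhibition"),
                    stringsAsFactors = FALSE)
anchors <- data.frame(id = "a1", edge = "e1", stringsAsFactors = FALSE)

# Promote each anchor to a node and (assumption) split its host edge in two.
expand_anchors <- function(nodes, edges, anchors) {
  for (i in seq_len(nrow(anchors))) {
    a <- anchors$id[i]
    host <- edges[edges$id == anchors$edge[i], ]
    nodes <- rbind(nodes,
                   data.frame(id = a, type = "Anchor", stringsAsFactors = FALSE))
    first <- host
    first$id <- paste0(host$id, ".1")
    first$target <- a
    second <- host
    second$id <- paste0(host$id, ".2")
    second$source <- a
    edges <- rbind(edges[edges$id != host$id, ], first, second)
  }
  rownames(edges) <- NULL
  list(nodes = nodes, edges = edges)
}

# Add a group node plus one "contained" edge per member
# (assumption: edges run from the group to each member).
add_group <- function(nodes, edges, group_id, members) {
  nodes <- rbind(nodes,
                 data.frame(id = group_id, type = "Group", stringsAsFactors = FALSE))
  contained <- data.frame(id = paste0(group_id, ".", members),
                          source = group_id,
                          target = members,
                          type = "contained",
                          stringsAsFactors = FALSE)
  list(nodes = nodes, edges = rbind(edges, contained))
}

step1 <- expand_anchors(nodes, edges, anchors)
step2 <- add_group(step1$nodes, step1$edges, "g1", c("C", "D"))
print(step2$nodes)
print(step2$edges)
```

Tables of this shape (one row per node, one row per edge with source and target columns) could then be handed to igraph::graph_from_data_frame to build a graph object.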

Further, GPML includes non-semantic features (including "Labels" and "Shapes") that are (mis)used by some pathway authors as the targets of edges. WayFindR converts any targeted non-semantic features into nodes in order to preserve as much information as possible from the original pathway in WikiPathways.

The KGML algorithm is similar in structure, but has to deal with the different underlying structure of the KGML specification. KGML files contain three kinds of entities: Entry, Relation, and Reaction. An Entry becomes a vertex. It can be a gene, a map (a link to another pathway), a group (as above, except that the members of the group are stored as an entity called a Component within the group Entry), an ortholog (a KEGG-defined set of genes that are the "same" across species), or a compound (subdivided into compounds, glycans, or drugs, all of which we view as "SmallMolecules" analogous to what GPML calls a metabolite). A Relation is an edge that usually connects genes, but we must map the terminology annotating edge types into the MIM space defining biological edges. Finally, a Reaction is an edge between compounds, which has no real analog in the WikiPathways universe. The only "type" associated with a reaction is whether it is "reversible" or "irreversible".
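The mapping step for KGML can be pictured as a pair of lookup tables, one for Entry types and one for Relation subtypes. The sketch below uses base R only. The lookup vectors node_type and edge_type and the toy data are hypothetical, and the particular labels chosen (other than "SmallMolecule", "reversible", and "irreversible", which come from the text above) are assumptions rather than the values WayFindR actually assigns.

```r
# Hypothetical lookup from KGML Entry types to vertex types.
node_type <- c(gene = "Gene",
               map = "Pathway",
               group = "Group",
               ortholog = "Ortholog",
               compound = "SmallMolecule")

# Hypothetical lookup from KGML Relation subtypes to MIM-style edge types.
edge_type <- c(activation = "mim-stimulation",
               inhibition = "mim-inhibition",
               expression = "mim-transcription-translation")

# Toy entities standing in for parsed Entry, Relation, and Reaction elements.
entries <- data.frame(id = c("n1", "n2", "n3", "n4"),
                      kegg_type = c("gene", "gene", "compound", "compound"),
                      stringsAsFactors = FALSE)
relations <- data.frame(source = "n1", target = "n2",
                        subtype = "activation",
                        stringsAsFactors = FALSE)
reactions <- data.frame(source = "n3", target = "n4",
                        type = "irreversible",
                        stringsAsFactors = FALSE)

# Each Entry becomes a vertex with a translated type.
vertices <- data.frame(id = entries$id,
                       type = unname(node_type[entries$kegg_type]),
                       stringsAsFactors = FALSE)

# Relations get a translated edge type; Reactions keep their own
# "reversible" / "irreversible" label.
edge_list <- rbind(
  data.frame(source = relations$source,
             target = relations$target,
             type = unname(edge_type[relations$subtype]),
             stringsAsFactors = FALSE),
  data.frame(source = reactions$source,
             target = reactions$target,
             type = reactions$type,
             stringsAsFactors = FALSE)
)

print(vertices)
print(edge_list)
```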

Value

The GPMLtoIgraph function by default returns an igraph object that represents the pathway defined by the input XML file. If the argument returnLists = TRUE, then it returns a list containing three components; graph is the igraph object, nodes is a data frame containing node information where each row is a node, and edges is a matrix containing edge information where each row is an edge. The node and edge information can be used to reproduce the graph in any network or graph visualization tool that accepts such matrices to describe the graph. The nodes data frame includes columns for color and shape, and the edges component includes columns for color and lty that are recognized and used by the plot.igraph function.
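As a brief illustration of the list form of the return value, the following sketch requests the node and edge lists alongside the graph. It reuses the sample file from the Examples section and only runs when WayFindR is installed; the variable name res is arbitrary.

```r
res <- NULL
if (requireNamespace("WayFindR", quietly = TRUE)) {
  xmlfile <- system.file("pathways/WP3850.gpml", package = "WayFindR")
  # Ask for the node and edge lists in addition to the igraph object.
  res <- WayFindR::GPMLtoIgraph(xmlfile, returnLists = TRUE)
  print(names(res))       # component names of the returned list
  print(head(res$nodes))  # one row per node
  print(head(res$edges))  # one row per edge
}
```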

Both nodeLegend and edgeLegend invisibly return the same value that is returned by the legend function that is used in the implementation.

Author(s)

Kevin R. Coombes krc@silicovore.com, Polina Bombina pbombina@augusta.edu

Examples

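# Locate a sample GPML pathway shipped with the package
xmlfile <- system.file("pathways/WP3850.gpml", package = "WayFindR")
# Convert the pathway into an igraph object
graf <- GPMLtoIgraph(xmlfile)
# Fix the random seed so the layout is reproducible
set.seed(13579)
L <- igraph::layout_with_graphopt
plot(graf, layout = L)
# Add legends explaining node and edge styles
nodeLegend("topleft", graf)
edgeLegend("bottomright", graf)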

WayFindR documentation built on June 30, 2024, 3 a.m.