kgml-utility | R Documentation |
Extract entities of different types from KGML files in order to convert the pathway to a mathematical graph that we can compute on.
collectEntries(xmldoc, anno = c("all", "one", "batch"))
collectRelations(xmldoc)
collectReactions(xmldoc)
xmldoc |
Either the name of an XML file meeting the
specifications of the KEGG Genomic Markup Language (KGML), or an
object of class |
anno |
Choose a method for analyzing KEGG compounds and glycans. See Details. |
These functions are primarily intended as utility functions that
implement processes required by the main function in the package,
KGMLtoIgraph
. They have been made accessible to
the end user for use in debugging problematic KGML files or to reuse
the KGML files in contexts other than the one we focus on in this
package.
We have implemented three different methods for annotating KEGG
compounds and glycans in their reaction entities. These are recorded
in the KGML pathway files as "C-numbers" (e.g., C12345) or "G-numbers"
(e.g., G12345). These serve as identifiers into their local
databases, and we want to convert them (usually) to IUPAC names to
display on nodes in the final graph. Method "one" makes a separate
call to keggGet
from the KEGGREST
package. Method
"batch" makes calls in batches of ten identifiers, using the fact that
keggGet
enforces that limit. Method "all" makes a single call
using keggLink
to download the entire database. Note
that all three methods cache their results in a package-local
environment to avoid repeating the same call. In a profiling test of
one moderately sized pathway, a single invocation of
collectEntries
took 54 seconds for method "one", 53 seconds
for method "batch", and 47 seconds for method "all". If you are
processing multiple pathways in one session, we expect that the
advantage of the "all" method would be even greater since the results
are cached.
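The batching and caching idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual code: fake_lookup is a hypothetical stand-in for a remote call such as keggGet (it invents placeholder names and makes no network request), and the names cache, n_calls, and lookup_batched are likewise made up for this example.

```r
# Environment used as a cache, keyed by identifier (hypothetical name).
cache <- new.env(parent = emptyenv())
n_calls <- 0  # counts simulated remote calls

# Hypothetical stand-in for a remote lookup; refuses more than ten
# identifiers per call and returns placeholder names, not real data.
fake_lookup <- function(ids) {
  stopifnot(length(ids) <= 10)
  n_calls <<- n_calls + 1
  setNames(paste0("name-of-", ids), ids)
}

# Look up identifiers in batches, skipping anything already cached.
lookup_batched <- function(ids, batch_size = 10) {
  ids <- unique(ids)
  todo <- setdiff(ids, ls(cache, all.names = TRUE))
  batches <- split(todo, ceiling(seq_along(todo) / batch_size))
  for (batch in batches) {
    res <- fake_lookup(batch)
    for (id in names(res)) assign(id, res[[id]], envir = cache)
  }
  unlist(mget(ids, envir = cache))
}

ids <- sprintf("C%05d", 1:23)
first  <- lookup_batched(ids)  # three simulated calls: 10 + 10 + 3
second <- lookup_batched(ids)  # served entirely from the cache
```

Because the second request finds every identifier in the cache, it triggers no further lookups, which is why caching matters most when several pathways share compounds.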
The collectReactions
and collectRelations
functions
return a data frame with three columns (Source
, Target
,
and MIM
), where each row describes one edge of the
pathway/graph. KEGG distinguishes between relations (which
usually connect genes) and reactions (which connect chemical
compounds). The Source
and Target
columns are the
alphanumeric identifiers of items describing nodes. The MIM
column is the edge type in KGML.
The collectEntries
function returns a data frame with three
columns (GraphId, label, and Type), where each row describes one node
or vertex of the pathway/graph. The GraphId
column is a unique
alphanumeric identifier. The label
column is a human-readable
name for the node, often the official gene symbol. When creating an
igraph
object from a pathway, the first column is used as an
identifier to define the node. Also, the plot
method for
igraph
s recognizes the term label
as a column that
defines the text that should be displayed in a node.
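To illustrate how node and edge tables of this shape fit together, here is a small sketch that builds an igraph object from two hand-made data frames. The identifiers, labels, types, and edge types are invented for the example and do not come from any real pathway; only the column names follow the description above. It assumes the igraph package is installed.

```r
# Toy node table in the shape described for collectEntries
# (all values are invented for illustration).
nodes <- data.frame(
  GraphId = c("n1", "n2", "n3"),
  label   = c("TP53", "MDM2", "CDKN1A"),
  Type    = c("gene", "gene", "gene"),
  stringsAsFactors = FALSE
)

# Toy edge table in the shape described for collectRelations
# and collectReactions (values again invented).
edges <- data.frame(
  Source = c("n1", "n2"),
  Target = c("n3", "n1"),
  MIM    = c("activation", "inhibition"),
  stringsAsFactors = FALSE
)

# The first column of `vertices` becomes the vertex name; the remaining
# columns (including `label`) become vertex attributes. Extra edge
# columns such as MIM become edge attributes.
g <- igraph::graph_from_data_frame(edges, directed = TRUE, vertices = nodes)

igraph::V(g)$name   # node identifiers taken from GraphId
igraph::V(g)$label  # text that plot() would display in each node
igraph::E(g)$MIM    # edge types
```

Calling plot(g) on such an object would draw the label values inside the nodes, since the igraph plotting code reads the label vertex attribute by default.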
Kevin R. Coombes krc@silicovore.com, Polina Bombina pbombina@augusta.edu
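# Locate the example KGML file shipped with the WayFindR package.
xmlfile <- system.file("pathways/WP3850.kgml", package = "WayFindR")
# Parse the file into an XML document object.
xmldoc <- XML::xmlParseDoc(xmlfile)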