Introduction

biodbKegg is a biodb extension package that implements a connector to KEGG Compound database [@kanehisa2000_KEGG].

Installation

Install using Bioconductor:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install('biodbKegg')

Initialization

The first step in using biodbKegg, is to create an instance of the biodb class BiodbMain from the main biodb package. This is done by calling the constructor of the class:

mybiodb <- biodb::newInst()

During this step the configuration is set up, the cache system is initialized and extension packages are loaded.

We will see at the end of this vignette that the biodb instance needs to be terminated with a call to the terminate() method.

Creating a connector to KEGG Compound database

In biodb the connection to a database is handled by a connector instance that you can get from the factory. biodbKegg implements a connector to a remote database. Here is the code to instantiate a connector:

kegg.comp.conn <- mybiodb$getFactory()$createConn('kegg.compound')

Accessing entries

To retrieve entries, use:

entries <- kegg.comp.conn$getEntry(c('C00133', 'C00751'))
entries

To convert a list of entries into a dataframe, run:

x <- mybiodb$entriesToDataframe(entries, compute=FALSE)
x

Search for compounds of a certain mass

ids <- kegg.comp.conn$searchForEntries(list(monoisotopic.mass=list(value=64, delta=2.0)), max.results=10)
entries <- mybiodb$getFactory()$getEntry('kegg.compound', ids)

Add information to a data frame containing KEGG Compound IDs

If you have a data frame containing a column with KEGG Compound IDs, you can add information such as associated KEGG Enzymes, associated KEGG Pathways and KEGG Modules to your data frame, for a specific organism.

For the example we use the list of compound IDs we already have, to construct a data frame:

kegg.comp.ids <- c('C06144', 'C06178', 'C02659')
mydf <- data.frame(kegg.ids=kegg.comp.ids)

Using the addInfo() method of KeggCompoundConn class, we add information about pathways, enzymes and modules for these compounds:

kegg.comp.conn$addInfo(mydf, id.col='kegg.ids', org='mmu')

Note that, by default, the number of values for each field is limited to 3. Please see the help page of KeggCompoundConn for more information about addInfo(), and a description of all parameters.

The list of organisms is available at https://www.genome.jp/kegg/catalog/org_list.html.

Getting pathways related to compounds using KEGG databases

In this example we will look for pathways related to specified compounds, count to how many pathways each compound is related, build a pathway graph, and create a decorated pathway graph picture.

For that we will start from a given list of KEGG compound IDs, and explore KEGG to find to which organisms they can be related and try to discover links between them through KEGG pathways.

As an example, we will start from a predefined list of KEGG compound IDs, and focus on one organism, the mouse.

Look for pathways related to specified compounds

Given a list of compounds and an organism, we can look for related pathways in a single command:

kegg.comp.ids <- c('C06144', 'C06178', 'C02659')
pathways <- kegg.comp.conn$getPathwayIds(kegg.comp.ids, 'mmu')
pathways

Count to how many pathways each compound is related

With another function we can get the pathways found for each compound:

path.per.comp <- kegg.comp.conn$getPathwayIdsPerCompound(kegg.comp.ids, 'mmu')
fct <- function(i) {
    if (i %in% names(path.per.comp)) length(path.per.comp[[i]]) else 0
}
nb_mmu_gene_pathways <- vapply(kegg.comp.ids, fct, FUN.VALUE=0)
names(nb_mmu_gene_pathways) <- kegg.comp.ids

Here, in the final table, we list the number of pathways for each KEGG compound:

nb_mmu_gene_pathways

Build a pathway graph

To build a pathway graph, we need a connector to the KEGG Pathway database:

kegg.path.conn <- mybiodb$getFactory()$getConn('kegg.pathway')

Building list of edges and vertices for pathways is done by calling buildPathwayGraph():

kegg.path.conn$buildPathwayGraph(pathways[[1]])

The object returned is a list whose names are the pathway IDs submitted, and the values are lists containing two data frames (edges and vertices).

We can also get an igraph object for the a pathway (or a list of pathways):

ig <- kegg.path.conn$getPathwayIgraph(pathways[[1]])

And we plot it:

vert <- igraph::as_data_frame(ig, 'vertices')
shapes <- vapply(vert[['type']], function(x) if (x == 'reaction') 'rectangle'
                 else 'circle', FUN.VALUE='', USE.NAMES=FALSE)
colors <- vapply(vert[['type']], function(x) if (x == 'reaction') 'yellow' else
    'red', FUN.VALUE='', USE.NAMES=FALSE)
plot(ig, vertex.shape=shapes, vertex.color=colors, vertex.label.dist=1,
     vertex.size=4, vertex.size2=4)

Create a decorated pathway graph picture

We will now use a KEGG pathway picture and highlight some of the enzymes and compounds on it.

For this, we first get the enzymes related to the compounds:

kegg.enz.ids <- mybiodb$entryIdsToSingleFieldValues(kegg.comp.ids,
                                                    db='kegg.compound',
                                                    field='kegg.enzyme.id')
kegg.enz.ids

define the colors we want to apply:

color2ids <- list(yellow=kegg.enz.ids, red=kegg.comp.ids)

Then we call the method that builds the highlighted image and print it:

kegg.path.conn$getDecoratedGraphPicture(pathways[[1]],
    color2ids=color2ids)

Terminate the biodb instance

When done with your biodb instance you have to terminate it, in order to ensure release of resources (file handles, database connection, etc):

mybiodb$terminate()

Session information

sessionInfo()

References



pkrog/biodbKegg documentation built on Oct. 1, 2022, 6:27 p.m.