data-funs: Parse, build and load the KEGG knowledge model
In FELLA: Interpretation and enrichment for metabolomics data


Function buildGraphFromKEGGREST makes use of the KEGG REST API (requires internet connection) to build and return the curated KEGG graph.

Function buildDataFromGraph takes as input the KEGG graph generated by buildGraphFromKEGGREST and writes the KEGG knowledge model in the desired permanent directory.

Function loadKEGGdata loads the internal files containing the KEGG knowledge model into a FELLA.DATA object.

In general, buildGraphFromKEGGREST and buildDataFromGraph are one-time executions for a given organism and knowledge model, run in this order. In contrast, loadKEGGdata must be run in every new R session to load the model into a FELLA.DATA object.

buildGraphFromKEGGREST(organism = "hsa", filter.path = NULL)

buildDataFromGraph(keggdata.graph = NULL, databaseDir = NULL,
    internalDir = TRUE, matrices = c("hypergeom", "diffusion",
    "pagerank"), normality = c("diffusion", "pagerank"),
    dampingFactor = 0.85, niter = 100)

loadKEGGdata(databaseDir = tail(listInternalDatabases(), 1),
    internalDir = TRUE, loadMatrix = NULL)

`organism`	Character, KEGG code for the organism of interest
`filter.path`	Character vector, pathways to filter. This is a pattern matched using regexp. E.g: `"01100"` to filter the overview metabolic pathway in any species
`keggdata.graph`	An igraph object generated by the function `buildGraphFromKEGGREST`
`databaseDir`	Character containing the directory to save KEGG files. It is a relative directory inside the library location if `internalDir = TRUE`. If left to `NULL`, an automatic name containing the date, organism and the KEGG release is generated.
`internalDir`	Logical, should the directory be internal in the package directory?
`matrices`	A character vector, containing any of these: `"hypergeom"`, `"diffusion"`, `"pagerank"`
`normality`	A character vector, containing any of these: `"diffusion"`, `"pagerank"`
`dampingFactor`	Numeric value between 0 and 1 (both exclusive), damping factor `d` for PageRank (`page.rank`)
`niter`	Numeric value, number of iterations to estimate the p-values for the CC size. Between 10 and 1e3.
`loadMatrix`	Character vector to choose if heavy matrices should be loaded. Can contain: `"diffusion"`, `"pagerank"`
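
The effect of a `filter.path` pattern can be previewed with base R before building the graph. The following is a minimal sketch, assuming the patterns are applied as ordinary regular expressions to full pathway identifiers; the identifiers below are illustrative and the exact matching performed inside `buildGraphFromKEGGREST` may differ.

```r
# Illustrative KEGG-style pathway identifiers for two organisms
pathways <- c("hsa01100", "hsa00010", "mmu01100", "mmu00020")

# Same kind of value that would be passed as filter.path
filter.path <- c("01100")

# Assumption: several patterns are combined as alternatives
pattern <- paste(filter.path, collapse = "|")

# Pathways matching the pattern would be excluded from the graph
dropped <- grepl(pattern, pathways)
kept <- pathways[!dropped]
kept
```

Because `"01100"` carries no organism prefix, it matches the overview pathway in both species, whereas `"mmu01100"` would only match the mouse entry.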

In function buildGraphFromKEGGREST, the user specifies (i) an organism, and (ii) patterns matching pathways that should not be included as nodes. A graph object, as described in [Picart-Armada, 2017], is built from the comprehensive KEGG database [Kanehisa, 2017]. As described in the main vignette, accessible through browseVignettes("FELLA"), this graph has five levels that represent categories of KEGG nodes. From top to bottom: pathways, modules, enzymes, reactions and compounds. This knowledge representation resembles the one formerly used by MetScape [Karnovsky, 2011], in which enzymes connect to genes instead of modules and pathways. The necessary KEGG annotations are retrieved through the KEGGREST R package [Tenenbaum, 2013]. Connections between pathways/modules and enzymes are inferred through organism-specific genes, i.e. an edge is added if a gene connects both entries. However, in order to enrich metabolomics data, the user has to pass the graph object to buildDataFromGraph to write the internal files, and then load them with loadKEGGdata to obtain the FELLA.DATA object. All the networks are handled with the igraph R package [Csardi, 2006].
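
The gene-mediated inference of pathway-enzyme edges can be illustrated with a toy example. This is only a sketch of the idea in base R: the annotation tables and all identifiers are hypothetical, and the package's actual implementation may be organised differently.

```r
# Hypothetical gene annotations (not real KEGG data)
gene2pathway <- data.frame(
  gene = c("g1", "g2", "g3"),
  pathway = c("p1", "p1", "p2"),
  stringsAsFactors = FALSE)
gene2enzyme <- data.frame(
  gene = c("g1", "g3", "g4"),
  enzyme = c("e1", "e2", "e3"),
  stringsAsFactors = FALSE)

# A gene annotated in both tables links its pathway to its enzyme
shared <- merge(gene2pathway, gene2enzyme, by = "gene")

# One edge per distinct pathway-enzyme pair
edges <- unique(shared[, c("pathway", "enzyme")])
edges
```

Here genes g1 and g3 appear in both tables, so the resulting edges are p1-e1 and p2-e2; g2 and g4 contribute nothing because they lack one of the two annotations.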

Running buildDataFromGraph is the second step in using the FELLA package. The knowledge graph is used to compute other internal variables that are required to run any enrichment. The main point behind the enrichment is to provide a small part of the knowledge graph relevant to the supplied metabolites. This is accomplished through diffusion processes and random walks, followed by a statistical normalisation, as described in [Picart-Armada, 2017]. When building the internal files, the user can choose whether to store (i) matrices for each provided method, and (ii) vectors derived from such matrices to use the parametric approaches. These are optional but enable (i) faster permutations and custom metabolite backgrounds, and (ii) parametric approaches. WARNING: diffusion and PageRank matrices in (i) can allocate up to 250MB each. Separately, the niter parameter controls the number of trials used to approximate the distribution of the connected component size under uniform node sampling. For further info, see the option thresholdConnectedComponent in the Details section of ?generateResultsGraph. Regarding the destination, the user can specify the name of the directory. Otherwise a name containing the creation date, the organism and the KEGG release will be used. The database can be stored within the library path or in a custom location.
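
The role of niter can be pictured as a Monte Carlo simulation. The sketch below uses igraph on a random toy graph and tracks the largest connected component of uniformly sampled node sets; the toy graph, the helper names `largest_cc_size` and `p_value`, the choice of the largest component as the statistic and the p-value formula are all assumptions made for illustration, not the package's internal code.

```r
library(igraph)

set.seed(1)
# Toy stand-in for the knowledge graph (hypothetical, not KEGG)
g <- sample_gnp(n = 200, p = 0.02)

# Size of the largest connected component in the subgraph
# induced by k nodes sampled uniformly at random
largest_cc_size <- function(graph, k) {
  nodes <- sample(vcount(graph), k)
  sub <- induced_subgraph(graph, nodes)
  max(components(sub)$csize)
}

niter <- 100  # number of random trials
k <- 30       # number of sampled nodes
null_sizes <- replicate(niter, largest_cc_size(g, k))

# Empirical p-value for observing a component of size >= s
p_value <- function(s) (sum(null_sizes >= s) + 1) / (niter + 1)
p_value(5)
```

More trials give a smoother estimate of the null distribution at a proportional cost in computing time.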

Function loadKEGGdata returns a FELLA.DATA object from any of the databases generated by buildDataFromGraph. This object is the starting point of any enrichment using FELLA. If the user built the matrices for "diffusion" and "pagerank", they can choose to load them. Further detail on the methods can be found in [Picart-Armada, 2017]. The matrices allow a faster computation and the definition of a custom background, but use up to 250MB of memory each.
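
Since the default databaseDir is taken from listInternalDatabases(), it can be worth checking that at least one internal database exists before loading. The following is a minimal sketch that spells out the default explicitly; the guard around an empty listing and the fallback when FELLA is not installed are additions for illustration.

```r
# Fall back to an empty listing if FELLA is not installed
if (requireNamespace("FELLA", quietly = TRUE)) {
  dbs <- FELLA::listInternalDatabases()
} else {
  dbs <- character(0)
}

if (length(dbs) > 0) {
  # Same choice as the default: the last listed internal database
  fella.data <- FELLA::loadKEGGdata(
    databaseDir = tail(dbs, 1),
    internalDir = TRUE,
    loadMatrix = NULL)
} else {
  message("No internal database found; run buildDataFromGraph() first.")
}
```

Leaving loadMatrix = NULL skips the heavy matrices, which keeps memory usage low when they are not needed.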

buildGraphFromKEGGREST returns the curated KEGG graph (class igraph)

buildDataFromGraph returns invisible(TRUE) if successful. As a side effect, the directory specified by databaseDir is created, containing the internal data.

loadKEGGdata returns the FELLA.DATA object that contains the KEGG knowledge representation.

Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., & Morishima, K. (2017). KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic acids research, 45(D1), D353-D361.

Karnovsky, A., Weymouth, T., Hull, T., Tarcea, V. G., Scardoni, G., Laudanna, C., ... & Athey, B. (2011). Metscape 2 bioinformatics tool for the analysis and visualization of metabolomics and gene expression data. Bioinformatics, 28(3), 373-380.

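Tenenbaum, D. (2013). KEGGREST: Client-side REST access to KEGG. R package version, 1(1).

Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5), 1-9.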

Chang, W., Cheng, J., Allaire, JJ., Xie, Y., & McPherson, J. (2017). shiny: Web Application Framework for R. R package version 1.0.5. https://CRAN.R-project.org/package=shiny

Picart-Armada, S., Fernandez-Albert, F., Vinaixa, M., Rodriguez, M. A., Aivio, S., Stracker, T. H., Yanes, O., & Perera-Lluna, A. (2017). Null diffusion-based enrichment for metabolomics data. PLOS ONE, 12(12), e0189012.

class FELLA.DATA

## Toy example
## In this case, the graph is not built from current KEGG. 
## It is loaded from sample data in FELLA
data("FELLA.sample")
## Graph to build the database (this example is a bit hacky)
g.sample <- FELLA:::getGraph(FELLA.sample)
dir.tmp <- paste0(tempdir(), "/", paste(sample(letters), collapse = ""))
## Build internal files in a temporary directory
buildDataFromGraph(
keggdata.graph = g.sample, 
databaseDir = dir.tmp, 
internalDir = FALSE, 
matrices = NULL, 
normality = NULL, 
dampingFactor = 0.85,
niter = 10)
## Load database
myFELLA.DATA <- loadKEGGdata(
dir.tmp, 
internalDir = FALSE)
myFELLA.DATA

######################

## Not run: 
## Full example

## First step: graph for Mus musculus discarding the mmu01100 pathway
## (an analogous example can be built for human using organism = "hsa")
g.mmu <- buildGraphFromKEGGREST(
organism = "mmu", 
filter.path = "mmu01100")
summary(g.mmu)
cat(comment(g.mmu))

## Second step: build internal files for this graph
## (consumes some time and memory, especially if we compute 
## "diffusion" and "pagerank" matrices)
buildDataFromGraph(
keggdata.graph = g.mmu, 
databaseDir = "example_db_mmu", 
internalDir = TRUE, 
matrices = c("hypergeom", "diffusion", "pagerank"), 
normality = c("diffusion", "pagerank"), 
dampingFactor = 0.85,
niter = 1e3)
## Third step: load the internal files into a FELLA.DATA object
FELLA.DATA.mmu <- loadKEGGdata(
"example_db_mmu", 
internalDir = TRUE, 
loadMatrix = c("diffusion", "pagerank"))
FELLA.DATA.mmu

## End(Not run)