buildGraphFromKEGGREST makes use of the KEGG
REST API (requires internet connection)
to build and return the curated KEGG graph.
buildDataFromGraph takes as input the KEGG graph
and writes the KEGG knowledge model in the desired permanent directory.
loadKEGGdata loads the internal files
containing the KEGG knowledge model into a FELLA.DATA object.
In general, buildGraphFromKEGGREST and buildDataFromGraph are one-time executions
for a given organism and knowledge model,
in this precise order.
On the other hand, the user needs to run loadKEGGdata
in every new R session to load the model into a FELLA.DATA object.
buildGraphFromKEGGREST(organism = "hsa", filter.path = NULL)

buildDataFromGraph(keggdata.graph = NULL, databaseDir = NULL,
  internalDir = TRUE, matrices = c("hypergeom", "diffusion", "pagerank"),
  normality = c("diffusion", "pagerank"), dampingFactor = 0.85,
  niter = 100)

loadKEGGdata(databaseDir = tail(listInternalDatabases(), 1),
  internalDir = TRUE, loadMatrix = NULL)
organism: Character, KEGG code for the organism of interest.
filter.path: Character vector, pathways to filter.
This is a pattern matched using regexp.
keggdata.graph: An igraph object generated by the function buildGraphFromKEGGREST.
databaseDir: Character containing the directory to save KEGG files.
It is a relative directory inside the library location if internalDir = TRUE.
internalDir: Logical, should the directory be internal in the package directory?
matrices: A character vector, containing any of these: "hypergeom", "diffusion", "pagerank".
normality: A character vector, containing any of these: "diffusion", "pagerank".
dampingFactor: Numeric value between 0 and 1 (none inclusive), the damping factor for PageRank.
niter: Numeric value, number of iterations to estimate the p-values for the connected component (CC) size. Between 10 and 1e3.
loadMatrix: Character vector to choose if
heavy matrices should be loaded.
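As an illustration of how the filter.path patterns behave, here is a minimal base R sketch. The pathway identifiers are made up for the example, and the use of grepl with the patterns joined by "|" is an assumption about the matching, not a copy of the package internals.

```r
# Hypothetical KEGG pathway identifiers (illustration only)
pathways <- c("hsa00010", "hsa00020", "hsa01100", "hsa01200")

# Patterns as they could be passed to filter.path
filter.path <- c("01100", "^hsa012")

# Assumption: a pathway is discarded if it matches at least one pattern
is.filtered <- grepl(paste(filter.path, collapse = "|"), pathways)
kept <- pathways[!is.filtered]
print(kept)
```

Because the patterns are regular expressions, a short pattern such as "011" would also match any other identifier containing that substring, so anchoring with ^ or $ may be safer.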
The user specifies (i) an organism, and (ii) patterns matching
pathways that should not be included as nodes.
A graph object, as described in [Picart-Armada, 2017],
is built from the comprehensive
KEGG database [Kanehisa, 2017].
As described in the main vignette, accessible through
browseVignettes("FELLA"), this graph has five levels that
represent categories of KEGG nodes.
From top to bottom: pathways, modules, enzymes, reactions and compounds.
This knowledge representation resembles the one formerly
used by MetScape [Karnovsky, 2011], in which enzymes connect
to genes instead of modules and pathways.
The necessary KEGG annotations
are retrieved through KEGGREST R package [Tenenbaum, 2013].
Connections between pathways/modules and enzymes are inferred through
organism-specific genes, i.e. an edge is added if a gene
connects both entries.
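The gene-based inference of edges can be pictured with a small base R sketch. The two annotation tables and all identifiers below are hypothetical, and the join through merge is only one possible way to express "a gene connects both entries"; it is not taken from the package source.

```r
# Hypothetical organism-specific annotations (illustration only)
gene2pathway <- data.frame(
    gene = c("g1", "g2", "g3"),
    pathway = c("path:A", "path:A", "path:B"),
    stringsAsFactors = FALSE)
gene2enzyme <- data.frame(
    gene = c("g1", "g3", "g4"),
    enzyme = c("ec:1.1.1.1", "ec:2.7.1.1", "ec:3.1.1.1"),
    stringsAsFactors = FALSE)

# A pathway and an enzyme are linked if at least one gene is annotated to both
shared <- merge(gene2pathway, gene2enzyme, by = "gene")
edges <- unique(shared[, c("pathway", "enzyme")])
print(edges)
```

Genes g2 and g4 appear in only one table, so they do not contribute any edge.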
However, in order to enrich metabolomics data, the user has to
pass the graph object to buildDataFromGraph
to obtain the internal database files.
All the networks are handled with the igraph R package [Csardi, 2006].
buildDataFromGraph is the second step
to use the FELLA package.
The knowledge graph is used to compute other internal variables that are
required to run any enrichment.
The main point behind the enrichment is to provide a small
part of the knowledge graph relevant to the supplied metabolites.
This is accomplished through diffusion processes and random walks,
followed by a statistical normalisation,
as described in [Picart-Armada, 2017].
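To give an intuition of the random walk scores and the role of dampingFactor, the following is a minimal sketch of personalised PageRank by power iteration on a toy path graph. The graph, the seed node and the convergence tolerance are assumptions for illustration; this is not the implementation used by the package, which relies on igraph.

```r
# Toy undirected path graph a - b - c - d - e (illustration only)
nodes <- letters[1:5]
adj <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
edges <- rbind(c("a", "b"), c("b", "c"), c("c", "d"), c("d", "e"))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Column-stochastic transition matrix
P <- sweep(adj, 2, colSums(adj), "/")

d <- 0.85                                   # damping factor
v <- c(a = 1, b = 0, c = 0, d = 0, e = 0)   # restart on the seed node "a"

# Power iteration: r = (1 - d) * v + d * P %*% r
r <- v
for (i in seq_len(1000)) {
    r.new <- as.vector((1 - d) * v + d * (P %*% r))
    delta <- max(abs(r.new - r))
    r <- r.new
    if (delta < 1e-12) break
}
names(r) <- nodes
print(round(r, 4))
```

Nodes far from the seed receive lower scores, and a smaller damping factor concentrates the scores closer to the seed.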
When building the internal files,
the user can choose whether to store (i) matrices for each
provided method, and (ii) vectors derived from such matrices
to use the parametric approaches.
These are optional but enable (i) faster permutations and custom
metabolite backgrounds, and (ii) parametric approaches.
WARNING: diffusion and PageRank matrices in (i)
can allocate up to 250MB each.
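The idea of deriving vectors from a score matrix for a parametric normalisation can be sketched as follows. The random matrix, its dimensions and the z-score are assumptions for illustration: the sketch uses the standard mean and variance of a sum of k columns sampled without replacement, which may differ in detail from the quantities stored by the package.

```r
set.seed(1)
n.nodes <- 6   # rows: graph nodes (toy size)
n.cpd <- 5     # columns: background compounds (toy size)
M <- matrix(runif(n.nodes * n.cpd), n.nodes, n.cpd)

# Vectors derived from the matrix: row means and population variances
row.mean <- rowMeans(M)
row.var <- rowMeans((M - row.mean)^2)

# Score of each node for an input of k compounds: sum of their columns
input <- c(1, 3)
k <- length(input)
score <- rowSums(M[, input, drop = FALSE])

# Null moments when k compounds are sampled uniformly without replacement
null.mean <- k * row.mean
null.var <- k * (n.cpd - k) / (n.cpd - 1) * row.var
z <- (score - null.mean) / sqrt(null.var)

# Check against the exhaustive enumeration of all inputs of size k
all.sets <- combn(n.cpd, k)
null.scores <- apply(all.sets, 2, function(s) rowSums(M[, s, drop = FALSE]))
enum.mean <- rowMeans(null.scores)
enum.var <- rowMeans((null.scores - enum.mean)^2)
print(all.equal(enum.mean, null.mean))
print(all.equal(enum.var, null.var))
```

Once the two vectors are stored, the z-scores no longer need the full matrix, which is why the vectors are much lighter than the matrices.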
On the other hand, the niter parameter
controls the number of trials used to approximate the
distribution of the connected component size under
uniform node sampling.
For further info, see the option thresholdConnectedComponent
in the details from ?generateResultsGraph.
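The Monte Carlo approximation of the connected component size can be sketched in base R as follows. The ring graph, the sample size k, the observed size and the helper largest.cc are all hypothetical, and the tail proportion at the end is only one plausible way of turning the simulated sizes into a p-value.

```r
# Size of the largest connected component in the subgraph induced by `nodes`
# (hypothetical helper, level-wise breadth-first search on an adjacency matrix)
largest.cc <- function(adj, nodes) {
    sub <- adj[nodes, nodes, drop = FALSE]
    unvisited <- rep(TRUE, length(nodes))
    best <- 0
    while (any(unvisited)) {
        frontier <- which(unvisited)[1]
        unvisited[frontier] <- FALSE
        size <- 0
        while (length(frontier) > 0) {
            size <- size + length(frontier)
            nb <- which(colSums(sub[frontier, , drop = FALSE]) > 0 & unvisited)
            unvisited[nb] <- FALSE
            frontier <- nb
        }
        best <- max(best, size)
    }
    best
}

# Toy ring graph with 20 nodes (illustration only)
n <- 20
adj <- matrix(0, n, n)
idx <- cbind(seq_len(n), c(2:n, 1))
adj[idx] <- 1
adj[idx[, 2:1]] <- 1

# niter trials: sample k nodes uniformly and record the largest CC size
set.seed(42)
niter <- 100
k <- 8
cc.sizes <- replicate(niter, largest.cc(adj, sample(n, k)))

# Empirical proportion of trials with a CC at least as large as the observed one
observed <- 5
p.value <- mean(cc.sizes >= observed)
print(p.value)
```

A larger niter gives a smoother estimate of the distribution at the cost of a longer build.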
Regarding the destination, the user can specify
the name of the directory.
Otherwise a name containing the creation date, the organism
and the KEGG release will be used.
The database can be stored within the library path or in a custom location.
loadKEGGdata returns a
FELLA.DATA object from any of the
databases generated by buildDataFromGraph.
This object is the starting point of any enrichment.
If the user built the matrices for "diffusion" and "pagerank",
they can be loaded through the loadMatrix argument.
Further detail on the methods can be found in [Picart-Armada, 2017].
The matrices allow a faster computation and the definition
of a custom background, but use up to 250MB of memory each.
buildGraphFromKEGGREST returns the
curated KEGG graph (class igraph).
buildDataFromGraph returns
invisible(TRUE) if successful.
As a side effect, the directory
outdir is created, containing
the internal data.
loadKEGGdata returns the FELLA.DATA object
that contains the KEGG knowledge representation.
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., & Morishima, K. (2017). KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic acids research, 45(D1), D353-D361.
Karnovsky, A., Weymouth, T., Hull, T., Tarcea, V. G., Scardoni, G., Laudanna, C., ... & Athey, B. (2011). Metscape 2 bioinformatics tool for the analysis and visualization of metabolomics and gene expression data. Bioinformatics, 28(3), 373-380.
Tenenbaum, D. (2013). KEGGREST: Client-side REST access to KEGG. R package version, 1(1).
Chang, W., Cheng, J., Allaire, JJ., Xie, Y., & McPherson, J. (2017). shiny: Web Application Framework for R. R package version 1.0.5. https://CRAN.R-project.org/package=shiny
Picart-Armada, S., Fernandez-Albert, F., Vinaixa, M., Rodriguez, M. A., Aivio, S., Stracker, T. H., Yanes, O., & Perera-Lluna, A. (2017). Null diffusion-based enrichment for metabolomics data. PLOS ONE, 12(12), e0189012.
## Toy example
## In this case, the graph is not built from current KEGG.
## It is loaded from sample data in FELLA
data("FELLA.sample")

## Graph to build the database (this example is a bit hacky)
g.sample <- FELLA:::getGraph(FELLA.sample)
dir.tmp <- paste0(tempdir(), "/", paste(sample(letters), collapse = ""))

## Build internal files in a temporary directory
buildDataFromGraph(
    keggdata.graph = g.sample,
    databaseDir = dir.tmp,
    internalDir = FALSE,
    matrices = NULL,
    normality = NULL,
    dampingFactor = 0.85,
    niter = 10)

## Load database
myFELLA.DATA <- loadKEGGdata(
    dir.tmp,
    internalDir = FALSE)
myFELLA.DATA

######################
## Not run:
## Full example

## First step: graph for Mus musculus discarding the mmu01100 pathway
## (an analogous example can be built for human using organism = "hsa")
g.mmu <- buildGraphFromKEGGREST(
    organism = "mmu",
    filter.path = "mmu01100")
summary(g.mmu)
cat(comment(g.mmu))

## Second step: build internal files for this graph
## (consumes some time and memory, especially if we compute
## "diffusion" and "pagerank" matrices)
buildDataFromGraph(
    keggdata.graph = g.mmu,
    databaseDir = "example_db_mmu",
    internalDir = TRUE,
    matrices = c("hypergeom", "diffusion", "pagerank"),
    normality = c("diffusion", "pagerank"),
    dampingFactor = 0.85,
    niter = 1e3)

## Third step: load the internal files into a FELLA.DATA object
FELLA.DATA.mmu <- loadKEGGdata(
    "example_db_mmu",
    internalDir = TRUE,
    loadMatrix = c("diffusion", "pagerank"))
FELLA.DATA.mmu

## End(Not run)