FELLA is an R package that introduces a new approach to
metabolomics data interpretation.
The starting point of this enrichment is
a list of affected metabolites, which can stem from a
contrast between experimental groups.
This list, which may vary in size,
contains key players from different
biological pathways that together produce a biological perturbation.
The classical way to analyse this list is over-representation analysis. Each metabolic pathway has a statistic, the number of affected metabolites it contains, that yields a p-value. After correcting for multiple testing, the resulting list of prioritised pathways helps to perform a quality check on the data and to suggest novel biological mechanisms related to it. Subsequent generations of pathway analysis methods include quantitative and/or topological data in the statistics to improve power for subtle signals, but interpreting a prioritised pathway list remains a challenge.
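A minimal sketch of the classical over-representation test in base R. It assumes a background universe of N metabolites with n affected ones, and uses made-up pathway sizes and hit counts (pathA, pathB and pathC are hypothetical names). Each pathway's p-value is the upper tail of the hypergeometric distribution; Benjamini-Hochberg is used here as one common choice of multiple-testing correction, not necessarily the one any given tool applies.

```r
# Hypothetical toy universe: all counts are illustrative only
N <- 100  # metabolites in the background universe
n <- 10   # affected (input) metabolites
pathway_sizes <- c(pathA = 12, pathB = 30, pathC = 8)
hits <- c(pathA = 5, pathB = 3, pathC = 0)  # affected metabolites per pathway

# P(X >= hits), X ~ Hypergeometric(size of pathway, rest of universe, n draws)
p_values <- phyper(hits - 1, pathway_sizes, N - pathway_sizes, n,
                   lower.tail = FALSE)

# Correct for multiple testing and rank the pathways
p_adj <- p.adjust(p_values, method = "BH")
ranking <- sort(p_adj)
print(ranking)
```

Only the hit count enters the statistic, which is exactly why the resulting pathway list carries no information on how the pathways relate to each other.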
FELLA, on the other hand,
introduces a comprehensive output that also includes
other biological entities, such as modules, enzymes and reactions,
that coherently relate the top-ranked pathways.
The prioritisation of the pathways and other entities stems from a
diffusion process on a holistic graph representation
of the KEGG database.
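To give an intuition for diffusion-based prioritisation, the sketch below scores a toy graph with the regularised Laplacian kernel (I + L)^-1, one common formulation of graph diffusion; it is an illustrative assumption, not necessarily the exact formulation FELLA uses. Node names (c for compounds, r for a reaction, p for pathways) and edges are hypothetical. Pathway p1 is never part of the input, yet it scores higher than p2 because it sits closer to the affected compounds.

```r
# Toy undirected graph: hypothetical compounds (c1-c3), reaction (r1), pathways (p1, p2)
nodes <- c("c1", "c2", "r1", "p1", "c3", "p2")
edges <- rbind(c("c1", "r1"), c("c2", "r1"), c("r1", "p1"),
               c("r1", "c3"), c("c3", "p2"))

# Symmetric adjacency matrix built from numeric (row, column) indices
A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
idx <- cbind(match(edges[, 1], nodes), match(edges[, 2], nodes))
A[idx] <- 1
A[idx[, 2:1]] <- 1

# Graph Laplacian L = D - A, then diffuse the input through (I + L)^-1
L <- diag(rowSums(A)) - A
x <- as.numeric(nodes %in% c("c1", "c2"))  # input: affected compounds
scores <- drop(solve(diag(length(nodes)) + L, x))
names(scores) <- nodes

round(scores, 3)
```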
User data is stored in a FELLA.USER S4 object, along with the user analyses.
This vignette uses sample data
that contains a small subgraph of
FELLA's KEGG graph
(mid-2017 KEGG release).
All the necessary contextual data is stored
in an S4 data structure of class FELLA.DATA.
Several functions, the enrichment itself among them,
need access to the contextual data,
which is passed through an argument called data.
library(FELLA)
data("FELLA.sample")
class(FELLA.sample)
show(FELLA.sample)
Keep in mind that
FELLA.DATA objects need to
be constructed only once by using
buildDataFromGraph, in that precise order.
This will store them in a local path and they
should be loaded through
Users are advised not to modify the internal database
files or the FELLA.DATA object slots manually,
as this may corrupt the database.
The second block of necessary data is a list of affected metabolites, which should be specified as KEGG compound IDs. The package provides a list of hypothetical affected metabolites belonging to the graph; below, some decoys that do not map to the graph are added to it.
data("input.sample")
# Append 10 decoy IDs that do not map to the KEGG graph
input.full <- c(input.sample, paste0("intruder", 1:10))
show(input.full)
Compounds are introduced through the
defineCompounds function, which returns the first
user data object: it contains the
mapped compounds and empty analysis slots.
The user should always build this object through
defineCompounds instead of manipulating
its slots manually, as that might skip quality checks.
myAnalysis <- defineCompounds(
    compounds = input.full,
    data = FELLA.sample)
Note that a warning message informs the user
that some compounds did not map to the KEGG compound collection.
Compounds that successfully mapped
can be obtained through
while compounds that were excluded
because of mismatch can be accessed through
Keep in mind that exact matching is required, so be extremely careful with spaces, tabs or similar characters that might create mismatches. For example:
# Prepending a space to every ID breaks the exact matching
input.fail <- paste0(" ", input.full)
defineCompounds(
    compounds = input.fail,
    data = FELLA.sample)
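The exact-matching pitfall can be reproduced with base R alone. This sketch uses a small hypothetical universe of KEGG compound IDs and padded input; since the matching is exact, stripping surrounding whitespace with trimws before calling defineCompounds is a suggested precaution on the user's side.

```r
# Hypothetical KEGG compound IDs present in the graph
universe <- c("C00031", "C00022", "C00186")
# Raw input with a leading space, a trailing tab and one decoy
raw_input <- c("C00031", " C00022", "C00186\t", "intruder1")

# Exact matching: padded IDs fall out of the mapped set
mapped_raw <- raw_input[raw_input %in% universe]

# Strip leading/trailing spaces, tabs and newlines before matching
clean_input <- trimws(raw_input)
mapped <- clean_input[clean_input %in% universe]
excluded <- setdiff(clean_input, universe)

print(mapped_raw)
print(mapped)
print(excluded)
```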
Once the FELLA.DATA object and the user object
with the affected metabolites are ready,
the data can be easily enriched.
There are three enrichment methods:
- method = "hypergeom": performs the metabolite-sampling hypergeometric test using the connections in
FELLA's KEGG graph. It is included for completeness and lacks the contextual output that makes the diffusive methods novel.
- method = "diffusion": performs a sub-network analysis on the KEGG graph to extract a meaningful subgraph, which can be plotted and interpreted.
- method = "pagerank": analogous to
"diffusion" but using directed diffusion, which matches the PageRank algorithm for web ranking.
For the diffusion and PageRank methods, two statistical approximations are proposed:
- approx = "normality": scores are computed through z-scores based on the analytical expected value and covariance matrix of the null model for diffusion. This approximation is deterministic and fast.
- approx = "simulation": scores are computed through Monte Carlo trials of the random variables. This approximation requires computing the random trials, governed by the
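The difference between the two approximations can be sketched in base R under two assumptions of ours: the diffusion score is linear in the input (s = K x, with K a toy regularised Laplacian kernel), and the null model draws an input set of the same size uniformly at random. Under that null model the mean and covariance of the input indicator vector are known in closed form, so z-scores follow deterministically; the simulation route estimates the same null distribution from random trials instead. The graph, the input and the number of trials are hypothetical.

```r
set.seed(1)

# Hypothetical toy graph: an 8-node ring with one chord (n1-n5)
N <- 8
nodes <- paste0("n", seq_len(N))
A <- matrix(0, N, N, dimnames = list(nodes, nodes))
for (i in seq_len(N)) {
  j <- i %% N + 1
  A[i, j] <- A[j, i] <- 1
}
A["n1", "n5"] <- A["n5", "n1"] <- 1

# Toy diffusion kernel (regularised Laplacian); scores are linear: s = K x
K <- solve(diag(N) + diag(rowSums(A)) - A)

input <- c("n1", "n2", "n3")
n <- length(input)
x <- as.numeric(nodes %in% input)
observed <- drop(K %*% x)

# "normality": exact mean and covariance of the input indicator vector when
# n of the N nodes are drawn uniformly without replacement, then z-scores
p <- n / N
mu_x <- rep(p, N)
Sigma_x <- p * (1 - p) * N / (N - 1) * (diag(N) - matrix(1 / N, N, N))
mu_s <- drop(K %*% mu_x)
sd_s <- sqrt(diag(K %*% Sigma_x %*% t(K)))
z <- (observed - mu_s) / sd_s
p_normal <- pnorm(z, lower.tail = FALSE)

# "simulation": Monte Carlo trials of random input sets of the same size
n_trials <- 2000
null_scores <- replicate(n_trials, {
  x_rand <- numeric(N)
  x_rand[sample.int(N, n)] <- 1
  drop(K %*% x_rand)
})  # N x n_trials matrix
# Empirical one-sided p-values with the usual +1 correction
p_sim <- (rowSums(null_scores >= observed) + 1) / (n_trials + 1)

round(cbind(z, p_normal, p_sim), 3)
```

The trade-off described above is visible here: the normality route is a handful of matrix products, while the simulation route costs one kernel product per trial and its precision depends on the number of trials.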
The function enrich wraps the individual analysis
functions, such as runDiffusion, in an easily usable manner,
returning a user object with complete analyses.
myAnalysis <- enrich(
    compounds = input.full,
    method = "diffusion",
    approx = "normality",
    data = FELLA.sample)
The output is quite informative and aggregates
all the warnings.
Let's compare an empty user object
to the output of a processed one:
The wrapper function
enrich can run the three analyses
at once with the option
method = listMethods(), or only
the desired ones if they are provided as a character vector:
myAnalysis <- enrich(
    compounds = input.full,
    method = listMethods(),
    approx = "normality",
    data = FELLA.sample)
show(myAnalysis)
The wrapped functions work in a similar way;
here is an example with runDiffusion:
myAnalysis_bis <- runDiffusion(
    object = myAnalysis,
    approx = "normality",
    data = FELLA.sample)
show(myAnalysis_bis)
Calling plot on the analysis object
gives a friendly visualisation of the relevant
part of the KEGG graph.
For method = "hypergeom", the plot shows
a bipartite graph that contains the
top pathways and the affected compounds.
In that case,
threshold = 1 allows the visualisation
of both pathways; otherwise a plot with only one pathway
would be quite uninformative.
plot(
    x = myAnalysis,
    method = "hypergeom",
    main = "My first enrichment using the hypergeometric test in FELLA",
    threshold = 1,
    data = FELLA.sample)
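A minimal sketch of how a p-score threshold acts as a node filter, assuming that a node is kept when its p-score falls strictly below the threshold; the pathway names and p-scores are made up. With threshold = 0.1 only one pathway survives, while threshold = 1 keeps both, which mirrors the reasoning above.

```r
# Hypothetical p-scores for two pathways (values made up for illustration)
p_scores <- c(pathway_A = 0.003, pathway_B = 0.42)

# Assumed filtering rule: keep nodes whose p-score is below the threshold
keep_nodes <- function(p_scores, threshold) {
  names(p_scores)[p_scores < threshold]
}

keep_nodes(p_scores, threshold = 0.1)
keep_nodes(p_scores, threshold = 1)
```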
For method = "diffusion", the graph contains
a richer representation involving
modules, enzymes and reactions
that link affected pathways and compounds.
plot(
    x = myAnalysis,
    method = "diffusion",
    main = "My first enrichment using the diffusion analysis in FELLA",
    threshold = 0.1,
    data = FELLA.sample)
For method = "pagerank", the concept is analogous to diffusion:
plot(
    x = myAnalysis,
    method = "pagerank",
    main = "My first enrichment using the PageRank analysis in FELLA",
    threshold = 0.1,
    data = FELLA.sample)
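The link between directed diffusion and PageRank can be illustrated with a personalised PageRank computed by power iteration on a hypothetical directed toy graph. The assumptions are ours: the random walker restarts at the input metabolites (c1 and c2), the damping factor is 0.85, and all node names are made up; FELLA's exact parameterisation may differ.

```r
# Hypothetical directed toy graph (edges point from -> to)
nodes <- c("c1", "c2", "r1", "e1", "p1")
edges <- rbind(c("c1", "r1"), c("c2", "r1"), c("r1", "e1"),
               c("e1", "p1"), c("p1", "c1"))
A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
A[cbind(match(edges[, 1], nodes), match(edges[, 2], nodes))] <- 1

# Column-stochastic transition matrix: column j spreads node j's mass over its
# out-neighbours (every node here has at least one out-edge)
P <- t(A / rowSums(A))

# Personalisation vector: the walker restarts at the input metabolites
v <- as.numeric(nodes %in% c("c1", "c2"))
v <- v / sum(v)
names(v) <- nodes

# Personalised PageRank by power iteration: s <- d * P s + (1 - d) * v
personalised_pagerank <- function(P, v, d = 0.85, tol = 1e-12, max_iter = 1000) {
  s <- v
  for (i in seq_len(max_iter)) {
    s_new <- drop(d * (P %*% s)) + (1 - d) * v
    if (sum(abs(s_new - s)) < tol) return(s_new)
    s <- s_new
  }
  warning("power iteration did not converge")
  s
}

d <- 0.85
s <- personalised_pagerank(P, v, d = d)
round(s, 4)
```

The same vector can be obtained in closed form as (1 - d)(I - dP)^-1 v; power iteration is shown because it mirrors the random-walk reading of the scores. Unlike the undirected case, score only flows along edge directions, so c2, which has no incoming edges, keeps just its restart mass.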
FELLA offers several exporting alternatives,
both for the R environment and for external software.
The appropriate functions to export the results
inside R are
generateResultsTable for a data.frame object:
myTable <- generateResultsTable(
    object = myAnalysis,
    method = "diffusion",
    threshold = 0.1,
    data = FELLA.sample)
knitr::kable(head(myTable, 20))
generateResultsGraph for a
graph in igraph format:
myGraph <- generateResultsGraph(
    object = myAnalysis,
    method = "diffusion",
    threshold = 0.1,
    data = FELLA.sample)
show(myGraph)
Results can also be saved as permanent files.
The data.frame can be saved as a CSV file:
myTempDir <- tempdir()
myExp_csv <- file.path(myTempDir, "table.csv")
exportResults(
    format = "csv",
    file = myExp_csv,
    method = "pagerank",
    threshold = 0.1,
    object = myAnalysis,
    data = FELLA.sample)
# Read the exported table back to check its contents
test <- read.csv(file = myExp_csv)
knitr::kable(head(test))
In the same way, the graph can be saved in igraph format as an RData file:
myExp_graph <- file.path(myTempDir, "graph.RData")
exportResults(
    format = "igraph",
    file = myExp_graph,
    method = "pagerank",
    threshold = 0.1,
    object = myAnalysis,
    data = FELLA.sample)
stopifnot("graph.RData" %in% list.files(myTempDir))
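An exported .RData file can be read back with base R. The name of the object stored in graph.RData is not stated in this vignette, so the sketch below avoids hard-coding it: it loads the file into a fresh environment and retrieves whatever object it contains. A plain list saved with save() stands in for the igraph object, assuming the export is a standard .RData file.

```r
# Stand-in for the exported file: a plain list saved with save()
tmp_file <- file.path(tempdir(), "example_graph.RData")
toy_graph <- list(nodes = c("C00031", "C00022"),
                  edges = matrix(c(1, 2), ncol = 2))
save(toy_graph, file = tmp_file)

# Load into a fresh environment so the workspace is untouched, then fetch the
# object by whatever name it was saved under
env <- new.env()
loaded_names <- load(tmp_file, envir = env)
restored <- get(loaded_names[1], envir = env)
str(restored)
```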
Other formats exported by igraph
are also available, internally using
igraph::write.graph. Check the format argument in
?igraph::write.graph for a list of
the supported formats.
For example, using the pajek format:
myExp_pajek <- file.path(myTempDir, "graph.pajek")
exportResults(
    format = "pajek",
    file = myExp_pajek,
    method = "diffusion",
    threshold = 0.1,
    object = myAnalysis,
    data = FELLA.sample)
stopifnot("graph.pajek" %in% list.files(myTempDir))
This option is used whenever the format does not match any other predefined export option.
For reproducibility purposes, below is the