Function defineCompounds creates a 
FELLA.USER object from a list of 
compounds and a FELLA.DATA object.
Functions runHypergeom, 
runDiffusion and runPagerank 
perform an enrichment on a FELLA.USER with 
the mapped input metabolites 
(through defineCompounds) 
and a FELLA.DATA object. 
They are based on the hypergeometric test, the heat diffusion model 
and the PageRank algorithm, respectively. 
Function enrich is a wrapper that calls, 
in the following order, 
loadKEGGdata (optional), 
defineCompounds and one or more of 
runHypergeom, runDiffusion 
and runPagerank.
defineCompounds(compounds = NULL, compoundsBackground = NULL,
    data = NULL)
runHypergeom(object = NULL, data = NULL, p.adjust = "fdr")
runDiffusion(object = NULL, data = NULL, approx = "normality",
    t.df = 10, niter = 1000)
runPagerank(object = NULL, data = NULL, approx = "normality",
    dampingFactor = 0.85, t.df = 10, niter = 1000)
enrich(compounds = NULL, compoundsBackground = NULL,
    methods = listMethods(), loadMatrix = "none", approx = "normality",
    t.df = 10, niter = 1000, databaseDir = NULL, internalDir = TRUE,
    data = NULL, ...)
| compounds | Character vector containing the KEGG IDs of the compounds considered as affected |
| compoundsBackground | Character vector containing the KEGG IDs of the compounds that belong to the background. Can be |
| data | FELLA.DATA object |
| object | FELLA.USER object |
| p.adjust | Character passed to the |
| approx | Character: "simulation" for Monte Carlo; "normality", "gamma" or "t" for parametric approaches |
| t.df | Numeric value; number of degrees of freedom of the t distribution if the approximation approx = "t" is used |
| niter | Number of iterations (permutations) for Monte Carlo ("simulation"); must be a numeric value between 1e2 and 1e5 |
| dampingFactor | Numeric value between 0 and 1 (both exclusive), damping factor |
| methods | Character vector, containing some of: |
| loadMatrix | Character vector to choose if heavy matrices should be loaded. Can contain: |
| databaseDir | Character, path to load the |
| internalDir | Logical, is the directory located in the package directory? |
| ... | Further arguments for the enrichment function(s) |
Function defineCompounds maps the 
specified list of KEGG compounds [Kanehisa, 2017], usually from an 
experimental metabolomics study, to the graph contained in the
FELLA.DATA object. 
Importantly, the names must be KEGG ids, so other formats 
(common names, HMDB ids, etc.) must be mapped to KEGG first, 
for example through the "Compound ID Conversion" 
tool in MetaboAnalyst [Xia, 2015].
The user can also define a personalised background as a 
list of KEGG compound ids, which should be more extensive than 
the list of input metabolites. 
Once the compounds are mapped, the enrichment 
can be performed through runHypergeom, 
runDiffusion and runPagerank.
Function runHypergeom performs an over representation analysis 
through the hypergeometric test [Fisher, 1935] on a 
FELLA.USER object with mapped metabolites 
and a FELLA.DATA object. 
If a custom background was specified, it will be used. 
This approach is included for completeness and it is not the 
main purpose behind the FELLA package. 
Importantly, runHypergeom is not a hypergeometric test using the 
original KEGG pathways. 
Instead, a compound "belongs" to a "pathway" if 
it can reach the original pathway in the 
upwards-directed KEGG graph. 
This is a way to evaluate enrichment including indirect connections 
to a pathway, e.g. through an enzymatic family. 
In this analysis, the new "pathways" are expected to be larger than 
the original pathways, and therefore the results can differ from the 
standard over-representation analysis.
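The test behind this kind of over-representation analysis can be sketched in base R. The counts below are hypothetical and chosen only for illustration; the sketch also assumes that the p.adjust argument names a method of the base R function p.adjust, which the default "fdr" suggests but which is not confirmed here.

```r
# Hypothetical counts, chosen only for illustration
background_size <- 1000  # compounds in the background
pathway_size    <- 40    # background compounds that can reach the "pathway"
input_size      <- 25    # mapped input compounds
overlap         <- 6     # input compounds that can reach the "pathway"

# P(X >= overlap) when input_size compounds are drawn without replacement
p_value <- phyper(
  q = overlap - 1,
  m = pathway_size,
  n = background_size - pathway_size,
  k = input_size,
  lower.tail = FALSE)
p_value

# Several "pathways" are tested at once, so the p-values are adjusted.
# The two extra p-values are made up; "fdr" is assumed to be a method
# name understood by stats::p.adjust
p_adjusted <- p.adjust(c(p_value, 0.20, 0.75), method = "fdr")
p_adjusted
```

With a custom background, only background_size and pathway_size would change, since both are counted within the background.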
Function runDiffusion performs 
the diffusion-based enrichment on a 
FELLA.USER object with mapped metabolites 
and a FELLA.DATA object [Picart-Armada, 2017]. 
If a custom background was specified, it will be used. 
The idea behind the heat diffusion is the usage of the 
finite difference formulation of the heat equation to 
propagate labels from the metabolites to the rest of the graph.
Following the notation in [Picart-Armada, 2017], the temperatures (diffusion scores) are computed as:
T = -KI^(-1)*G
G is an indicator vector of the input metabolites 
(1 if input metabolite, 0 otherwise).
KI is defined through -KI = L + B, where 
L is the unnormalised graph Laplacian and 
B is the diagonal matrix with B[i,i] = 1 if 
node i is a pathway and B[i,i] = 0 otherwise.
Equivalently, with the notation in the HotNet approach [Vandin, 2011], 
the stationary temperature is named fs:
fs = Lgamma^(-1)*bs
bs is the indicator vector G from above. 
Lgamma, on the other hand, is found as 
Lgamma = L + gamma*I, where L is the unnormalised 
graph Laplacian, gamma is the first order leaking rate 
and I is the identity matrix. 
In our formulation, only the pathway nodes are allowed to leak, 
therefore I is switched to B. 
The parameter gamma is set to gamma = 1.
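As a minimal sketch of the two equivalent formulations, the temperatures can be computed on a toy graph in base R. The graph, the node names and the choice of input compounds are hypothetical; FELLA itself works on the KEGG graph stored in the FELLA.DATA object.

```r
# Hypothetical toy graph: three compounds, two reactions, one pathway
nodes <- c("c1", "c2", "c3", "r1", "r2", "p1")
edges <- rbind(
  c("c1", "r1"), c("c2", "r1"), c("c3", "r2"),
  c("r1", "p1"), c("r2", "p1"))

# Symmetric adjacency matrix
A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
for (k in seq_len(nrow(edges))) {
  A[edges[k, 1], edges[k, 2]] <- 1
  A[edges[k, 2], edges[k, 1]] <- 1
}

L <- diag(rowSums(A)) - A                 # unnormalised graph Laplacian
B <- diag(as.numeric(nodes == "p1"))      # only the pathway node leaks
G <- as.numeric(nodes %in% c("c1", "c2")) # indicator of the input compounds

# T = -KI^(-1) * G, with -KI = L + B
KI <- -(L + B)
temperature <- as.vector(-solve(KI) %*% G)
names(temperature) <- nodes

# HotNet-style notation with I switched to B and gamma = 1
gamma <- 1
fs <- solve(L + gamma * B, G)

temperature
```

Both expressions give the same vector. In this toy graph the input compounds c1 and c2 end up warmest, followed by the reaction r1 that connects them to the pathway.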
The input metabolites are forced to stay warm, propagating flow to all the nodes in the network. However, only pathway nodes are allowed to evacuate this flow, so that its directionality is bottom-up. Further details on the setup of the diffusion process can be found in the supplementary file S2 from [Picart-Armada, 2017].
Finally, the warmest nodes in the graph are reported as the relevant sub-network. This will probably include some input metabolites and also reactions, enzymes, modules and pathways. Other metabolites can be suggested as well.
Function runPagerank performs the random walk 
based enrichment on a 
FELLA.USER object with mapped metabolites 
and a FELLA.DATA object.
If a custom background was specified, it will be used. 
PageRank was originally conceived as a scoring system for websites 
[Page, 1999]. 
Intuitively, PageRank favours nodes that 
(1) have a large amount of nodes pointing 
at them, and (2) whose pointing nodes also have high scores. 
Classical PageRank is formulated in terms of a random walker: 
the PageRank of a given node is the stationary probability 
of the walker visiting it. 
The walker chooses, in each step, 
whether to continue the random walk with probability 
dampingFactor or to restart it with probability 
1 - dampingFactor. 
In the original publication, dampingFactor = 0.85, 
which is the value used in FELLA by default. 
If the walker continues, an edge is picked from the outgoing edges 
of the current node with a probability proportional to its weight. 
If the walker restarts, a node is picked uniformly from the 
whole graph. 
The "personalised PageRank" variant allows a user-defined 
distribution as the source of new random walks. 
The R package igraph implements this variant in its 
page.rank function [Csardi, 2006].
As described in the supplement S3 from [Picart-Armada, 2017], 
the PageRank PR can be computed as 
a column vector by imposing a stationary 
state in the probability.
With a damping factor d and the user-defined 
distribution p as a column vector:
PR = d*M*PR + (1 - d)*p
M is the matrix whose element M[i,j] is the 
probability of transitioning from j to i. 
If node j has outgoing edges, their probability is proportional 
to their weight - all weights must be positive. 
If node j has no outgoing edges, the probability is 
uniform over all the nodes, i.e. M[i,j] = 1/nrow(M) 
for every i. 
Note that every column of M sums to exactly 1.
Solving the stationary condition for PR leads to an expression to compute PageRank:
PR = (1 - d)*(I - d*M)^(-1)*p
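Starting from the stationary condition PR = d*M*PR + (1 - d)*p, the scores can be obtained by solving the linear system (I - d*M)*PR = (1 - d)*p. The base R sketch below does this on a hypothetical four-node directed graph; the node names, edges and restart distribution are illustrative assumptions, not FELLA internals.

```r
# Hypothetical directed graph; W[i, j] is the weight of the edge j -> i
nodes <- c("a", "b", "c", "d")
W <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
W["b", "a"] <- 1   # a -> b
W["c", "a"] <- 1   # a -> c
W["c", "b"] <- 1   # b -> c
W["d", "c"] <- 1   # c -> d
# Node d has no outgoing edges

# Column-stochastic transition matrix M
out_weight <- colSums(W)
M <- sweep(W, 2, ifelse(out_weight > 0, out_weight, 1), "/")
M[, out_weight == 0] <- 1 / nrow(M)   # uniform jump from dangling nodes

d <- 0.85                              # damping factor
p <- c(a = 0.5, b = 0.5, c = 0, d = 0) # restarts uniform over inputs a and b

# Solve (I - d*M) PR = (1 - d) p
PR <- (1 - d) * solve(diag(nrow(M)) - d * M, p)

# Cross-check with the random walk itself (power iteration)
PR_iter <- p
for (step in seq_len(200)) {
  PR_iter <- as.vector(d * M %*% PR_iter + (1 - d) * p)
}

PR
```

The solved vector sums to 1 and agrees with the power iteration, which is the iterative scheme usually employed on large graphs where a direct solve is too expensive.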
The idea behind the method "pagerank" is closely related 
to "diffusion". 
Relevant metabolites are the sources of new random walks and 
nodes are scored through their PageRank. 
Specifically, p is set to a uniform probability on the 
input metabolites. 
More details on the setup can be found in 
the supplementary file S3 from [Picart-Armada, 2017].
There is an important detail for "diffusion" 
and "pagerank": the scores are statistically normalised. 
Omitting this normalisation leads to a systematic bias, 
especially in pathway nodes, as described in [Picart-Armada, 2017]. 
Therefore, in both cases, scores undergo a normalisation 
through permutation analysis. 
The score of each node i is compared to its null distribution 
under input permutation, which yields its p-score. 
As described in [Picart-Armada, 2017], two alternatives are offered: 
a parametric and deterministic approach 
and a non-parametric, stochastic one.
Stochastic Monte Carlo trials ("simulation") imply 
randomly permuting the input niter times and counting, 
for each node i, how many trials 
led to an equally or more extreme value than the original score. 
An empirical p-value is returned [North, 2002].
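The Monte Carlo procedure can be sketched in base R as follows. The scoring operator S is a random placeholder standing in for the diffusion or PageRank computation, and the (r + 1)/(n + 1) estimator follows [North, 2002]; whether FELLA computes the empirical p-value in exactly this form is an assumption.

```r
set.seed(1)

n_nodes     <- 8   # nodes that receive a score
n_compounds <- 5   # compounds that can be part of the input

# Hypothetical linear scoring operator: column j holds the scores
# produced by compound j alone
S <- matrix(runif(n_nodes * n_compounds), n_nodes, n_compounds)

score_nodes <- function(input_idx) {
  G <- numeric(n_compounds)
  G[input_idx] <- 1
  as.vector(S %*% G)
}

input    <- c(1, 2)               # hypothetical input compounds
observed <- score_nodes(input)

# Permute the input niter times, keeping its size fixed
niter <- 1000
null_scores <- replicate(
  niter,
  score_nodes(sample(n_compounds, length(input))))

# Per node: trials with a score at least as extreme as the observed one
exceed <- rowSums(null_scores >= observed)

# Empirical p-value, (r + 1) / (n + 1)
p_empirical <- (exceed + 1) / (niter + 1)
p_empirical
```

The smallest value this estimator can return is 1/(niter + 1), so niter bounds the resolution of the p-scores.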
On the other hand, the parametric 
scores (approx = "normality") 
give a z-score for such permutation analysis. 
The expected value and variance of such null distributions 
are known quantities, see supplementary 
file S4 from [Picart-Armada, 2017].
To work in the same range [0,1], z-scores are 
transformed using the routine pnorm. 
The user can also choose the Student's t using 
approx = "t" and choosing a number of degrees of freedom 
through t.df. 
This uses the function pt instead.
Alternatively, a gamma distribution can be used by setting 
approx = "gamma". 
The theoretical mean (E) and variance (V) 
are used to define the shape 
(E^2/V) and scale (V/E) of the gamma distribution, and 
pgamma to map to [0,1].
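The three parametric options can be compared on a single node with base R. The score, mean and variance below are hypothetical, and using the upper tail (so that warm nodes get small p-scores) is an assumption made for this sketch.

```r
score <- 5     # observed score of a node (hypothetical)
E     <- 3     # theoretical mean of its null distribution (hypothetical)
V     <- 1.5   # theoretical variance of its null distribution (hypothetical)

# approx = "normality": z-score mapped to [0, 1] through pnorm
z <- (score - E) / sqrt(V)
p_normality <- pnorm(z, lower.tail = FALSE)

# approx = "t": same z-score, Student's t with t.df degrees of freedom
t.df <- 10
p_t <- pt(z, df = t.df, lower.tail = FALSE)

# approx = "gamma": shape and scale chosen to match E and V
shape <- E^2 / V
scale <- V / E
p_gamma <- pgamma(score, shape = shape, scale = scale, lower.tail = FALSE)

c(normality = p_normality, t = p_t, gamma = p_gamma)
```

The gamma parameters reproduce the moments by construction: shape * scale equals E and shape * scale^2 equals V. For a positive z-score the t distribution gives a larger p-score than the normal one because of its heavier tails.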
Any sub-network prioritised by "diffusion" 
and "pagerank" is selected by applying 
a threshold on the p-scores.
Finally, the function enrich 
is a wrapper to perform the enrichment analysis. 
If no FELLA.DATA object is supplied, 
it loads it, maps the affected compounds and performs 
the desired enrichment(s) with a single call.
It then returns a list with the loaded 
FELLA.DATA object 
and the results in a FELLA.USER object. 
Conversely, the user can supply the 
FELLA.DATA object and the wrapper 
will map the metabolites and run the desired enrichment 
method(s). 
In this case, only the FELLA.USER 
will be returned.
defineCompounds returns 
the FELLA.USER object 
with the mapped metabolites, ready to be enriched.
runHypergeom returns a 
FELLA.USER object 
updated with the hypergeometric test results.
runDiffusion returns a 
FELLA.USER object 
updated with the diffusion enrichment results.
runPagerank returns a 
FELLA.USER object 
updated with the PageRank enrichment results.
enrich returns a 
FELLA.USER object 
updated with the desired enrichment results if 
the FELLA.DATA was supplied. 
Otherwise, it returns a list with the freshly loaded 
FELLA.DATA object and the 
corresponding enrichment in the 
FELLA.USER object.
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., & Morishima, K. (2017). KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic acids research, 45(D1), D353-D361.
Xia, J., Sinelnikov, I. V., Han, B., & Wishart, D. S. (2015). MetaboAnalyst 3.0 - making metabolomics more meaningful. Nucleic acids research, 43(W1), W251-W257.
Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, 98(1), 39-82.
Picart-Armada, S., Fernandez-Albert, F., Vinaixa, M., Rodriguez, M. A., Aivio, S., Stracker, T. H., Yanes, O., & Perera-Lluna, A. (2017). Null diffusion-based enrichment for metabolomics data. PLOS ONE, 12(12), e0189012.
Vandin, F., Upfal, E., & Raphael, B. J. (2011). Algorithms for detecting significantly mutated pathways in cancer. Journal of Computational Biology, 18(3), 507-522.
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.
Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5), 1-9.
North, B. V., Curtis, D., & Sham, P. C. (2002). A note on the calculation of empirical P values from Monte Carlo procedures. American journal of human genetics, 71(2), 439.
## Load the internal database. 
## This one is a toy example!
## Do not use as a regular database
data(FELLA.sample)
## Load a list of compounds to enrich
data(input.sample)
######################
## Example, step by step
## First, map the compounds
obj <- defineCompounds(
compounds = c(input.sample, "I_dont_map", "me_neither"), 
data = FELLA.sample)
obj
## See the mapped and unmapped compounds
getInput(obj)
getExcluded(obj)
## Compounds are already mapped 
## We can enrich using any method now
## If no compounds are mapped an error is thrown. Example:
## Not run: 
data(FELLA.sample)
obj <- defineCompounds(
compounds = c("C00049", "C00050"), 
data = FELLA.sample)
## End(Not run)
## Enrich using hypergeometric test
obj <- runHypergeom(
object = obj, 
data = FELLA.sample)
obj
## Enrich using diffusion
## Note how the results are added;  
## the hypergeometric results are not overwritten
obj <- runDiffusion(
object = obj, 
approx = "normality", 
data = FELLA.sample)
obj
## Enrich using PageRank
## Again, this does not overwrite other methods 
obj <- runPagerank(
object = obj, 
approx = "simulation", 
data = FELLA.sample)
obj
######################
## Example using the "enrich" wrapper
## Only diffusion
obj.wrap <- enrich(
compounds = input.sample, 
methods = "diffusion", 
data = FELLA.sample)
obj.wrap
## All the methods
obj.wrap <- enrich(
compounds = input.sample, 
methods = FELLA::listMethods(), 
data = FELLA.sample)
obj.wrap