Function defineCompounds creates a FELLA.USER object from a list of compounds and a FELLA.DATA object.
Functions runHypergeom, runDiffusion and runPagerank perform an enrichment on a FELLA.USER object with the mapped input metabolites (through defineCompounds) and a FELLA.DATA object. They are based on the hypergeometric test, the heat diffusion model and the PageRank algorithm, respectively.
Function enrich is a wrapper that runs the following steps in order: loadKEGGdata (optional), defineCompounds and one or more of runHypergeom, runDiffusion and runPagerank.
defineCompounds(compounds = NULL, compoundsBackground = NULL,
    data = NULL)
runHypergeom(object = NULL, data = NULL, p.adjust = "fdr")
runDiffusion(object = NULL, data = NULL, approx = "normality",
    t.df = 10, niter = 1000)
runPagerank(object = NULL, data = NULL, approx = "normality",
    dampingFactor = 0.85, t.df = 10, niter = 1000)
enrich(compounds = NULL, compoundsBackground = NULL,
    methods = listMethods(), loadMatrix = "none", approx = "normality",
    t.df = 10, niter = 1000, databaseDir = NULL, internalDir = TRUE,
    data = NULL, ...)
compounds | Character vector containing the KEGG IDs of the compounds considered as affected
compoundsBackground | Character vector containing the KEGG IDs of the compounds that belong to the background. Can be
data | FELLA.DATA object
object | FELLA.USER object
p.adjust | Character passed to the
approx | Character: "simulation" for Monte Carlo, "normality", "gamma" or "t" for parametric approaches
t.df | Numeric value; number of degrees of freedom of the t distribution, used when approx = "t"
niter | Number of iterations (permutations) for Monte Carlo ("simulation"); must be a numeric value between 1e2 and 1e5
dampingFactor | Numeric value between 0 and 1 (both excluded); damping factor
methods | Character vector, containing some of:
loadMatrix | Character vector to choose if heavy matrices should be loaded. Can contain:
databaseDir | Character, path to load the
internalDir | Logical, is the directory located in the package directory?
... | Further arguments for the enrichment function(s)
Function defineCompounds maps the specified list of KEGG compounds [Kanehisa, 2017], usually from an experimental metabolomics study, to the graph contained in the FELLA.DATA object. Importantly, the names must be KEGG IDs, so other formats (common names, HMDB IDs, etc.) must be mapped to KEGG first, for example through the "Compound ID Conversion" tool in MetaboAnalyst [Xia, 2015]. The user can also define a personalised background as a list of KEGG compound IDs, which should be more extensive than the list of input metabolites. Once the compounds are mapped, the enrichment can be performed through runHypergeom, runDiffusion and runPagerank.
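The mapping step can be pictured as a set operation between the input IDs and the node names of the graph. The base R sketch below only illustrates that idea; graph_nodes and the other variables are hypothetical, the IDs are arbitrary examples, and none of this is the FELLA implementation.

```r
# Hypothetical node names of a knowledge graph (not taken from FELLA)
graph_nodes <- c("C00022", "C00024", "C00031", "C00036", "C00042")

# Input list: two IDs present in the graph, two that are not
compounds <- c("C00022", "C00042", "I_dont_map", "me_neither")

# Compounds found in the graph are kept; the rest are excluded
mapped <- intersect(compounds, graph_nodes)
excluded <- setdiff(compounds, graph_nodes)

# A custom background is mapped the same way and should be
# more extensive than the mapped input
background <- c("C00022", "C00024", "C00042")
background_mapped <- intersect(background, graph_nodes)
input_in_background <- all(mapped %in% background_mapped)

print(mapped)
print(excluded)
```

In the package itself, the equivalent information is retrieved from the FELLA.USER object with getInput and getExcluded, as shown in the examples.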
Function runHypergeom performs an over-representation analysis through the hypergeometric test [Fisher, 1935] on a FELLA.USER object with mapped metabolites and a FELLA.DATA object. If a custom background was specified, it will be used. This approach is included for completeness and it is not the main purpose behind the FELLA package. Importantly, runHypergeom is not a hypergeometric test using the original KEGG pathways. Instead, a compound "belongs" to a "pathway" if it can reach the original pathway in the upwards-directed KEGG graph. This is a way to evaluate enrichment including indirect connections to a pathway, e.g. through an enzymatic family. New "pathways" are expected to be larger than the original pathways in this analysis and therefore the results can differ from the standard over-representation.
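As an illustration of the test itself, the following base R sketch computes the over-representation p-value for a single "pathway" with phyper and checks it against a one-sided Fisher exact test. All the counts are made up for the example, and how FELLA builds them from the graph is not shown here.

```r
# Toy counts (hypothetical, for illustration only)
n_background <- 100  # compounds in the background
n_pathway    <- 20   # background compounds that reach the "pathway"
n_input      <- 10   # mapped input compounds
n_overlap    <- 6    # input compounds that reach the "pathway"

# P(X >= n_overlap) under the hypergeometric null
p_value <- phyper(
    q = n_overlap - 1,
    m = n_pathway,
    n = n_background - n_pathway,
    k = n_input,
    lower.tail = FALSE)

# Same p-value through a one-sided Fisher exact test on the 2x2 table
contingency <- matrix(
    c(n_overlap, n_input - n_overlap,
      n_pathway - n_overlap,
      n_background - n_pathway - (n_input - n_overlap)),
    nrow = 2)
p_fisher <- fisher.test(contingency, alternative = "greater")$p.value

print(p_value)
print(p_fisher)
```

With several pathways, the resulting vector of p-values would then be corrected for multiple testing, for instance with stats::p.adjust and method = "fdr".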
Function runDiffusion performs the diffusion-based enrichment on a FELLA.USER object with mapped metabolites and a FELLA.DATA object [Picart-Armada, 2017]. If a custom background was specified, it will be used. The idea behind the heat diffusion is to use the finite difference formulation of the heat equation to propagate labels from the metabolites to the rest of the graph.
Following the notation in [Picart-Armada, 2017], the temperatures (diffusion scores) are computed as:
T = -KI^(-1)*G
G is an indicator vector of the input metabolites (1 if input metabolite, 0 otherwise). KI is the matrix defined by -KI = L + B, where L is the unnormalised graph Laplacian and B is the diagonal matrix with B[i,i] = 1 if node i is a pathway and B[i,i] = 0 otherwise.
Equivalently, with the notation in the HotNet approach [Vandin, 2011], the stationary temperature is named fs:
fs = Lgamma^(-1)*bs
bs is the indicator vector G from above. Lgamma, on the other hand, is found as Lgamma = L + gamma*I, where L is the unnormalised graph Laplacian, gamma is the first order leaking rate and I is the identity matrix. In our formulation, only the pathway nodes are allowed to leak, therefore I is switched to B. The parameter gamma is set to gamma = 1.
The input metabolites are forced to stay warm, propagating flow to all the nodes in the network. However, only pathway nodes are allowed to evacuate this flow, so that its directionality is bottom-up. Further details on the setup of the diffusion process can be found in the supplementary file S2 from [Picart-Armada, 2017].
Finally, the warmest nodes in the graph are reported as the relevant sub-network. This will probably include some input metabolites and also reactions, enzymes, modules and pathways. Other metabolites can be suggested as well.
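The diffusion scores T = -KI^(-1)*G can be reproduced by hand on a very small graph. The sketch below is a base R illustration on a made-up three-node chain (metabolite, reaction, pathway); the graph and variable names are assumptions for the example and do not come from a FELLA.DATA object.

```r
# Toy undirected graph (hypothetical): metabolite - reaction - pathway
nodes <- c("metabolite", "reaction", "pathway")
A <- matrix(0, 3, 3, dimnames = list(nodes, nodes))
A["metabolite", "reaction"] <- A["reaction", "metabolite"] <- 1
A["reaction", "pathway"] <- A["pathway", "reaction"] <- 1

# Unnormalised graph Laplacian L = D - A
L <- diag(rowSums(A)) - A

# B flags the pathway nodes, the only ones allowed to leak
B <- diag(as.numeric(nodes == "pathway"))

# Indicator vector of the input metabolites
G <- as.numeric(nodes == "metabolite")

# -KI = L + B, hence T = -KI^(-1) * G = (L + B)^(-1) * G
KI <- -(L + B)
temperatures <- setNames(as.vector(-solve(KI, G)), nodes)

# metabolite = 3, reaction = 2, pathway = 1
print(temperatures)
```

The temperature decreases from the input metabolite towards the pathway, the only node where heat can leave the system.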
Function runPagerank performs the random walk based enrichment on a FELLA.USER object with mapped metabolites and a FELLA.DATA object. If a custom background was specified, it will be used.
PageRank was originally conceived as a scoring system for websites [Page, 1999]. Intuitively, PageRank favours nodes that (1) have a large number of nodes pointing at them, and (2) are pointed at by nodes that also have high scores. Classical PageRank is formulated in terms of a random walker: the PageRank of a given node is the stationary probability of the walker visiting it. The walker chooses, in each step, whether to continue the random walk with probability dampingFactor or to restart it with probability 1 - dampingFactor. In the original publication, dampingFactor = 0.85, which is the value used in FELLA by default. If the walker continues, an edge is picked from the outgoing edges of the current node with a probability proportional to its weight. If the walker restarts, a node is uniformly picked from the whole graph. The "personalised PageRank" variant allows a user-defined distribution as the source of new random walks. The R package igraph contains such a variant in its page.rank function [Csardi, 2006].
As described in the supplement S3 from [Picart-Armada, 2017], the PageRank PR can be computed as a column vector by imposing a stationary state in the probability. With a damping factor d and the user-defined distribution p as a column vector:
PR = d*M*PR + (1 - d)*p
M is the matrix whose element M[i,j] is the probability of transitioning from j to i. If node j has outgoing edges, their probability is proportional to their weight; all weights must be positive. If node j has no outgoing edges, the probability is uniform over all the nodes, i.e. M[i,j] = 1/nrow(M) for every i. Note that all the columns of M sum up to exactly 1. This leads to an expression to compute PageRank:
PR = (1 - d)*(I - d*M)^(-1)*p
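Solving the stationary equation PR = d*M*PR + (1 - d)*p for PR is a plain linear system. The base R sketch below does so on a made-up three-node directed graph, including a node without outgoing edges; the graph is hypothetical and the code is an illustration rather than the FELLA or igraph implementation.

```r
# Toy directed graph (hypothetical): 1 -> 2, 2 -> 3, node 3 has no outgoing edges
n <- 3
M <- matrix(0, n, n)
M[2, 1] <- 1          # from node 1 the walker moves to node 2
M[3, 2] <- 1          # from node 2 the walker moves to node 3
M[, 3] <- 1 / n       # no outgoing edges: uniform over all the nodes

d <- 0.85             # damping factor
p <- c(1, 0, 0)       # restart distribution: all the mass on node 1

# Solve (I - d*M) PR = (1 - d) p
PR <- as.vector(solve(diag(n) - d * M, (1 - d) * p))

# Check the stationary equation; PR should also sum up to 1
residual <- max(abs(PR - as.vector(d * M %*% PR + (1 - d) * p)))
print(PR)
print(sum(PR))
print(residual)
```

Using solve on the linear system avoids forming the inverse explicitly, which is usually preferable numerically.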
The idea behind the method "pagerank" is closely related to "diffusion". Relevant metabolites are the sources of new random walks and nodes are scored through their PageRank. Specifically, p is set to a uniform probability on the input metabolites. More details on the setup can be found in the supplementary file S3 from [Picart-Armada, 2017].
There is an important detail for "diffusion" and "pagerank": the scores are statistically normalised. Omitting this normalisation leads to a systematic bias, especially in pathway nodes, as described in [Picart-Armada, 2017]. Therefore, in both cases, scores undergo a normalisation through permutation analysis. The score of a node i is compared to its null distribution under input permutation, leading to its p-score. As described in [Picart-Armada, 2017], two alternatives are offered: a parametric and deterministic approach and a non-parametric, stochastic one.
Stochastic Monte Carlo trials ("simulation") imply randomly permuting the input niter times and counting, for each node i, how many trials led to an equally or more extreme value than the original score. An empirical p-value is returned [North, 2002].
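The Monte Carlo normalisation can be sketched in base R on a small made-up graph. The example assumes that the null inputs are drawn uniformly from the metabolites, that "more extreme" means a higher score, and that the empirical p-value is (r + 1)/(niter + 1) as in [North, 2002]; the graph, the diffuse helper and the tolerance are hypothetical and not part of FELLA.

```r
set.seed(1)

# Toy graph (hypothetical): four metabolites, two reactions, one pathway
nodes <- c("m1", "m2", "m3", "m4", "r1", "r2", "p1")
edges <- rbind(
    c("m1", "r1"), c("m2", "r1"),
    c("m3", "r2"), c("m4", "r2"),
    c("r1", "p1"), c("r2", "p1"))
A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Diffusion scores T = (L + B)^(-1) * G, with B flagging the pathway node
metabolites <- c("m1", "m2", "m3", "m4")
L <- diag(rowSums(A)) - A
B <- diag(as.numeric(nodes == "p1"))
diffuse <- function(input) {
    G <- as.numeric(nodes %in% input)
    setNames(as.vector(solve(L + B, G)), nodes)
}

# Observed scores and null scores under random inputs of the same size
input <- c("m1", "m2")
observed <- diffuse(input)
niter <- 1000
null_scores <- replicate(niter, diffuse(sample(metabolites, length(input))))

# r = trials with an equal or higher score (small tolerance for rounding)
r <- rowSums(null_scores >= observed - 1e-12)
p_scores <- (r + 1) / (niter + 1)
print(round(p_scores, 3))
```

In this toy setting the reaction r1, which is attached to both input metabolites, obtains a much lower p-score than r2.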
On the other hand, the parametric scores (approx = "normality") give a z-score for such permutation analysis. The expected value and variance of such null distributions are known quantities, see supplementary file S4 from [Picart-Armada, 2017]. To work in the same range [0,1], z-scores are transformed using the routine pnorm. The user can also choose the Student's t using approx = "t" and choosing a number of degrees of freedom through t.df. This uses the function pt instead. Alternatively, a gamma distribution can be used by setting approx = "gamma". The theoretical mean (E) and variance (V) are used to define the shape (E^2/V) and scale (V/E) of the gamma distribution, and pgamma is used to map to [0,1].
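The three parametric mappings can be compared on a single node with base R. In the sketch below, the null mean E, the null variance V and the observed score are made-up numbers, and the use of upper tails (so that high scores give small p-scores) is an assumption of the example rather than a statement about the FELLA internals.

```r
# Hypothetical null moments and observed score for a single node
E <- 2        # theoretical mean of the null distribution
V <- 1        # theoretical variance of the null distribution
score <- 4    # observed diffusion or PageRank score
t.df <- 10    # degrees of freedom for the t approximation

# z-score of the observed value
z <- (score - E) / sqrt(V)

# Upper tails are an assumption here: high scores give small p-scores
p_norm <- pnorm(z, lower.tail = FALSE)
p_t <- pt(z, df = t.df, lower.tail = FALSE)

# Gamma approximation: shape = E^2/V and scale = V/E
# reproduce the mean (shape*scale) and the variance (shape*scale^2)
shape <- E^2 / V
scale <- V / E
p_gamma <- pgamma(score, shape = shape, scale = scale, lower.tail = FALSE)

print(c(normality = p_norm, t = p_t, gamma = p_gamma))
```

The t distribution has heavier tails than the normal, so for the same z-score it yields a larger (more conservative) value.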
Any sub-network prioritised by "diffusion" and "pagerank" is selected by applying a threshold on the p-scores.
Finally, the function enrich is a wrapper to perform the enrichment analysis. If no FELLA.DATA object is supplied, it loads it, maps the affected compounds and performs the desired enrichment(s) with a single call. In this case, it returns a list with the loaded FELLA.DATA object and the results in a FELLA.USER object. Conversely, the user can supply the FELLA.DATA object and the wrapper will map the metabolites and run the desired enrichment method(s). In this case, only the FELLA.USER object will be returned.
defineCompounds returns the FELLA.USER object with the mapped metabolites, ready to be enriched.
runHypergeom returns a FELLA.USER object updated with the hypergeometric test results.
runDiffusion returns a FELLA.USER object updated with the diffusion enrichment results.
runPagerank returns a FELLA.USER object updated with the PageRank enrichment results.
enrich returns a FELLA.USER object updated with the desired enrichment results if the FELLA.DATA was supplied. Otherwise, it returns a list with the freshly loaded FELLA.DATA object and the corresponding enrichment in the FELLA.USER object.
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., & Morishima, K. (2017). KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic acids research, 45(D1), D353-D361.
Xia, J., Sinelnikov, I. V., Han, B., & Wishart, D. S. (2015). MetaboAnalyst 3.0 - making metabolomics more meaningful. Nucleic acids research, 43(W1), W251-W257.
Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, 98(1), 39-82.
Picart-Armada, S., Fernandez-Albert, F., Vinaixa, M., Rodriguez, M. A., Aivio, S., Stracker, T. H., Yanes, O., & Perera-Lluna, A. (2017). Null diffusion-based enrichment for metabolomics data. PLOS ONE, 12(12), e0189012.
Vandin, F., Upfal, E., & Raphael, B. J. (2011). Algorithms for detecting significantly mutated pathways in cancer. Journal of Computational Biology, 18(3), 507-522.
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.
Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5), 1-9.
North, B. V., Curtis, D., & Sham, P. C. (2002). A note on the calculation of empirical P values from Monte Carlo procedures. American journal of human genetics, 71(2), 439.
## Load the internal database.
## This one is a toy example!
## Do not use as a regular database
data(FELLA.sample)
## Load a list of compounds to enrich
data(input.sample)

######################
## Example, step by step

## First, map the compounds
obj <- defineCompounds(
    compounds = c(input.sample, "I_dont_map", "me_neither"),
    data = FELLA.sample)
obj
## See the mapped and unmapped compounds
getInput(obj)
getExcluded(obj)
## Compounds are already mapped
## We can enrich using any method now

## If no compounds are mapped an error is thrown. Example:
## Not run:
data(FELLA.sample)
obj <- defineCompounds(
    compounds = c("C00049", "C00050"),
    data = FELLA.sample)
## End(Not run)

## Enrich using hypergeometric test
obj <- runHypergeom(
    object = obj,
    data = FELLA.sample)
obj

## Enrich using diffusion
## Note how the results are added;
## the hypergeometric results are not overwritten
obj <- runDiffusion(
    object = obj,
    approx = "normality",
    data = FELLA.sample)
obj

## Enrich using PageRank
## Again, this does not overwrite other methods
obj <- runPagerank(
    object = obj,
    approx = "simulation",
    data = FELLA.sample)
obj

######################
## Example using the "enrich" wrapper

## Only diffusion
obj.wrap <- enrich(
    compounds = input.sample,
    methods = "diffusion",
    data = FELLA.sample)
obj.wrap

## All the methods
obj.wrap <- enrich(
    compounds = input.sample,
    methods = FELLA::listMethods(),
    data = FELLA.sample)
obj.wrap