knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
TODO: Add more detail about PC algorithm here, or at least like to the pc_* function.
The NetCoupler-algorithm operates by taking the following steps:
A similar procedure is provided to identify direct influences of external factors on network variables.
Given that 1) an exposure can potentially affect metabolite concentrations
(exposure -> metabolic network
) and that 2) metabolites can potentially
affect the development of a disease (metabolic network -> outcome
), two
algorithms are needed, one for the exposure side and one for the disease/outcome
side. The explanation and description of NetCoupler are taken heavily from
Clemens Wittenbecher's PhD thesis [@Wittenbecher2017].
TODO: Explain PC algorithm..?
The estimate the underlying directed acyclic graph (DAG) skeleton of the
metabolite network, NetCoupler uses an implementation of the PC-algorithm
[@Kalisch2012;@Maathuis2010;@Spirtes2001;@Colombo2014]. Because of this, the
assumptions from the PC-algorithm apply to NetCoupler. The output of this
algorithm is a graphical model (G
) of the underlying network that is used by
NetCoupler.
The exposure-side network retrieves a range of potential direct effects of exogenous {{word?}} factors on the metabolite network. The algorithmic pseudocode from functional programming perspective for the exposure-side estimation is:
input DAG skeleton input data with network variables (M), exposure (X), and confounder (C) variables set index nodes as list of all M set neighbour nodes as list of M with an edge to the index node extract from DAG skeleton set direct effect as empty for each index node in all index nodes assign new direct effects from previous loop skip if index node is already set as direct effects extract all possible combinations of neighbour nodes from index node set as node combinations map model onto node combinations model formulas as index node on X conditioned on: node combinations, confounders, and direct effects set as model list with estimate and p-value set base model as model without neighbour nodes classify as direct effect if: (minimum estimate is > 0 or maximum estimate is < 0) and sign of p-value = sign of base model p-value classify as ambigious if: (minimum estimate is > 0 or maximum estimate is < 0) and (any estimate = 0 and sign of p-value is not = sign of base model p-value) classify as no effect otherwise output confidence set of effects of X on M, based on DAG skeleton where each index node has a classification
This assumes full source population information. For applications to a sample of the population, the following decision rules were applied:
S
is empty {{correct?}}, the marginal model).For a given exposure, a metabolite would be classified as non-affected if the
FDR-adjusted p-value was non-significant in the marginal model. If there was
a significant FDR-controlled p-value in the marginal model and all estimates in
the confidence set (CS
{{? correct?}}) were significant and consistent, then
the metabolite would be classified as affected by the exposure. Other metabolites
would be classified as ambiguous with respect to the exposure.
{{TODO: reduce reliance on p-values or use another metric? Or do some bootstrapping/CV?}}
The outer loop includes identifying directly affected metabolites into the fixed model part. Then the procedure was repeated on the still ambiguous metabolites to check whether further unambiguous classification was possible based on the additional information. In the applied version of NetCoupler, this was limited to metabolites in the same connected components, which are defined as a group of two or more directly linked metabolites that were all associated with the exposure in the marginal model.
The rationale was that indirect effects could only be mediated by metabolites that were themselves (directly or indirectly) affected by the exposure. These metabolites were expected to be marginally associated with the exposure. In theory, there could be possible scenarios where the incidental cancellation of several direct and indirect effects concealed exposure-dependency in the marginal model [@Pearl2009]
Still, high abundance of incidental cancellations are considered unlikely based on observed correlation structures and the chance to unambiguously resolve such complicated scenarios in the applied modelling approach is low {{proof of claim?}}. Therefore, the pragmatic decision was taken to adjust remaining ambiguous metabolites only for directly affected metabolites identified within the same connected component.
{{TODO: rearrange this}}
Network adjustments were used to resolve indirect effects mediated by another
network variable (e.g. exposure -> mediating metabolite -> metabolite
).
The outcome-side network retrieves direct effects of network variables on later occurring events.
There are similarities to the exposure-side algorithm for the outcome-side
algorithm, except that the directionality is reversed (metabolite -> outcome
).
This change in directionality also has implications on the decision rules.
There may also be potential network-variable confounders and introduction
of spurious associations (metabolite <- confounding metabolite -> outcome
).
The general algorithm for the outcome-side estimation is:
Aim: To estimate independentdirect effects between a network INPUT graphical DAG skeleton model (G) INPUT data with network variables (M), outcome (O), and confounders (C) observations on: network variables (M), outcome (O), confounders (C) begin with empty direct effect (DE) repeat until no further ith M (Mi) is classified as DE add new DE to DE begin where AMB = M repeat until all Mi have been selected select a variable Mi from M {{not AMB?}} select all nodes adjacent to Mi in G, adj(G, Mi) repeat until no further non-redundant subset (S) can be selected from adj(G, Mi) select a S from adj(G, Mi) estimate outcome as a function of Mi conditional on S, C add effect estimate PE for Mi on O to confidence set (CSi) end if the lower bound of CSi > 0 or upper bound of CSi < 0 and sign(pe1) = sign(pe2) for every pair of estimates in CSi classify Mi as affecting risk of O else if lower bound of CSi > 0 or upper bound of CSi < 0 and 0 is in CSi or sign(pe1) = sign(pe2) for any pair of estimates in CSi classify Mi as ambiguous with respect to risk of O else classify Mi as non-affecting risk to O end end output: CS for effects of M on O based on G classification of every Mi from M as effector on O (Mi -> O) non-effector on O (Mi O) ambiguous (Mi -- O)
Where AMB
is {{meaning?}}, PE
is {{meaning?}}, adj(G, Mi)
is the set of
nodes adjust to Mi
in G
that are congruent ($\equiv$) to direct neighbors of
Mi
in G
.
An indirect effect, unlike confounding, is not biased. Decision rules were adjusted to avoid bias:
S = adj(Mi)
)Metabolites are classified as affecting the outcome if the FDR-adjusted p-value was significant in the model adjusted for the full adjacency set and all estimates in the confidence set are significant and consistent. Metabolites are considered not directly affecting the outcome if they were not significantly associated in the model adjusted for the full adjacency set. All others were considered ambiguous.
2 Markus Kalisch, Martin Maechler, Diego Colombo, Marloes H. Maathuis, Peter Buehlmann (2012). Causal Inference Using Graphical Models with the R Package pcalg. Journal of Statistical Software, 47(11), 1-26. URL http://www.jstatsoft.org/v47/i11/.
The estimated DAG skeleton ... {{complete later. Not clear about this right now. (page 43 of thesis)}}
Metabolic variables should be log-transformed and standardized on potential confounders {{this the best way?}} by regressing on age, sex, BMI, and prevalence of hypertension and using the residual variance in further models. Metabolites were then scaled (mean of zero and standard deviation of one). {{both regressed and scaled?}}
Apply transformations (log-transformation, standardization, regressing on) to the high dimensional metabolic data.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.