knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The goal of NetCoupler is to estimate causal links between a set of -omic (e.g. metabolomics, lipidomics) or other high-dimensional data and an external variable, such as a disease outcome, an exposure, or both. The NetCoupler-algorithm, initially formulated during Clemens' PhD thesis [@Wittenbecher2017], links a conditional dependency network with an external variable (i.e. an outcome or exposure) to identify network-independent associations between the network variables and the external variable, classified as direct effects.
A typical use case we have in mind would be if a researcher might be interested in exploring potential pathways that exist between a health exposure like red meat consumption, its impact on the metabolic profile, and the subsequent impact on an outcome like type 2 diabetes incidence. So for instance, you want to ask questions to get answers that look like the figure below.
#| echo = FALSE, #| fig.cap = "The structure of questions that NetCoupler aims to help answers or explore." knitr::include_graphics("aim-output.png")
The input for NetCoupler includes:
lm()
), including
confounders to adjust for.The final output is the modeling results along with the results from NetCoupler's classification. Results can then be displayed as a joint network model in graphical format.
There are a few key assumptions to consider before using NetCoupler for your own research purposes.
NetCoupler has several frameworks in mind:
%>%
or base
R |>
operator.starts_with()
, contains()
).nc_
).The general workflow for using NetCoupler revolves around several main functions, listed below as well as visualized in the figure below:
nc_standardize()
: The algorithm in general, but especially the
network estimation method, is sensitive to the values and
distribution of the variables. Scaling the variables by
standardizing, mean-centering, and natural log transforming them are
important to obtaining more accurate estimations.nc_estimate_network()
: Estimate the connections between metabolic
variables as a undirected graph based on dependencies between
variables. This network is used to identify metabolic variables that
are connected to each other as neighbours.nc_estimate_exposure_links()
and nc_estimate_outcome_links()
:
Uses the standardized data and the estimated network to classify the
conditionally independent relationship between each metabolic
variable and an external variable (e.g. an outcome or an exposure)
as either being a direct, ambiguous, or no effect relationship.classify_option_list
. See the help documentation of the
estimating functions for more details. For larger datasets, with
more sample size and variables included in the network, we
strongly recommend lowering the threshold used to reduce the
risk of false positives.nc_join_links()
: Not implemented yet. Join together the
exposure- and outcome-side estimated links.nc_plot_network()
: Not implemented yet. Visualize the
connections estimated from nc_estimate_network()
.nc_plot_links()
: Not implemented yet. Plots the output results
from either nc_estimate_exposure_links()
,
nc_estimate_outcome_links()
, or nc_join_links()
.#| echo = FALSE, #| out.width = "60%", #| fig.cap = "NetCoupler functions and their input and ouput. Input and output #| objects are the light gray boxes, while the light blue boxes are the #| currently available functions, and the light orange boxes are functions #| planned to be developed." knitr::include_graphics("nc-diagram-io.png", dpi = 144)
The below is an example using a simulated dataset for demonstrating
NetCoupler. For more examples, particularly on how to use with different
models, check out the vignette("examples")
.
For estimating the network, it's (basically) required to standardize the
metabolic variables before inputting into nc_estimate_network()
. This
function also log-transforms and scales (mean-center and z-score
normalize) the values of the metabolic variables. We do this because the
network estimation algorithm can sometimes be finicky about differences
in variable numerical scale (mean of 1 vs mean of 1000).
library(NetCoupler) std_metabolic_data <- simulated_data %>% nc_standardize(starts_with("metabolite"))
If you have potential confounders that you need to adjust for during the
estimating links phase of NetCoupler, you'll need to include these
confounding variables when standardizing the metabolic variables. You do
this by regressing the confounding variables on the metabolic variables
by using the regressed_on
argument of nc_standardize()
. This will
automatically first standardize the variables, run models on the
metabolic variables that includes the confounding variables, and then
extract the residuals from the model which are then used to construct
the network. Here's an example:
std_metabolic_data <- simulated_data %>% nc_standardize(starts_with("metabolite"), regressed_on = "age")
After that, you can estimate the network. The network is by default
estimated using the PC-algorithm. You can read more about it in the help
page of the pc_estimate_undirected_graph()
internal function.
# Make partial independence network from metabolite data metabolite_network <- std_metabolic_data %>% nc_estimate_network(starts_with("metabolite"))
For the exposure and outcome side, you should standardize the metabolic variables, but this time, we don't regress on the confounders since they will be included in the models.
standardized_data <- simulated_data %>% nc_standardize(starts_with("metabolite"))
Now you can estimate the outcome or exposure and identify direct effects
for either the exposure side (exposure -> metabolite
) or the outcome
side (metabolite -> outcome
). For the exposure side, the function
identifies whether a link between the exposure and an index node (one
metabolic variable in the network) exists, independent of potential
confounders and from neighbouring nodes (other metabolic variables
linked to the index variable). Depending on how consistent and strong
the link is, the effect is classified as "direct", "ambiguous", or
"none".
In the example below, we specifically generated the simulated data so that the exposure is associated with metabolites 1, 8, and 12. And as we can see, those links have been correctly identified.
outcome_estimates <- standardized_data %>% nc_estimate_outcome_links( edge_tbl = as_edge_tbl(metabolite_network), outcome = "outcome_continuous", model_function = lm ) outcome_estimates exposure_estimates <- standardized_data %>% nc_estimate_exposure_links( edge_tbl = as_edge_tbl(metabolite_network), exposure = "exposure", model_function = lm ) exposure_estimates
If you want to adjust for confounders and have already used
regressed_on
in the nc_standardize()
function, add confounders to
nc_estimate_outcome_links()
or nc_estimate_exposure_links()
with the
adjustment_vars
argument:
outcome_estimates <- standardized_data %>% nc_estimate_outcome_links( edge_tbl = as_edge_tbl(metabolite_network), outcome = "outcome_continuous", model_function = lm, adjustment_vars = "age" )
If the analysis is taking a while, you can use the future package to
speed things up by implementing parallel processing. It's easy to use
parallel processing with NetCoupler since it uses the future package. By
setting the "processing plan" with future::plan()
to multisession
,
NetCoupler will use parallel processing for its computationally
intensive component of the algorithm. After you run your code, close up
the parallel processing by putting it back to normal with
plan(sequential)
. Using the future package you can speed up the
processing by almost 2.5 times.
# You'll need to have furrr installed for this to work. library(future) plan(multisession) outcome_estimates <- standardized_data %>% nc_estimate_outcome_links( edge_tbl = as_edge_tbl(metabolite_network), outcome = "outcome_continuous", model_function = lm ) plan(sequential)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.