knitr::opts_chunk$set(echo = TRUE, 
                      tidy.opts = list(width.cutoff = 80), 
                      tidy = TRUE)
library(anansi)
library(ggplot2)

Overview {-}

A common challenge in integrative analysis of high-dimensional data sets is subsequent biological interpretation. Anansi addresses this challenge by only considering pairwise associations that are known to occur a priori. In order to achieve this, we need to provide this relational information to anansi in the form of a (bi)adjacency matrix.

This vignette is divided by two sections:

Understanding adjacency matrices {#sec-mat-1}

Example from Biology: The Krebs cycle

Recall that the citric acid cycle, or Krebs cycle, is a central piece of metabolic machinery involved in oxidative phosphorylation. See figure below for a simplified representation with 8 unique enzymatic reactions and 9 unique metabolites.

#| plot-krebs, echo=FALSE, out.width='100%', out.height='100%', warning=FALSE,
#| message=FALSE,
#| fig.cap="Simplified representation of the Krebs cycle. Orange boxes mark
#| metabolites and blue arrows represent enyzmes. "
library(patchwork)
library(ggraph)
library(igraph)
library(tidygraph)
library(dplyr)

metabs <- c(
    "isocitrate", "cis-aconitate", "citrate", "oxaloacetate",
    "L-malate", "fumarate", "succinate", "succinyl-CoA",
    "ketoglutarate"
)
d <- data.frame(
    from = c(
        "isocitrate", "ketoglutarate", "succinyl-CoA",
        "succinate", "fumarate", "L-malate", "oxaloacetate", "citrate",
        "cis-aconitate"
    ),
    to = c(
        "ketoglutarate", "succinyl-CoA", "succinate",
        "fumarate", "L-malate", "oxaloacetate", "citrate", "cis-aconitate",
        "isocitrate"
    ),
    vertices = c(
        "isocitrate\ndehydrogenase", "ketoglutarate\ndehydrogenase",
        "succinyl-CoA\nsynthetase", "succinate\ndehydrogenase",
        "\nfumarase", "malate\ndehydrogenase", "citrate\nsynthase",
        "aconitase\n", "aconitase\n"
    )
)

krebs_layout <- data.frame(
    x = (cos(seq(2 * pi, 0, length = 10) + pi / 2) * 1.25)[c(3:9, 1:2)],
    y = (sin(seq(2 * pi, 0, length = 10) + pi / 2) * 1.25)[c(3:9, 1:2)],
    circular = FALSE,
    name = c(
        "isocitrate", "ketoglutarate", "succinyl-CoA",
        "succinate", "fumarate", "L-malate",
        "oxaloacetate", "citrate", "cis-aconitate"
    ),
    .ggraph.orig_index = 1:9,
    .ggraph.index = 1:9
)

graph_from_data_frame(d, directed = TRUE) |>
    as_tbl_graph() |>
    mutate(Legend = if_else(name %in% metabs, "Metabolite", "Enzyme")) |>
    ggraph(layout = krebs_layout) +

    geom_edge_link(
        arrow = arrow(length = unit(4, "mm"), type = "closed"),
        edge_width = 2.5, color = "black",
        end_cap = circle(11.5, "mm"),
        start_cap = circle(11.25, "mm")
    ) +

    geom_edge_link(
        aes(label = vertices),
        arrow = arrow(length = unit(4, "mm"), type = "closed"),
        edge_width = 2, color = "dodgerblue",
        end_cap = circle(11.5, "mm"),
        start_cap = circle(11.5, "mm"),
        angle_calc = "along",
        label_dodge = unit(c(rep(13, 5), rep(-13, 4)), "mm"),
        label_size = 3.15
    ) +

    geom_node_label(aes(label = name),
        size = 3.5, fill = "orange",
        show.legend = FALSE, label.padding = unit(0.4, "lines"),
        nudge_x = c(0, 0.1, 0.05, -0.05, -0.1, 0, -0.1, 0, 0.1)
    ) +

    coord_equal(
        xlim = c(-1.25, 1.25), ylim = c(-1, 1),
        ratio = 1, clip = "off"
    ) +
    theme_graph()

Adjacency matrices allow us to be specific in our questions

Let's imagine we had (high-dimensional proxy) measurements available for both types. For instance, we could have transcription data for the enzymes and metabolomics data for the metabolites. We could represent our two data sets like two sets of features.

All vs-all testing

If we were to perform an all-vs-all integrative analysis between our two data sets, comprehensively testing every metabolite-enzyme pair, we'd test $8\times9=72$ hypotheses.

#| all-v-all, echo=FALSE, fig.wide=TRUE, out.width='100%', out.height='60%',
#| fig.cap="Graph representation of possible association sctructures in the
#| Krebs cycle. **A)** All-vs-all, with a full graph. **B)** Constrained, only
#| linking metabolites an enzymes that are known to directly interact."

colmat <- data.frame(
    name = c(
        "aconitase", "isocitrate\ndehydrogenase", 
        "ketoglutarate\ndehydrogenase", "succinyl-CoA\nsynthetase",
        "succinate\ndehydrogenase", "fumarase", "malate\ndehydrogenase",
        "citrate\nsynthase", "citrate", "cis-aconitate", "isocitrate", 
        "ketoglutarate", "succinyl-CoA", "succinate", "fumarate", "L-malate",
        "oxaloacetate"
    ),
    y = c(seq(9, 1, length.out = 8), 9:1),
    x = rep(c(0.7, 0.3), times = c(8, 9)),
    fill = rep(c("Enzyme", "Metabolite"), times = c(8, 9))
)

p_0 <- ggplot() +
    annotate("segment",
        x = c(0.28, 0.72), y = 0.875,
        xend = c(0.28, 0.72), yend = 9.125, color = "gray"
    ) +
    geom_text(
        data = colmat, aes(y = y, x = x, label = name), colour = "black",
        size = 3.25, hjust = c(rep(0, 8), rep(1, 9)), lineheight = 7 / 10,
        nudge_x = c(rep(0.03, 8), rep(-0.03, 9))
    ) +
    scale_fill_manual(
        values = c(Metabolite = "orange", Enzyme = "dodgerblue")
    ) +
    coord_cartesian(xlim = c(1 / 20, 19 / 20), clip = "off") +
    annotate("text",
        label = c("Metabolites", "Enzymes"),
        x = c(2 / 8, 6 / 8), y = 9.6, size = 4
    ) +
    ylab(NULL) +
    xlab(NULL) +
    theme_void()

p_a <- p_0 +
    geom_point(
        data = colmat, aes(y = y, x = x, fill = fill),
        color = "black", shape = 21, size = 4, show.legend = FALSE
    )



p_b <- p_0 + geom_segment(
    data = expand.grid(seq(9, 1, length.out = 8), 9:1),
    aes(y = Var2, yend = Var1), x = 3 / 10, xend = 7 / 10,
    color = "#393D47", inherit.aes = FALSE
) +
    geom_point(
        data = colmat, aes(y = y, x = x, fill = fill),
        color = "black",
        shape = 21, size = 4, show.legend = FALSE
    )
p_c <- p_0 + geom_segment(
    data = data.frame(
        Var2 = seq(1, 9, length.out = 8)[
            c(8, 1, 8, 8, 7, 7, 6, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1)
        ],
        Var1 =
            c(9, 9, 8, 7, 7, 6, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1)
    ),
    aes(y = Var1, yend = Var2), x = 3 / 10, xend = 7 / 10,
    color = "#393D47", inherit.aes = FALSE
) +
    geom_point(
        data = colmat, aes(y = y, x = x, fill = fill),
        color = "black", shape = 21, size = 4, show.legend = FALSE
    )

(p_b | p_c) + plot_annotation(tag_levels = "A")

Knowledge-informed analyis of specific interactions

Often however, only a subset of these comparisons is scientifically relevant to investigate. In this case for instance, only 17 associations between enzyme-metabolite pairs are likely to make sense. Non-selectively testing all 72 associations actively harms statistical power, as 55 of these tests likely cannot be interpreted, but their p-values will still be tallied for the purpose of FDR and FWER. Moreover, they tend to obscure any biologically interpretable findings.

In order to 'know' which features-pairs in your data set should be considered, anansi requires a biadjacency matrix. The biadjacency matrix corresponding to to the sparser graph on the right can be seen in the figure below.

#| krebs-biadj, echo=FALSE, fig.small=TRUE,
#| fig.cap="Matrix representation of the constrained analysis of the Krebs
#| cycle, with cells marked with 'X' signifying tested feature-pairs."
#|

krebs_edge_df <- local({
    data("krebs", package = "anansi")
    return(krebs)
})

krebs_edge_df |>
    mutate(Enzyme = `levels<-`(
        Enzyme,
        gsub(x = levels(Enzyme), " ", replacement = "\n")
    )) |>
    ggplot() +
    geom_text(label = "X", x = I(0.5), y = I(0.5)) +
    facet_grid(Enzyme ~ Metabolite, switch = "y", space = "free_x") +
    theme_bw() +
    theme(
        strip.text.y.left = element_text(
            angle = 0, size = 10, lineheight = 7 / 10, hjust = 1
        ),
        strip.text.x.top = element_text(angle = 90, size = 10, hjust = 0),
        strip.background.x = element_rect(fill = "orange", color = "black"),
        strip.background.y = element_rect(fill = "dodgerblue", color = "black"),
        panel.spacing = unit(0.1, "lines")
    ) +
    coord_equal(ratio = 1) +
    labs(x = NULL, y = NULL)

General use in anansi {#sec-mat-2}

In anansi, we work with biadjacency matrices in the AnansiWeb S4 class, which consists of a biadjacency matrix, in this case we call it a dictionary, as well as two tables of observations that should be analysed jointly.

The anansi package offers some functions that should be sufficient for basic analysis. To enable more advanced users to pursue non-standard applications, our methodology is compatible with several popular interfaces.

Make a biadjacency matrix with weaveWeb()

The recommended default interface to generate an AnansiWeb object is weaveWeb(). Besides a traditional interface, it also accepts R formula syntax. Once we have an AnansiWeb object, we can use the $ operator to access the biadjacency matrix in the 'dictionary' slot.

# Two interfaces
form.web <- weaveWeb(ec ~ ko, link = kegg_link())

trad.web <- weaveWeb(y = "ec", x = "ko", link = kegg_link())

# Same output
identical(form.web, trad.web)

# Get the biadjacency matrix using the $ operator
head(dictionary(form.web), c(10, 10))

Note that the majority of this matrix consists of dots, it's a sparse matrix, which is a format that can considerably speed up calculations involving huge, mostly empty matrices. For further reading, see the Matrix package website.

weaveWeb() input: link

weaveWeb() takes additional arguments, let's focus on link, which will take the information necessary to link features between input data sets. The anansi package includes such data for the KEGG database, let's take a look at some of it:

# Extract the data.frame from the list that kegg_link() returns
ec2ko <- kegg_link()[["ec2ko"]]

head(ec2ko)

Structure of link input

The format of the data.frame is important: It should have two named columns. Each column contains feature IDs of one type (in this case, ec and ko, which is how the KEGG database refers to Enzyme Commission numbers and kegg orthologes, respectively. If two feature IDs are on the same row of the data.frame, it means those features are paired (hence adjacency).

linking between feature names

The column names correspond to the types of feature IDs that should be linked. These names are used throughout the AnansiWeb workflow. They can be directly called from an AnansiWeb object:

# Note that the formula controls the order
names(weaveWeb(ko ~ ec, link = kegg_link()))

# names() is short for the above
names(weaveWeb(ec ~ ko, link = kegg_link()))

The pre-packaged kegg linking map

We need to provide weaveWeb() with a map of which features are linked. We can use the pre-packaged kegg link map, which consists of two similarly structured data.frames in a list, one of which we just inspected. We can call it directly using kegg_link():

# Only print the first few entries
lapply(kegg_link(), head)

linking across two data.frame

Note the column names in the two data.frames: ec, cpd and ko. Two of these correspond to the formula we used just used: weaveWeb(ec ~ ko). We didn't use the term cpd. Further, ec is present in both data.frames.

We can use weaveWeb() to make a biadjacency matrix between any combination of one or two similarly structured data.frames, presented as a list. For the pre- packaged data set, this means we can link between any two of ec, cpd and ko, in either order.

# Formula notation
head(dictionary(weaveWeb(cpd ~ ko, link = kegg_link())), c(10, 10))

# Character notation
head(dictionary(weaveWeb(y = "ko", x = "cpd", link = kegg_link())), c(10, 10))

Use custom biadjacency matrices with AnansiWeb()

The most flexible way to make an AnansiWeb object is through AnansiWeb():

# Prep some dummy input tables
tX <- matrix(rnorm(30),
    nrow = 6,
    dimnames = list(NULL, x = paste("x", seq_len(5), sep = "_"))
)
tY <- matrix(rnorm(42),
    nrow = 6,
    dimnames = list(NULL, y = paste("y", seq_len(7), sep = "_"))
)

# Prep biadjacency matrix
# base::matrix is fine too, but will get coerced to Matrix::Matrix.
d <- matrix(
    data = sample(x = c(TRUE, FALSE), size = 35, replace = TRUE),
    nrow = NCOL(tY),
    ncol = NCOL(tX),
    dimnames = list(colnames(tY), colnames(tX))
)

# make the AnansiWeb
w <- AnansiWeb(
    tableY = tY,
    tableX = tX,
    dictionary = d
)

# Confirm the warning:
names(w)

Additional approaches

There are numerous ways in which we can define an adjacency matrix. Here, we demonstrate a graph-based and matrix based approach.

adjacency matrices with igraph

Importantly, (bi)adjacency matrices can be understood as graphs. Two common packages that deal with graphs are igraph and graph. To start off, we need to have a graph object, so let's make one using igraph.

library(igraph)
# Prepared data.frame for the occasion
head(krebs_edge_df)

# Convert data.frame to graph
g <- graph_from_data_frame(krebs_edge_df, directed = FALSE)

Now that we have constructed a graph, we still need to identify which features, vertices, belong to which data modality, in this case either enzymes and metabolites. Using the igraph package, this should be done through the type slot. type can be assigned a boolean vector, where TRUE values become columns, corresponding to features in tableX, whereas FALSE values become rows, which correspond to those in tableY.

V(g)$type <- V(g)$name %in% krebs_edge_df$Metabolite

# Now that we've defined type, we can convert to biadjacency matrix:
bi_mat1 <- as_biadjacency_matrix(g, sparse = TRUE)


head(bi_mat1, n = c(4, 5))

Though biadjacency support in graph is currently limited, We note that igraph and graph objects can be converted using the graph_from_graphnel() and as_graphnel() functions in igraph.

adjacency matrices with Matrix

We can also define a matrix directly. Conveniently, sparse matrices can be defined easily from our starting data. The Matrix library provides fantastic support for this.

library(Matrix)

# For this approach, it's useful to prepare the input as two factors:
Enzyme <- factor(krebs_edge_df$Enzyme)
Metabolite <- factor(krebs_edge_df$Metabolite)

# We can get integers out of factors, corresponding to their level indices
bi_mat2 <- sparseMatrix(
    i = as.integer(Enzyme),
    j = as.integer(Metabolite),
    dimnames = list(
        Enzyme = levels(Enzyme),
        Metabolite = levels(Metabolite)
    )
)

head(bi_mat2, n = c(4, 5))

Session info

sessionInfo()


thomazbastiaanssen/anansi documentation built on June 9, 2025, 3:59 p.m.