Using mwcsr package

knitr::opts_chunk$set(cache = TRUE)

mwcsr is an R package to solve maximum weight connected subgraph (MWCS) problem and its variants. The package implements and provides an interface to several solvers: both exact and heuristic.

Installation

mwcsr can be installed from GitHub repository using devtools package:

library(devtools)
install_github("ctlab/mwcsr")

Quick start

Load mwcsr, as well as igraph package, which contains functions for graph manipulations.

library(mwcsr)
library(igraph)

Let's load an example instance of MWCS problem. The instance is a simple igraph object with weight vertex attribute.

data("mwcs_example")
print(mwcs_example)
summary(V(mwcs_example)$weight)

Now let us initialize a heuristic relax-and-cut MWCS solver (Alvarez-Miranda and Sinnl, 2017):

rcsolver <- rmwcs_solver()

Now we can use this solver to solve the example instance:

m <- solve_mwcsp(rcsolver, mwcs_example)
print(m$graph)
print(m$weight)

Supported problem types

Supported MWCS variants are:

In mwcsr, instances of all of the above problems are represented by an igraph object with certain specified attributes. The validity and the type of the instance can be checked using get_instance_type function, for example:

get_instance_type(mwcs_example)

Simple MWCS

Simple maximum weight connected subgraph (MWCS) problem can be defined as follows. Let $G = (V, E)$ be an undirected graph and $\omega : V \rightarrow \mathbb{R}$ is a weight function defined on the vertices. Then MWCS problem consists in finding a connected subgraph $\widetilde{G} = (\widetilde{V}, \widetilde{E})$ with a maximal total sum of vertex weights:

$$\Omega(\widetilde{G}) = \sum_{v \in \widetilde{V}} \omega(v) \rightarrow max.$$

An important property of MWCS is that solution can always be represented as a tree.

In mwcsr an MWCS instance is defined as an igraph with weight vertex attribute (and without weight edge attribute).

mwcs_example
summary(V(mwcs_example)$weight)

Budget MWCS

Budget MWCS is a slight modification of the original problem. While the objective function remains the same, an additional constraint is introduced. In this problem nonnegative values called budget costs are assigned to vertices. The sum of budget costs of the vertices in the solution is then constrained to not exceed a predefined budget.

More formally, let $c: V \rightarrow R^+$ be the function of the budget costs. $B \in R^+$ is a budget of the problem. In this terms Budget MWCS is defined as a problem where the following conditions are satisfied:

$$ \Omega(\widetilde{G}) = \sum\limits_{v \in \widetilde{V}} \omega(v) \rightarrow max.\ \text{subject to} \sum\limits_{v \in \widetilde{V}} c(v) \leq B $$

In mwcsr a Budget MWCS instance is an igraph object with weight vertex attribute as in original MWCS problem and additional budget_cost vertex attribute. Value of the budget not the part of the instance itself but rather a parameter to be passed to a function that solves an instance.

budget_mwcs_example <- mwcs_example
set.seed(42)
V(budget_mwcs_example)$budget_cost <- runif(vcount(budget_mwcs_example))
get_instance_type(budget_mwcs_example)

Generalized MWCS (GMWCS)

Generalized MWCS (GMWCS) is similar to MWCS, but edges are also weighted. More formally, let $G = (V, E)$ be an undirected graph and $\omega : (V \cup E) \rightarrow \mathbb{R}$ is a weight function defined on the vertices and the edges. Then GMWCS problem consists in finding a connected subgraph $\widetilde{G} = (\widetilde{V}, \widetilde{E})$ with a a maximal total sum of vertex and edge weights:

$$ \Omega(\widetilde{G}) = \sum_{v \in \widetilde{V}} \omega(v) + \sum_{e \in \widetilde{E}} \omega(e) \rightarrow max. $$

An important consequence of edge weights is that the optimal solution can contain cycles.

A GMWCS instance is defined as an igraph with weight attribute defined for both vertices and edges.

data(gmwcs_example)
gmwcs_example
summary(V(gmwcs_example)$weight)
summary(E(gmwcs_example)$weight)

Signal generalized MWCS (SGMWCS)

The signal generalized MWCS (SGMWCS) variant continues to generalize MWCS problem and introduces a concept of signals. Instead of specifying vertex and edge weights directly, the weights are defined for a set of signals, and vertices and edges are marked with these signals. The difference from GMWCS is that the signals can be repeated in the graph, while the weight of the subgraph is defined as a sum of weights of its unique signals.

Formally, let $G = (V, E)$ be an undirected graph, $S$ -- a set of signlas with weights $\omega: S \rightarrow \mathbb{R}$, and $\sigma: (V \cup E) \rightarrow 2^S$ -- markings of the vertices and edges with signals. An SGMWCS problem consists in finding a connected subgraph, with a maximal total sum of its unique signals:

$$ \Omega(\widetilde{G}) = \sum_{s \in \sigma(\widetilde{V} \cup \widetilde{E})} \omega(s) \rightarrow max, $$ where $\sigma(\widetilde{V} \cup \widetilde{E}) = \bigcup_{x \in (\widetilde{V} \cup \widetilde{E})} \sigma(x)$.

SGMWCS instances can arise when the data, from which the weights are inferred, map to the graph ambigously. For example, when m/z peak from mass-spectrometry data is assigned to multiple isomer metabolites, or when the same enzyme catalyze multiple reactions, and so on. Practically, it is usually assumed that the signals with negative weights are not repeated.

An SGMWCS is represented as an igraph object, with signal attribute defined for both vertices and edges and a signals attribute defined for the graph, containing the signal weights. Specification of multiple signals per node or edge is not yet supported.

data("sgmwcs_example")
sgmwcs_example
str(V(sgmwcs_example)$signal)
str(E(sgmwcs_example)$signal)
head(sgmwcs_example$signals)

Constructing SGMWCS instances

Sometimes, construction of SGMWCS instances can be simplified using normalize_sgmwcs_instance function.

Let consider an example graph obtained from gatom package.

data("gatom_example")
print(gatom_example)

In this graph, the signals graph attributed is absent with weights specified directly as vertex or edge attributes along with signal attributes, which is a very practical intermediate representation. However, it is recognized as a GMWCS instance:

get_instance_type(gatom_example)

Let convert this representation into a valid SGMWCS instance using normalize_sgmwcs_instance function:

gatom_instance <- normalize_sgmwcs_instance(gatom_example)
get_instance_type(gatom_instance)

And let call the same function with explicit arguments:

gatom_instance <- normalize_sgmwcs_instance(gatom_example,
                                            nodes.weight.column = "weight",
                                            edges.weight.column = "weight",
                                            nodes.group.by = "signal",
                                            edges.group.by = "signal", 
                                            group.only.positive = TRUE)

The function does the following:

  1. It extracts signal weights from the specified columns. NULL can be specified as a value of nodes.group.by or edges.group.by if there are no corresponding signals in the data, in which case zero signals will be created.
  2. It splits input signals with negative weights into multiple unique signals, unless group.only.positive is set to FALSE.

Supported solvers

Currently, four solvers are supported:

While selecting a particular solver depends on the particular class of instances, the general recommendations are:

Rmwcs solver

Relax-and-cut solver is a heuristic solver able to rapidly find high-quality solutions for MWCS problem (Alvarez-Miranda and Sinnl, 2017, https://doi.org/10.1016/j.cor.2017.05.015). The solver does not require any additional libraries.

Relax-and-cut solver can be constructed using rmwcs_solver function with the default arguments.

rmwcs <- rmwcs_solver()
m <- solve_mwcsp(rmwcs, mwcs_example)
print(m$weight)
print(m$solved_to_optimality)

rmwcs_solver supports Budget MWCS instances and its special case where all budget costs are set to one and called MWCS with cardinality constraints. In mwcsr such problems are represented as Simple MWCS problems and maximum cardinality is passed to solve_mwcsp funciton as argument:

 m <- solve_mwcsp(rmwcs, mwcs_example, max_cardinality = 10)
 print(vcount(m$graph))
 print(m$weight)

To solve Budget MWCS passing budget limit is necessary as well:

 m <- solve_mwcsp(rmwcs, budget_mwcs_example, budget = 10)
 print(sum(V(m$graph)$budget_cost))
 print(m$weight)

Rnc solver

Rnc solver is another relax-and-cut solver made for GMWCS/SGMWCS problems inspired by rmwcs solver. The solver does not require any libraries as well. Although with rnc_solver it is possible to solve MWCS problems, running rmwcs_solver for this type of problems of this type is preferable. No budget and cardinality variants are available.

The solver can be constructed using rnc_solver function.

  rnc <- rnc_solver()
  m <- solve_mwcsp(rnc, gmwcs_example)
  print(m$weight)
  print(m$solved_to_optimality)

And for SGMWCS instance:

  rnc <- rnc_solver()
  m <- solve_mwcsp(rnc, sgmwcs_example)
  print(m$weight)

Simulated annealing solver

Another heuristic solver is a simulated annealing based solver. The solver is rather generic, but can produce good enough solutions if parameters are tuned well. As it is heuristic solver with no estimate on upper bound of the objective function the solved to optimality flag is always set to FALSE.

This solver does support warm start allowing to run series of restarts of annealing solver with different temperature schedules.

The use of this solver may be beneficial in some cases, although there are no generic guidelines.

m <- NULL
for (i in 0:15) {
  asolver <- annealing_solver(schedule = "boltzmann", initial_temperature = 8.0 / (2 ** i),
                                final_temperature = 1 / (2 ** i))
  if (i != 0) {
    m <- solve_mwcsp(asolver, gmwcs_example, warm_start = m)
  } else {
    m <- solve_mwcsp(asolver, gmwcs_example)
  }
  print(m$weight)
}

Java-based Virgo solver

The mwcsr package provides interface to exact Java-based Virgo solver
(https://github.com/ctlab/virgo-solver) which can be used to solve MWCS, GMWCS and SGMWCS instances. The solver requires Java (11+) to be installed on your machine.

There are two modes of execution:

  1. Heuristic -- which finds a solution based on minimal spanning tree heuristic and does not require any additional setup;
  2. Exact -- which uses CPLEX library to solve the instances to provable optimality. CPLEX can be downloaded from the official web-site: https://www.ibm.com/products/ilog-cplex-optimization-studio. Free licence can be obtained for academic purposes. CPLEX version 12.7.1 or higher is required.

Heuristic solver can be constructed, by specifying cplex_dir=NULL. As it is a heuristic solver, the solved to optimality flag is always set to FALSE.

mst_solver <- virgo_solver(cplex_dir=NULL)
m <- solve_mwcsp(mst_solver, sgmwcs_example)
print(m$weight)
print(m$solved_to_optimality)

Exact solver requires setting cplex_dir argument with a path to CPLEX installation. The cplex_dir requires, that cplex.jar file and CPLEX dynamic library file (depending on the operating system: libcplex<version>.dll for Windows, libcplex<version>.so for Linux, libcplex<version>.jnilib for OS X) can be found there with recursive search. Alternatively, cplex_jar argument pointing to cplex.jar file and cplex_bin argument pointing to the directory with CPLEX dynamic library files can be specified. Additionally, it is convenient to put the path to CPLEX into a CPLEX_HOME environment variable, so that it does not have to be changed from one system to another, when run.

cplex_dir <- Sys.getenv('CPLEX_HOME')
exact_solver <- virgo_solver(cplex_dir=cplex_dir)
m <- solve_mwcsp(exact_solver, sgmwcs_example)

As the CPLEX found the optimal solution, the corresponding flag is set to TRUE:

print(m$weight)
print(m$solved_to_optimality)

Some additional information like the running time, instance files, solver version is available as in the stats field. Refer to Virgo documentation for the description of the values.

print(m$stats)

Computational resources availble for Virgo can be specified with the following parameters:

Another useful parameter is penalty. The non-zero penalty make the solver run an additional pass over the solution, with each edge penalized with the specified value. As there can be multiple solutions, having the same weight, especially in case of SGMWCS, this procedure allows to locally minimize the solution size, while preserving the weight.

psolver <- virgo_solver(cplex_dir=cplex_dir, penalty=0.001)
min_m <- solve_mwcsp(psolver, sgmwcs_example)
print(min_m$weight)
print(min_m$stats)

Now the solution, has min_m$stats$nodes nodes instead of m$stats$nodes for the solution that was found before, while having the same total weight of min_m$weight.

SCIP-jack solver

You can also use R interface to SCIP-jack solver. To use it you need to download SCIPOptSuite from the official web-site and build scipstp application. Beware, that scipstp is not provided in the pre-built SCIPOptSuite version, so you have to build it manually from source.

Quick build instructions:

```{bash eval=FALSE}

in scipoptsuite source directory

cmake -Bbuild -H. cmake --build build --target scipstp

optionally copy scipstp file somewhere to $PATH

cp build/applications/scipstp /usr/local/bin/

For complete build and configuration instructions for this solver 
visit [SCIP](https://www.scipopt.org/doc/html/INSTALL_APPLICATIONS_EXAMPLES.php) website.

After installing and ensuring that scipstp application is correctly built,
you can access `scipjack_solver` class to solve MWCS instances:
```r
scip <- scipjack_solver(scipstp_bin=Sys.which("scipstp"))
sol <- solve_mwcsp(scip, mwcs_example)

The optimization parameters are passed using \href{https://www.scipopt.org/doc-6.0.2/html/PARAMETERS.php}{SCIPopt} config file. You can modify bundled file or create a new one to fine-tune the solver.

Integration with BioNet

This part of the tutorial shows how mwcsr solvers can be combined with BioNet package to find an active module in a protein-protein interaction network. You need BioNet and DLBCL packages from Bioconductor to be installed in order to run following code examples.

BioNetInstalled <- FALSE
if (requireNamespace("BioNet") && requireNamespace("DLBCL")) {
    BioNetInstalled <- TRUE    
}

Let start with generating an example scored network, following BioNet tutorial:

if (BioNetInstalled) {
    library("BioNet")
    library("DLBCL")
    data(dataLym)
    data(interactome)
    pvals <- cbind(t = dataLym$t.pval, s = dataLym$s.pval)
    rownames(pvals) <- dataLym$label
    pval <- aggrPvals(pvals, order = 2, plot = FALSE)
    logFC <- dataLym$diff
    names(logFC) <- dataLym$label
    subnet <- subNetwork(dataLym$label, interactome)
    subnet <- rmSelfLoops(subnet)
    fb <- fitBumModel(pval, plot = FALSE)
    scores <- scoreNodes(subnet, fb, fdr = 0.001)
}

Here we have network object subnet of type graphNEL and a vector of node scores scores:

if (BioNetInstalled) {
    subnet
    str(scores)
}

BioNet comes with a heuristic MWCS FastHeinz solver, that we can use to find the module following the BioNet tutorial:

if (BioNetInstalled) {
    bionet_h <- runFastHeinz(subnet, scores)
    plotModule(bionet_h, scores=scores, diff.expr=logFC)
    sum(scores[nodes(bionet_h)])
}

We can construct an MWCS instance by converting graphNEL object into igraph and add node weights:

if (BioNetInstalled) { 
    bionet_example <- igraph.from.graphNEL(subnet, weight=FALSE) # ignoring edge weights of 1
    V(bionet_example)$weight <- scores[V(bionet_example)]
    get_instance_type(bionet_example)
}

Now the instance can be solved with the relax-and-cut solver:

if (BioNetInstalled) {
    rmwcs <- rmwcs_solver()
    bionet_m <- solve_mwcsp(rmwcs, bionet_example)
    plotModule(bionet_m$graph, scores=scores, diff.expr=logFC)
}

Note that the weight increased, compared to FastHeinz solution:

if (BioNetInstalled) {
    print(bionet_m$weight)
}

Similarly, Virgo can be used to solve the instance to provable optimality, but in this case it produces the same results:

if (BioNetInstalled) {
    bionet_m_exact <- solve_mwcsp(exact_solver, bionet_example)
    print(bionet_m_exact$weight)
    print(bionet_m_exact$solved_to_optimality)
}


Try the mwcsr package in your browser

Any scripts or data that you put into this service are public.

mwcsr documentation built on May 31, 2023, 8:41 p.m.