Introduction

Boolean Nested Effects Models (B-NEM) are used to infer signalling pathways. In different experiments (conditions) genes of a pathway (S-genes) are stimulated or inhibited, alone or in combination. In each experiment transcriptional targets (E-genes) of the pathway react differently and are higher or lower expressed depending on the condition. From these differential expressions B-NEM infers Boolean functions presented as hyper-edges of a hyper-graph connecting parents and children in the pathway. For example, if the signal is transduced by two parents A and B to a child C and the signal can be blocked with a knock-down of either one, they are connected by a typical AND-gate. If the signal is still transduced during a single knock-down, but blocked by the double knock-down of A and B, they activate C by an OR-gate. In general the state of child C is defined by a Boolean function [f: \left{0,1\right}^n \to \left{0,1\right},~C = f(A_1,\dots,A_n)] with its parents $A_i, i \in \left{1,\dots,n\right}$.

The B-NEM package is based on and uses many functions of the CellNOptR package of Terfve et al., 2012.

Installing and loading B-NEM

knitr::opts_chunk$set(message=FALSE, out.width="125%", fig.align="center",
                      strip.white=TRUE, warning=FALSE, #tidy=TRUE,
                      #out.extra='style="display:block; margin:auto;"',
                      fig.height = 4, fig.width = 8, error=FALSE)
fig.cap0 <- "PKN and GTN. The prior knowledge network without any logics 
(PKN, left) and the ground truth (GTN, right). Red 'tee' arrows depict
repression the others activation of 
the child. The stimulated S-genes are diamond shaped. Blue diamond edges 
in the PKN depict the existence of activation and repression in one arrow."
fig.cap1 <- "GTN and greedy optimum. The GTN (left) and the greedy 
optmimum (right)."
fig.cap2 <- "Expected and observed data. The expected data pattern 
(bottom row) of S-gene S1 and the observed data patterns of the 
E-genes attached to S1."
fig.cap3 <- "BCR signalling. The PKN (left) and the greedy optimum 
with an empty start network (right)."

Install the package with the devtools library or via Bioconductor.

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("bnem")

Load the Boolean Nested Effects model package.

library(bnem)

Toy example for a DAG

We show how to use B-NEM on a toy pathway presented by a directed acyclic (hyper-)graph (DAG). B-NEM demands several objects as input. The two main objects are the differential gene expression (data) and prior knowledge respectively the search space.

The following function creates a DAG, which is then extended to a full Boolean network. From this network we sample a subset of edges to define our ground truth. In the last step, the function generates data given the ground truth.

set.seed(9247)
sim <- simBoolGtn(Sgenes = 10, maxEdges = 10, keepsif = TRUE, negation=0.1,
                  layer=1)
fc <- addNoise(sim,sd=1)
expression <- sim$expression
CNOlist <- sim$CNOlist
model <- sim$model
bString <- sim$bString
ERS <- sim$ERS
PKN <- sim$PKN
children <- unique(gsub(".*=|!", "", PKN$reacID))
stimuli <- unique(gsub("=.*|!", "", PKN$reacID))
stimuli <- stimuli[which(!(stimuli %in% children))]

The following figure shows the PKN and the ground truth network.

library(mnem)
par(mfrow=c(1,2))
plotDnf(PKN$reacID, stimuli = stimuli)
plotDnf(model$reacID[as.logical(sim$bString)], stimuli = stimuli)

We suggest to take a look at the sif file in the working directory. In future analyses it is easier to just provide a suitable sif file for the investigated pathway. In a real application the underlying ground truth network (GTN) is not known.

B-NEM uses differential expression between experiments to infer the pathway logics. For example look at the colnames of sim$fc (=foldchanges of E-genes (rows)) and remember that r stimuli are our stimulated S-genes and the rest possibly inhibited. Thus in the first column of fc we have the contrast r gsub(".*_", "", colnames(fc)[1]) $-$ control. In the control no S-genes are perturbed.

We search for the GTN in our restricted network space. Each network is a sub-graph of the full hyper-graph sim\$model\$reacID. We initialize the search with a starting network and greedily search the neighborhood.

We use two different network scores. "s" is the rank correlation and "cosine" is the cosine similarity (hence no shift invariance).

## we start with the empty graph:
initBstring <- reduceGraph(rep(0, length(model$reacID)), model, CNOlist)
## or a fully connected graph:
## initBstring <- reduceGraph(rep(1, length(model$reacID)), model, CNOlist)

## paralellize for several threads on one machine or multiple machines
## see package "snowfall" for details
parallel <- NULL # NULL for serialization
## or distribute to 30 threads on four different machines:
## parallel <- list(c(4,16,8,2), c("machine1", "machine2", "machine3",
## "machine4"))

## greedy search:
greedy <- bnem(search = "greedy",
               fc=fc,
               expression=expression, # not used, if fc is defined
               CNOlist=CNOlist,
               model=model,
               parallel=parallel,
               initBstring=initBstring,
               draw = FALSE, # TRUE: draw network at each step
               verbose = FALSE, # TRUE: print changed (hyper-)edges and
               ## score improvement
               maxSteps = Inf,
               method = "s"
               )

greedy2 <- bnem(search = "greedy",
                fc=fc,
                expression=expression,
                CNOlist=CNOlist,
                model=model,
                parallel=parallel,
                initBstring=initBstring,
                draw = FALSE,
                verbose = FALSE,
                maxSteps = Inf,
                method = "cosine"
                )

We compare the binary strings defining the ground truth and the inferred networks. We first look at the respective scores (the smaller the better) and then compute sensitivity and specificity of the edges for both results.

greedy$scores # rank correlation = normalized
greedy2$scores # scaled log foldchanges

accuracy <- function(gtn,inferred) {
  sens <- sum(gtn == 1 & inferred == 1)/
    (sum(gtn == 1 & inferred == 1) + sum(gtn == 1 & inferred == 0))
  spec <- sum(gtn == 0 & inferred == 0)/
      (sum(gtn == 0 & inferred == 0) + sum(gtn == 0 & inferred == 1))
  return(c(sens,spec))
}

accuracy(bString,greedy$bString)
accuracy(bString,greedy2$bString)
resString <- greedy2$bString

We take a look at the efficiency of the search algorithm with sensitivity and specificity of the hyper-edges for the optimized network and the accuracy of its ERS (similar to the truth table). Since several networks produce the same ERS, the learned hyper-graph can differ from the GTN and still be $100\%$ accurate.

par(mfrow=c(1,2))
## GTN:
plotDnf(model$reacID[as.logical(bString)], main = "GTN", stimuli = stimuli)
## greedy optimum:
plotDnf(model$reacID[as.logical(resString)], main = "greedy optimum",
        stimuli = stimuli)

## accuracy of the expected response scheme (can be high even, if
## the networks differ):
ERS.res <- computeFc(CNOlist,
                     t(simulateStatesRecursive(CNOlist, model, resString)))
ERS.res <- ERS.res[, which(colnames(ERS.res) %in% colnames(ERS))]
print(sum(ERS.res == ERS)/length(ERS))

After optimization we look at the data and how well the greedy optimum explains the E-genes. The lower the score the better the fit.

fitinfo <- validateGraph(CNOlist, fc=fc, expression=expression, model = model,
                         bString = resString,
                         Sgene = 4, Egenes = 1000, cexRow = 0.8,
                         cexCol = 0.7,
                         xrot = 45,
                         Colv = TRUE, Rowv = TRUE, dendrogram = "both",
                         bordercol = "lightgrey",
                         aspect = "iso", sub = "")

The bottom row shows the ERS of S-gene r colnames(CNOlist@signals[[1]])[1] and the other rows show the observed response scheme (ORS) of the r colnames(CNOlist@signals[[1]])[1]-regulated E-genes. Alternatively to the greedy neighborhood search a genetic algorithm and exhaustive search are available. The exhaustive search is not recommended for search spaces with more than 20 hyper-edges.

## genetic algorithm:
genetic <- bnem(search = "genetic",
           fc=fc,
           expression=expression,
           parallel = parallel,
           CNOlist=CNOlist,
           model=model,
           initBstring=initBstring,
           popSize = 10,
           stallGenMax = 10,
           draw = FALSE,
           verbose = FALSE
           )
## exhaustive search:
exhaustive <- bnem(search = "exhaustive",
                   parallel = parallel,
                   CNOlist=CNOlist,
                   fc=fc,
                   expression=expression,
                   model=model
                   )

Stimulated and inhibited S-genes can overlap

In this section we show how to use B-NEM, if stimuli and inhibitors overlap. Additionally we want to show that B-NEM can resolve cycles. For this we allow the PKN to have cycles, but no repression, because repression can lead to an unresolvable ERS. See Pirkl et al., 2016 for details.

We look for stimuli which are also inhibited. For those we add additional stimuli S-genes. The stimuli S-gene (parent) and the inhibitor S-gene (child) are connected by a positive edge. Hence, if the S-gene is activated, the stimuli is set to 1. If the S-gene is inhibited the inhibitor S-gene is set to 0.

set.seed(9247)
sim <- simBoolGtn(Sgenes = 4, maxEdges = 50, dag = FALSE,
                  negation = 0, allstim = TRUE,frac=0.1)
fc <- addNoise(sim,sd=1)
expression <- sim$expression
CNOlist <- sim$CNOlist
model <- sim$model
bString <- sim$bString
ERS <- sim$ERS
PKN <- sim$PKN
children <- unique(gsub(".*=|!", "", PKN$reacID))
stimuli <- unique(gsub("=.*|!", "", PKN$reacID))
stimuli <- stimuli[which(!(stimuli %in% children))]

The next figure shows the cyclic PKN with extra stimuli S-genes. Notice, this way the inhibition of the S-genes overrules the stimulation.

par(mfrow=c(1,2))
plotDnf(sim$PKN$reacID, stimuli = stimuli)
plotDnf(sim$model$reacID[as.logical(bString)], stimuli = stimuli)
greedy3 <- bnem(search = "greedy",
                CNOlist=CNOlist,
                fc=fc,
                expression=expression,
                model=model,
                parallel=parallel,
                initBstring=bString*0,
                draw = FALSE,
                verbose = FALSE,
                maxSteps = Inf,
                method = "cosine"
                )
resString2 <- greedy3$bString

We look at the accuracy again.

par(mfrow=c(1,2))
plotDnf(model$reacID[as.logical(bString)], main = "GTN", stimuli = stimuli)
plotDnf(model$reacID[as.logical(resString2)], main = "greedy optimum",
        stimuli = stimuli)

accuracy(bString,resString2)
ERS.res <- computeFc(CNOlist, t(simulateStatesRecursive(CNOlist, model,
                                                        resString2)))
ERS.res <- ERS.res[, which(colnames(ERS.res) %in% colnames(ERS))]
print(sum(ERS.res == ERS)/length(ERS))

Pre-attach E-genes

One additional challenge for B-NEM compared to methods like CellNetOptimizer is the fact, that B-NEM optimizes the signalling pathway and simultaneously the attachment of the E-genes. However, it is possible to include prior knowledge into the search.

We just have to create a list object, which holds prior information about the E-genes.

egenes <- list()

for (i in PKN$namesSpecies) {
    egenes[[i]] <- rownames(fc)[grep(i, rownames(fc))]
}
initBstring <- reduceGraph(rep(0, length(model$reacID)), model, CNOlist)
greedy2b <- bnem(search = "greedy",
                 CNOlist=CNOlist,
                 fc=fc,
                 expression=expression,
                 egenes=egenes,
                 model=model,
                 parallel=parallel,
                 initBstring=initBstring,
                 draw = FALSE,
                 verbose = FALSE,
                 maxSteps = Inf,
                 method = "cosine"
                 )
resString3 <- greedy2b$bString

We attach every E-gene to its real parent in the for loop. If an E-gene is only included once in the egenes object, it's position is not learned, but fixed during the optimization of the signalling pathway. Alternatively, we can include one E-gene several times for just a subset of S-genes. This way S-genes, which do not have the E-genes included in their E-gene set are excluded as potential parents.

accuracy(bString,resString3)
ERS.res <- computeFc(CNOlist,
                     t(simulateStatesRecursive(CNOlist, model, resString3)))
ERS.res <- ERS.res[, which(colnames(ERS.res) %in% colnames(ERS))]
print(sum(ERS.res == ERS)/length(ERS))

Visualizing network residuals

We can also quantify how well the attached E-genes fit to the learned network. See the Pirkl, 2016 for more details.

residuals <- findResiduals(resString3, CNOlist, model, fc, verbose = FALSE)
## verbose = TRUE plots the residuals matrices

Rows denote S-genes in the network. Columns denote Contrasts between two experiments. Green colors in the left matrix show the score improves, if no (0) or a negative (-1) response in the network's ERS is changed to positive (+1). Red colors show a zero changed to positive. The right matrix shows the same for switched +1 and -1.

B-Cell receptor signalling

In this section we analyze the B-Cell receptor (BCR) signalling data. The data set consists of one stimulated S-gene (BCR), three S-genes with available single inhibitions (Tak1, Pik3, Erk) and three S-genes with up to triple inhibitions. See Pirkl et al. 2016 for more details.

data(bcr)
## head(fc)
fc <- bcr$fc
expression <- bcr$expression

We build a PKN to incorporate biological knowledge and account for missing combinatorial inhibitions.

library(CellNOptR)
negation <- FALSE # what happens if we allow negation?
sifMatrix <- numeric()
for (i in "BCR") {
    sifMatrix <- rbind(sifMatrix, c(i, 1, c("Pi3k")))
    sifMatrix <- rbind(sifMatrix, c(i, 1, c("Tak1")))
    if (negation) {
        sifMatrix <- rbind(sifMatrix, c(i, -1, c("Pi3k")))
        sifMatrix <- rbind(sifMatrix, c(i, -1, c("Tak1")))
    }
}
for (i in c("Pi3k", "Tak1")) {
    for (j in c("Ikk2", "p38", "Jnk", "Erk", "Tak1", "Pi3k")) {
        if (i %in% j) { next() }
        sifMatrix <- rbind(sifMatrix, c(i, 1, j))
        if (negation) {
            sifMatrix <- rbind(sifMatrix, c(i, -1, j))
        }
    }
}
for (i in c("Ikk2", "p38", "Jnk")) {
    for (j in c("Ikk2", "p38", "Jnk")) {
        if (i %in% j) { next() }
        sifMatrix <- rbind(sifMatrix, c(i, 1, j))
        if (negation) {
            sifMatrix <- rbind(sifMatrix, c(i, -1, j))
        }
    }
}
file <- tempfile(pattern="interaction",fileext=".sif")
write.table(sifMatrix, file = file, sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)
PKN <- readSIF(file)

In the next step, we create the meta information. This ensures, that we simulate all the conditions, which are actually available in the data. Furthermore we build our Boolean search space based on the PKN.

CNOlist <- dummyCNOlist(stimuli = "BCR",
                        inhibitors = c("Tak1", "Pi3k", "Ikk2",
                                       "Jnk", "p38", "Erk"),
                        maxStim = 1, maxInhibit = 3)

model <- preprocessing(CNOlist, PKN)

In the final step we learn the network with the greedy search.

initBstring <- rep(0, length(model$reacID))
bcrRes <- bnem(search = "greedy",
                   fc=fc,
                   CNOlist=CNOlist,
                   model=model,
                   parallel=parallel,
                   initBstring=initBstring,
                   draw = FALSE,
                   verbose = FALSE,
                   method = "cosine"
                   )
par(mfrow=c(1,3))
plotDnf(PKN$reacID, main = "PKN", stimuli = "BCR")
plotDnf(bcrRes$graph, main = "greedy optimum (empty start)",
        stimuli = "BCR")
sessionInfo()

References:

Pirkl, Martin, Hand, Elisabeth, Kube, Dieter, & Spang, Rainer. 2016. Analyzing synergistic and non-synergistic interactions in signalling pathways using Boolean Nested Effect Models. \textit{Bioinformatics}, 32(6), 893–900.

Pirkl, Martin. 2016. Indirect inference of synergistic and alternative signalling of intracellular pathways. University of Regensburg.

Saez-Rodriguez, Julio, Alexopoulos, Leonidas G, Epperlein, Jonathan, Samaga, Regina, Lauffenburger, Douglas A, Klamt, Steffen, & Sorger, Peter K. 2009. Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction. Mol Syst Biol, 5, 331.\

C Terfve, T Cokelaer, A MacNamara, D Henriques, E Goncalves, MK Morris, M van Iersel, DA Lauffenburger, J Saez-Rodriguez. CellNOptR: a flexible toolkit to train protein signaling networks to data using multiple logic formalisms. BMC Systems Biology, 2012, 6:133.



MartinFXP/B-NEM documentation built on Nov. 5, 2024, 11:56 a.m.