README.md

ATEN

And/Or Tree Ensemble for inferring accurate Boolean network topology and dynamics

Note: set.seed() alone does not make results reproducible when the package parallel is used. To make the results reproducible, we use the function clusterSetRNGStream(); for more details, please see the documentation of the package parallel.
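The note above can be illustrated with a minimal sketch that uses only the base package parallel. The helper name run_once, the cluster size of 2 workers and the seed value are assumptions chosen for illustration; they are not part of ATEN.

```r
library(parallel)

# Hypothetical helper (not an ATEN function): draw one random number per task
# on the workers of a fresh local cluster.
run_once <- function(seed) {
  cl <- makeCluster(2)                    # assumption: 2 local workers
  on.exit(stopCluster(cl))                # always release the workers
  clusterSetRNGStream(cl, iseed = seed)   # seed an independent RNG stream per worker
  parSapply(cl, 1:4, function(i) runif(1))
}

a <- run_once(0)
b <- run_once(0)
identical(a, b)   # compares two runs started from the same seed
```

Calling set.seed() in the main session would not seed the workers, which is why the seeding is done on the cluster itself.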

```
# Install devtools from CRAN
install.packages("devtools")

# Download the development version of ATEN from GitHub:
devtools::install_github("ningshi/ATEN")

# Load it to library
library("ATEN")
```

Manual/Usage

And/Or tree: We use lists to represent an And/Or tree (i.e., a Boolean function). For instance, assume we have a network consisting of 5 nodes in which the target node is x1, with Boolean function f. Note that we use time-series data, so we have x1(t+1) = f(x1)(t), where t is the time point.

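Suppose we have f(x1)(t) = (x1 || x2&&!x3 || x4&&x5)(t). We can denote the Boolean function f with tree<-list(1,c(2,8),c(4,5)). Each element of the list is one conjunction (And), and the elements are joined by Or. An integer no greater than 5 (the number of nodes), here 1, 2, 4 and 5, represents the 1st, 2nd, 4th and 5th node respectively; an integer greater than 5 represents a negated node, so 8 represents the negation of the (8-5) = 3rd node, i.e. !x3.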

We represent prime implicants in the same way.
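To make the encoding concrete, here is a minimal sketch that evaluates an And/Or tree on a single network state. The helper name eval_tree is hypothetical and not part of ATEN. It assumes that an integer i no greater than the number of nodes n stands for node i, and that an integer greater than n stands for the negation of node i-n.

```r
# Hypothetical helper (not an ATEN function): evaluate an And/Or tree
# on a 0/1 state vector x of length n.
eval_tree <- function(tree, x) {
  n <- length(x)
  # Value of one literal: node i if i <= n, negated node (i - n) otherwise
  literal <- function(i) if (i <= n) x[i] == 1 else x[i - n] == 0
  # Each list element is a conjunction (And) of its literals
  conj <- vapply(tree,
                 function(term) all(vapply(term, literal, logical(1))),
                 logical(1))
  # The tree is the disjunction (Or) of its conjunctions
  any(conj)
}

tree <- list(1, c(2, 8), c(4, 5))    # the tree from the text, n = 5
eval_tree(tree, c(0, 1, 0, 0, 0))    # x2 is on and x3 is off, so c(2, 8) holds
eval_tree(tree, c(0, 0, 0, 0, 0))    # no conjunction holds
```

The first call returns TRUE and the second returns FALSE.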

Network inference:

```
# ngenes is the number of nodes in a Boolean network
ngenes<-10
# k is the maximum number of input genes of a transition function
k<-5
# Call generateRandomNKNetwork() to generate a Boolean network; for more details about the arguments used in generateRandomNKNetwork, see package BoolNet
set.seed(0)
net1<-generateRandomNKNetwork(ngenes, k, topology="scale_free",simplify=TRUE,readableFunctions=TRUE)

# See net1
net1
Boolean network with 10 genes

Involved genes:
Gene1 Gene2 Gene3 Gene4 Gene5 Gene6 Gene7 Gene8 Gene9 Gene10

Transition functions:
Gene1 = (!Gene1)
Gene2 = (Gene8)
Gene3 = (Gene10)
Gene4 = (Gene10)
Gene5 = (Gene2 & Gene3)
Gene6 = (Gene4)
Gene7 = (!Gene2) | (!Gene5)
Gene8 = (!Gene8) | (Gene4)
Gene9 = (Gene5)
Gene10 = (!Gene3)
```

- Step. 3 Select a target node, generate the bootstrap samples and out-of-bag (oob) samples for inferring and selecting prime implicants (PIs)
```
# For instance, we select the 6th node as the target node
target<-6

# Generate the bootstrap samples and oob samples according to the time-series data
datasamples<-bootstrap(datalist)

# Note that 'respinbag' and 'respoutbag' save the in-bag (bootstrap) and oob expression values of the target node, respectively
datasamples$respinbag<-matrix(datasamples$respinbag[,target])
datasamples$respoutbag<-matrix(datasamples$respoutbag[,target])
```
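The idea behind in-bag and out-of-bag samples can be sketched in base R. This only illustrates bootstrap resampling in general and is not the implementation of ATEN's bootstrap(); the helper name split_bootstrap and the toy matrix are assumptions.

```r
# Hypothetical helper (not ATEN's bootstrap()): split the rows of a data
# matrix into a bootstrap (in-bag) sample and an out-of-bag (oob) sample.
split_bootstrap <- function(data) {
  n <- nrow(data)
  inbag_idx <- sample(n, n, replace = TRUE)    # draw n rows with replacement
  oob_idx <- setdiff(seq_len(n), inbag_idx)    # rows that were never drawn
  list(inbag = data[inbag_idx, , drop = FALSE],
       oob = data[oob_idx, , drop = FALSE])
}

set.seed(0)
toy <- matrix(rbinom(50, 1, 0.5), nrow = 10)   # toy data: 10 time points, 5 genes
s <- split_bootstrap(toy)
dim(s$inbag)   # as many rows as the original data
dim(s$oob)     # only the rows missing from the in-bag sample
```

The in-bag rows would be used to fit a tree, and the oob rows to assess it on data the tree has not seen.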

```
# We shall discuss how to tune those arguments later.
```

```
# B represents how many trees would be generated in the forest
# The relevant datalist and datasamples are also required for network inference
# The last parameter 'seed' used in findPIs is for helping reproduce the results; we set it to 0 here
PIs<-findPIs(B=10,datalist,datasamples,parameters,0)

# In our case, we obtained 5 prime implicants after removing non-important ones
PIs
[[1]]
[1] 4

[[2]]
[1] 3

[[3]]
[1] 4 19

[[4]]
[1] 4 8 19

[[5]]
[1] 4 13 18
```
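The integers in the output above can be turned into readable expressions. The sketch below assumes the encoding described in the And/Or tree paragraph, where an integer greater than the number of genes denotes a negated gene. pi_to_string is a hypothetical helper, not an ATEN function.

```r
# Hypothetical helper (not an ATEN function): render one prime implicant,
# given as an integer vector, as a conjunction of gene names.
pi_to_string <- function(imp, genes) {
  n <- length(genes)
  lits <- vapply(imp, function(i) {
    # i <= n is gene i; i > n is the negation of gene (i - n)
    if (i <= n) genes[i] else paste0("!", genes[i - n])
  }, character(1))
  paste(lits, collapse = " & ")
}

genes <- paste0("Gene", 1:10)
# The five prime implicants copied from the output above
PIs <- list(4, 3, c(4, 19), c(4, 8, 19), c(4, 13, 18))
readable <- vapply(PIs, pi_to_string, character(1), genes = genes)
readable
```

Under this assumption, the third prime implicant c(4, 19) reads as Gene4 & !Gene9.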

- Step. 6 Find the Boolean function according to those PIs and the RFRE framework
```
# Update the datasets so that the new datalist corresponds to the PIs
datalist[[2]]<-generateData(PIs,datalist)
datalist[[3]]<-matrix(datalist[[3]][,target])
datasamples<-bootstrap(datalist)
datasamples$respinbag<-matrix(datasamples$respinbag)
datasamples$respoutbag<-matrix(datasamples$respoutbag)

# Identify the final Boolean function
# The last parameter used in findBF() is for helping reproduce the results; we set it to 0 here
BF<-findBF(5,PIs,target,parameters,datalist,datasamples,0)

# Check the final solution we obtained
BF
"Gene4"
```

Something new in ATEN

In some cases, Step. 6 is not required, for instance, when using the same Boolean network but with noisy data this time
```
# Generate the time-series data with 5% noise
datalist<-buildTimeSeries(network=net1,numSeries=10,numPoints=10,noiseLevel=0.05)

# Now select the first node as the target node
target<-1

# Generate the bootstrap samples and oob samples according to the time-series data
datasamples<-bootstrap(datalist)
# respinbag and respoutbag save the expression values of the target node
datasamples$respinbag<-matrix(datasamples$respinbag[,target])
datasamples$respoutbag<-matrix(datasamples$respoutbag[,target])

# Find the important PIs
PIs<-findPIs(B=10,datalist,datasamples,parameters,0)

# See PIs
PIs
[1] "!Gene1"

# The result is not a list of PIs but the final Boolean function,
# so Step. 6 is not required.
# There are two reasons why Step. 5 directly returns the result:
# 1) only 1 PI is left after eliminating the non-important ones;
# 2) after elimination, only a few PIs (<=4) are left, and we then directly build an And/Or tree.

# Regarding the 2nd reason, a better way is to invoke the Best-Fit method there to find the optimal solution, as Best-Fit can always quickly find all putative Boolean functions.
```
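The core idea of Best-Fit mentioned above can be sketched for a fixed set of candidate inputs: for each observed input pattern, the best function predicts the majority output, and the remaining mismatches are the error. This is a simplified illustration in base R, not the Best-Fit implementation of BoolNet or ATEN; best_fit_error and the toy data are assumptions.

```r
# Hypothetical helper: smallest number of misclassified samples achievable by
# any Boolean function of the given input columns.
best_fit_error <- function(X, y) {
  # Encode each row of inputs as a pattern string such as "0 1"
  pattern <- apply(X, 1, paste, collapse = " ")
  # For each pattern the optimal function predicts the majority output,
  # so the minority count is the unavoidable error for that pattern
  ones <- tapply(y, pattern, sum)
  total <- tapply(y, pattern, length)
  sum(pmin(ones, total - ones))
}

# Toy data: y equals !x1 except for one noisy sample (the last row)
X <- cbind(x1 = c(0, 0, 1, 1, 0, 1), x2 = c(0, 1, 0, 1, 1, 1))
y <- c(1, 1, 0, 0, 1, 1)
err_x1 <- best_fit_error(X[, "x1", drop = FALSE], y)   # candidate inputs {x1}
err_x2 <- best_fit_error(X[, "x2", drop = FALSE], y)   # candidate inputs {x2}
c(x1 = err_x1, x2 = err_x2)
```

Here the input set {x1} leaves 1 error and {x2} leaves 2, so a search over candidate input sets would prefer {x1}. Restricting the candidates to the genes that appear in the remaining PIs keeps such a search small.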

Parameters setting

If you are interested in how to set the tree size (i.e. the maximum number of input genes of the target gene), please find more details in our Supplementary Data.

The default values of the other parameters/arguments work for small networks (<=10 nodes) but are not optimal; you can improve them if you are willing to invest time in learning how to set the parameters/arguments.

Any other good SA algorithms (or other heuristic algorithms) are also welcome to be introduced into ATEN. In addition, it is easy to use ATEN as a feature selection tool before applying Best-Fit (i.e. finding all putative Boolean functions). We shall update it later.

Future work

- Besides what we discussed above, another direction is to make ATEN adaptive to different sizes of networks (e.g. implement ATEN in C to speed it up for larger networks).
- We also expect that our idea can be used for inferring probabilistic Boolean networks and asynchronous networks.
- Include Best-Fit or other approaches that can help ATEN find the optimal solutions.

References:

Shi, N., Zhu, Z., Tang, K., Parker, D. and He, S., 2020. ATEN: And/Or tree ensemble for inferring accurate Boolean network topology and dynamics. Bioinformatics, 36(2), pp.578-585.

Müssel, C., Hopfensitz, M. and Kestler, H.A., 2010. BoolNet—an R package for generation, reconstruction and analysis of Boolean networks. Bioinformatics, 26(10), pp.1378-1380.

Lähdesmäki, H., Shmulevich, I. and Yli-Harja, O., 2003. On learning gene regulatory networks under the Boolean network model. Machine learning, 52(1-2), pp.147-167.



ningshi/ATEN documentation built on April 27, 2021, 7:40 a.m.