learn.bn: Learns a Bayesian network
In Pigengene: Infers biological signatures from gene expression data


This function takes as input the eigengenes of all modules and learns a Bayesian network using the bnlearn package. It builds several individual networks from random starting networks by optimizing their scores. Then, it infers a consensus network from the individual networks with relatively high scores. The default hyper-parameters and arguments should be fine for most applications.

learn.bn(pigengene=NULL, Data=NULL, Labels=NULL, bnPath = "bn", bnNum = 100,
  consensusRatio = 1/3, consensusThresh = "Auto", doME0 = FALSE, 
  selectedFeatures = NULL, trainingCases = "All", algo = "hc", scoring = "bde",
  restart = 0, pertFrac = 0.1, doShuffle = TRUE, use.Hartemink = TRUE, 
  bnStartFile = "None", use.Disease = TRUE, use.Effect = FALSE, dummies = NULL,
  tasks = "All", onCluster = !(which.cluster()$cluster == "local"), 
  inds = 1:ceiling(bnNum/perJob), perJob = 2, maxSeconds = 5 * 60, 
  timeJob = "00:10:00", bnCalculationJob = NULL, seed = NULL, verbose = 0,
  naTolerance=0.05)

`pigengene`	An object from `pigengene-class`. The output of `compute.pigengene` function.
`Data`	A matrix or data frame containing the training data with eigengenes corresponding to columns and rows corresponding to samples. Rows and columns must be named.
`Labels`	A (preferably named) vector containing the Labels (condition types) for the training data. Names must agree with rows of `Data`.
`bnPath`	The path where the results will be saved.
`bnNum`	The total number of individual networks. In practice, the number of learnt networks can be less than `bnNum` because some jobs may take too long and be terminated.
`consensusRatio`	A numeric in the range `0-1` that determines what portion of the highest-scoring networks should be used to build the consensus network.
`consensusThresh`	A vector of thresholds in the range `0-1`. For each threshold `t`, a consensus network will be built from the arcs that are present in at least a fraction `t` of the individual networks. Alternatively, if it is "Auto" (the default), the threshold will be automatically set to the mean plus the standard deviation of the frequencies (strengths) of all arcs in the individual networks.
`doME0`	If `TRUE`, module 0 (the outliers) will be considered in learning the Bayesian network.
`selectedFeatures`	A character vector. If not `NULL`, only these features (eigengenes) will be used.
`trainingCases`	A character vector that determines which cases (samples) should be considered for learning the network.
`algo`	The algorithm that bnlearn uses for optimizing the score. The default is "hc" (hill climbing). See `arc.strength` for other options and more details.
`scoring`	A character determining the scoring criteria. Use 'bde' and 'bic' for the Bayesian Dirichlet equivalent and Bayesian Information Criterion scores, respectively. See `score` for technical details.
`restart`	The number of random restarts. For technical use only. See `hc`.
`pertFrac`	A numeric in the range `0-1` that determines the number of attempts to randomly insert/remove/reverse an arc on every random restart. For technical use only.
`doShuffle`	The ordering of the features (eigengenes) is important in making the initial network. If `doShuffle=TRUE`, they will be shuffled before making every initial network.
`use.Hartemink`	If `TRUE`, Hartemink's algorithm will be used to discretize the data. Otherwise, interval discretization will be applied. See `bnlearn::discretize`.
`bnStartFile`	Optionally, learning can start from a Bayesian network instead of a random network. `bnStartFile` should contain a list called `selected` and `selected$BN` should be an object of `bn-class`. Non-technical users can set to `"None"` to disable.
`use.Disease`	If `TRUE`, the condition variable `Disease` will be included in the network, which cannot be the child of any other variable.
`use.Effect`	If `TRUE`, the condition variable `beAML` will be included in the network, which cannot be the parent of any other variable.
`dummies`	A vector of numeric values in the range `0-1`. Dummy random variables will be added to the Bayesian network to check whether the learning process is effective. For development purposes only.
`tasks`	A character vector and a subset of `c("learn","harvest","consensus","graph")` that identifies the tasks to be done. Useful if part of the analysis was done previously, otherwise set to `"All"`.
`onCluster`	A Boolean variable that is `FALSE` if the learning is not done on a computer cluster.
`inds`	The indices of the jobs that are included in the analysis.
`perJob`	The number of individual networks that are learnt by 1 job.
`maxSeconds`	An integer limiting the computation time (in seconds) for each training job that runs locally, i.e., when `onCluster=FALSE`.
`timeJob`	The time in `"hh:mm:ss"` format requested for each job if they are running on a computer cluster.
`bnCalculationJob`	A script used to submit jobs to the cluster. Set to `NULL` if not using a cluster.
`seed`	The random seed that can be set to an integer to reproduce the same results.
`verbose`	Integer level of verbosity. 0 means silent and higher values produce more details of computation.
`naTolerance`	Upper threshold on the fraction of entries per gene that can be missing. Genes with a larger fraction of missing entries are ignored. For genes with smaller fraction of NA entries, the missing values are imputed from their average expression in the other samples. See `check.pigengene.input`.
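The relation between `bnNum`, `perJob`, and the default `inds` can be checked with a few lines of base R. This is only a sketch: the default expression for `inds` is taken from the usage above, while the assumption that networks are assigned to jobs in consecutive order (the `jobOf` vector) is hypothetical and may not match how the package numbers them internally.

```r
## Defaults from the usage: 100 networks, 2 networks per job.
bnNum  <- 100
perJob <- 2

## Default job indices, as in the usage: inds = 1:ceiling(bnNum/perJob)
inds <- 1:ceiling(bnNum / perJob)
length(inds)

## Hypothetical layout: assign networks to jobs in consecutive order.
jobOf <- ceiling(seq_len(bnNum) / perJob)
split(seq_len(bnNum), jobOf)[1:3]

## When bnNum is not a multiple of perJob, the last job gets fewer networks.
oddJobs <- ceiling(seq_len(5) / perJob)
table(oddJobs)
```

Passing a subset such as `inds = 1:10` would then restrict the analysis to the first ten jobs under this layout.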

For learning a Bayesian network with tens of nodes (eigengenes), bnNum=1000 or higher is recommended. Increasing consensusThresh generally results in a network with fewer arcs. Nagarajan et al. proposed a fundamental approach that determines this hyper-parameter based on the background noise. They use non-parametric bootstrapping, which is not implemented in the current package yet.
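The roles of `consensusRatio` and the "Auto" value of `consensusThresh` can be illustrated with a toy base R sketch. Everything below is hypothetical: the adjacency matrices are random (and not guaranteed to be acyclic), the scores are simulated, and the sketch assumes that every possible arc, including arcs that never occur, enters the mean and standard deviation. The package itself may compute arc frequencies differently.

```r
## Toy illustration only: random 0/1 adjacency matrices stand in for
## the individual networks learnt by the package.
set.seed(1)
nodes <- c("ME1", "ME2", "ME3", "ME4")
bnNum <- 12
nets <- lapply(seq_len(bnNum), function(i) {
  m <- matrix(rbinom(length(nodes)^2, size = 1, prob = 0.3),
              nrow = length(nodes), dimnames = list(nodes, nodes))
  diag(m) <- 0  # no self-loops
  m
})
## Hypothetical scores of the individual networks (higher is better).
scores <- rnorm(bnNum)

## Keep the top `consensusRatio` fraction of the networks by score.
consensusRatio <- 1/3
nKeep <- ceiling(consensusRatio * bnNum)
kept <- nets[order(scores, decreasing = TRUE)[seq_len(nKeep)]]

## Arc frequency (strength): fraction of kept networks containing each arc.
freq <- Reduce(`+`, kept) / nKeep
arcFreq <- freq[row(freq) != col(freq)]  # off-diagonal entries only

## "Auto"-style threshold: mean plus standard deviation of the frequencies.
autoThresh <- mean(arcFreq) + sd(arcFreq)

## Arcs that would enter the consensus network at this threshold.
consensusArcs <- which(freq >= autoThresh, arr.ind = TRUE)
print(autoThresh)
print(consensusArcs)
```

Raising the threshold above `autoThresh` can only shrink the set of selected arcs, which is the behavior described above for increasing consensusThresh.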

The default values for the rest of the hyper-parameters should be fine for most applications.

A list of:

`consensusThresh`	The vector of thresholds as described in the arguments.
`indvPath`	The path where the individual networks were saved.
`moduleFile`	The file containing data in appropriate format for bnlearn package and the blacklist arcs.
`scoreFile`	The file containing the record of the successful jobs and the scores of the corresponding individual networks.
`consensusFile`	The file containing the consensus network and its BDe and BIC scores.
`bnModuleRes`	The result of `bn.module` function. Useful mostly for development.
`runs`	A list containing the record of successful jobs.
`scores`	The list saved in `scoreFile`.
`consensusThreshRes`	The full output of `consensus.thresh()` function.
`consensus1`	The consensus Bayesian network corresponding to the first threshold. It is the output of `consensus` function and `consensus1$BN` is an object of `bn-class`.
`scorePlot`	The output of `plot.scores` function, containing the scores of individual networks.
`graphs`	The output of `plot.graphS` function, containing the BDe score of the consensus network.
`timeTaken`	An object of `difftime-class` recording the learning wall-time.
`use.Disease, use.Effect, use.Hartemink`	Some of the input arguments.

Running the jobs on a cluster requires a bnCalculationJob script, which is not yet included in the package.

Amir Foroushani, Habil Zare, and Rupesh Agrahari

Hartemink A (2001). Principled Computational Methods for the Validation and Discovery of Genetic Regulatory Networks. Ph.D. thesis, School of Electrical Engineering and Computer Science, Massachusetts Institute of Technology.

Nagarajan, Radhakrishnan, et al. (2010) Functional relationships between genes associated with differentiation potential of aged myogenic progenitors. Frontiers in Physiology 1.

bnlearn-package, Pigengene-package, compute.pigengene

data(eigengenes33)
ms <- 10:20 ## A subset of modules for quick demonstration
amlE <- eigengenes33$aml[,ms]
mdsE <- eigengenes33$mds[,ms]
eigengenes <- rbind(amlE,mdsE)
Labels <- c(rep("AML",nrow(amlE)),rep("MDS",nrow(mdsE)))
names(Labels) <- rownames(eigengenes)
learnt <- learn.bn(Data=eigengenes, Labels=Labels, 
  bnPath="bnExample", bnNum=10, seed=1)
bn <- learnt$consensus1$BN

## Visualize:
d1 <- draw.bn(BN=bn,nodeFontSize=14)

## What are the children of the Disease node?
childrenD <- bnlearn::children(x=bn, node="Disease")
print(childrenD)

## Fit the parameters of the Bayesian network:
fit <- bnlearn::bn.fit(x=bn, data=learnt$consensus1$Data, method="bayes",iss=10)

## The conditional probability table for a child of the Disease node:
fit[[childrenD[1]]]

## The fitted Bayesian network can be used for predicting the labels
## (i.e., values of the Disease node).
l2 <- predict(object=fit, node="Disease", data=learnt$consensus1$Data, method="bayes-lw")
table(Labels, l2)
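
The cross-tabulation of true and predicted labels can be summarized as a single accuracy value. The sketch below uses short hypothetical vectors, `truth` and `pred`, standing in for `Labels` and `l2`; it does not reproduce the output of the example. Because the example predicts on the same data used to fit the network, an accuracy computed this way is a training accuracy and is likely to be optimistic.

```r
## Hypothetical true and predicted labels standing in for `Labels` and `l2`.
truth <- c("AML", "AML", "MDS", "MDS", "AML", "MDS")
pred  <- factor(c("AML", "MDS", "MDS", "MDS", "AML", "AML"),
                levels = c("AML", "MDS"))

## Confusion table: rows are true labels, columns are predictions.
tab <- table(truth, pred)
print(tab)

## Accuracy: the fraction of samples on the diagonal of the table.
accuracy <- sum(diag(tab)) / sum(tab)
print(accuracy)
```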