learn.bn: Learns a Bayesian network


View source: R/learn.bn.R

Description

This function takes as input the eigengenes of all modules and learns a Bayesian network using the bnlearn package. It builds several individual networks, each starting from a random initial network, by optimizing their score. Then, it infers a consensus network from the individual networks with relatively high scores. The default hyper-parameters and arguments should be fine for most applications.

Usage

learn.bn(pigengene=NULL, Data=NULL, Labels=NULL, bnPath = "bn", bnNum = 100,
  consensusRatio = 1/3, consensusThresh = "Auto", doME0 = FALSE, 
  selectedFeatures = NULL, trainingCases = "All", algo = "hc", scoring = "bde",
  restart = 0, pertFrac = 0.1, doShuffle = TRUE, use.Hartemink = TRUE, 
  bnStartFile = "None", use.Disease = TRUE, use.Effect = FALSE, dummies = NULL,
  tasks = "All", onCluster = !(which.cluster()$cluster == "local"), 
  inds = 1:ceiling(bnNum/perJob), perJob = 2, maxSeconds = 5 * 60, 
  timeJob = "00:10:00", bnCalculationJob = NULL, seed = NULL, verbose = 0,
  naTolerance=0.05)

Arguments

pigengene

An object of pigengene-class, i.e., the output of the compute.pigengene function.

Data

A matrix or data frame containing the training data, in which columns correspond to eigengenes and rows correspond to samples. Rows and columns must be named.

Labels

A (preferably named) vector containing the labels (condition types) of the training data. Names must agree with the row names of Data.
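Because the names of Labels must agree with the row names of Data, it can help to check and align them before calling learn.bn. The base-R sketch below uses a toy matrix; the object names toyData and toyLabels are hypothetical, and the check is an illustration rather than the package's own input validation.

```r
## Toy data: 4 samples (rows) x 2 eigengenes (columns); rows and columns are named.
toyData <- matrix(c(0.1, -0.3, 0.2, 0.5,
                    -0.2, 0.4, 0.0, -0.1),
                  nrow = 4,
                  dimnames = list(paste0("s", 1:4), c("ME1", "ME2")))
## Labels given in a different order than the rows of toyData.
toyLabels <- c(s3 = "MDS", s1 = "AML", s4 = "MDS", s2 = "AML")

## Every sample in the data must have a label ...
stopifnot(all(rownames(toyData) %in% names(toyLabels)))
## ... and indexing by name puts the labels in the same order as the rows.
toyLabels <- toyLabels[rownames(toyData)]
print(toyLabels)
```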

bnPath

The path where the results will be saved.

bnNum

The total number of individual networks. In practice, the number of learnt networks can be less than bnNum because some jobs may take too long and be terminated.

consensusRatio

A number in the range 0-1 that determines what fraction of the highest-scoring individual networks is used to build the consensus network.
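The selection step can be sketched in base R as follows, assuming the individual networks are ranked by score (bnlearn's BDe and BIC scores are on a log scale where higher is better) and that the number of kept networks is rounded up. The rounding rule, the scores, and the variable names are assumptions of this sketch, not taken from the package source.

```r
## Hypothetical scores of 9 individual networks (higher is better).
scores <- c(-1510, -1492, -1533, -1488, -1501, -1520, -1495, -1540, -1499)
consensusRatio <- 1/3

## Number of networks to keep (rounding up is an assumption of this sketch).
nKeep <- ceiling(consensusRatio * length(scores))
## Indices of the highest-scoring networks, best first.
topNets <- order(scores, decreasing = TRUE)[seq_len(nKeep)]
print(topNets)
```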

consensusThresh

A vector of thresholds in the range 0-1. For each threshold t, a consensus network will be built from the arcs that are present in at least a fraction t of the individual networks. Alternatively, if it is "Auto" (the default), the threshold is set automatically to the mean plus the standard deviation of the frequencies (strengths) of all arcs in the individual networks.
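A base-R sketch of the "Auto" rule: the strength of an arc is the fraction of individual networks that contain it, the threshold is the mean plus the standard deviation of these strengths, and the consensus keeps the arcs whose strength reaches the threshold. Representing each network as a character vector of "from->to" arcs, and taking the mean and (sample) standard deviation over the arcs that occur in at least one network, are assumptions of this sketch; the package itself works with bnlearn objects.

```r
## Three hypothetical individual networks, each a set of directed arcs "from->to".
nets <- list(
  c("Disease->ME1", "ME1->ME2", "ME2->ME3"),
  c("Disease->ME1", "ME1->ME2", "ME3->ME2"),
  c("Disease->ME1", "ME1->ME3", "ME2->ME3")
)

## Strength of an arc = fraction of the networks that contain it.
allArcs  <- unique(unlist(nets))
strength <- sapply(allArcs, function(a) mean(sapply(nets, function(n) a %in% n)))

## "Auto" threshold: mean plus standard deviation of the arc strengths.
thresh <- mean(strength) + sd(strength)

## Consensus network: arcs present in at least a fraction thresh of the networks.
consensusArcs <- names(strength)[strength >= thresh]
print(round(strength, 2))
print(consensusArcs)
```

Raising the threshold can only remove arcs from the consensus, which is why larger values of consensusThresh give sparser networks.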

doME0

If TRUE, module 0 (the outliers) will be considered in learning the Bayesian network.

selectedFeatures

A character vector. If not NULL, only these features (eigengenes) will be used.

trainingCases

A character vector that determines which cases (samples) should be considered for learning the network.

algo

The algorithm that bnlearn uses for optimizing the score. The default is "hc" (hill climbing). See arc.strength for other options and more details.

scoring

A character string determining the scoring criterion. Use 'bde' or 'bic' for the Bayesian Dirichlet equivalent or the Bayesian Information Criterion score, respectively. See score for technical details.
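For reference, bnlearn reports BIC on a log-likelihood scale where larger values are better, i.e., the classic definition divided by -2. In the standard form below (notation introduced here, not from this page), $\hat{\theta}$ are the maximum-likelihood parameters of network $\mathcal{G}$ fitted to the $n$ samples in $D$, and $d$ is the number of free parameters; for discrete nodes $X_i$ with $r_i$ levels and $q_i$ parent configurations, $d$ takes the form shown. See score for bnlearn's exact conventions.

```latex
\mathrm{BIC}(\mathcal{G} \mid D) = \log P(D \mid \hat{\theta}, \mathcal{G}) - \frac{\log n}{2}\, d,
\qquad d = \sum_{i} (r_i - 1)\, q_i
```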

restart

The number of random restarts. For technical use only. See hc.

pertFrac

A numeric in the range 0-1 that determines the number of attempts to randomly insert/remove/reverse an arc on every random restart. For technical use only.

doShuffle

The ordering of the features (eigengenes) is important in making the initial network. If doShuffle=TRUE, they will be shuffled before making every initial network.
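One common way to build a random initial network from an ordering of the nodes is to allow arcs only from earlier to later nodes, which guarantees a directed acyclic graph; shuffling the ordering first lets any node end up as a parent. The base-R sketch below illustrates this idea. The helper randomDagFromOrder and its arc probability p = 0.3 are hypothetical, and learn.bn may construct its initial networks differently.

```r
set.seed(1)  # only to make this sketch reproducible
features <- c("ME1", "ME2", "ME3", "ME4")

## Hypothetical helper: a random DAG whose arcs respect the given node ordering.
randomDagFromOrder <- function(nodes, p = 0.3) {
  arcs <- data.frame(from = character(0), to = character(0),
                     stringsAsFactors = FALSE)
  for (i in seq_along(nodes)) {
    for (j in seq_along(nodes)) {
      ## Arcs may only go from an earlier node to a later one, so no cycle can form.
      if (i < j && runif(1) < p) {
        arcs <- rbind(arcs, data.frame(from = nodes[i], to = nodes[j],
                                       stringsAsFactors = FALSE))
      }
    }
  }
  arcs
}

## doShuffle = TRUE: shuffle the ordering before building each initial network.
ordering <- sample(features)
start <- randomDagFromOrder(ordering)
print(ordering)
print(start)
```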

use.Hartemink

If TRUE, the Hartemink algorithm will be used to discretize the data. Otherwise, interval discretization will be applied. See bnlearn::discretize.
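Interval discretization splits the range of each variable into bins of equal width. The base-R sketch below does this with cut(); the values and the choice of three bins are arbitrary. It does not reproduce the Hartemink algorithm, which iteratively merges adjacent intervals while trying to preserve the pairwise mutual information between variables (see bnlearn::discretize).

```r
## Hypothetical eigengene values for 8 samples.
x <- c(-0.42, -0.31, -0.05, 0.00, 0.12, 0.18, 0.33, 0.41)

## With a single number of breaks, cut() uses equal-width intervals
## spanning the observed range of x.
xDisc <- cut(x, breaks = 3)
print(table(xDisc))
```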

bnStartFile

Optionally, learning can start from a given Bayesian network instead of a random network. bnStartFile should contain a list called selected, and selected$BN should be an object of bn-class. Non-technical users can set it to "None" to disable this option.

use.Disease

If TRUE, the condition variable Disease will be included in the network, which cannot be the child of any other variable.

use.Effect

If TRUE, the condition variable beAML will be included in the network, which cannot be the parent of any other variable.
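In bnlearn, structural constraints like these are usually expressed as a blacklist: a data frame with columns from and to that lists arcs which must not appear. The base-R sketch below forbids every arc into the Disease node (so it has no parents) and every arc out of an effect node (so it has no children). The node names, including the placeholder Effect, are assumptions, and the exact blacklist that learn.bn stores in moduleFile may differ.

```r
features <- c("ME1", "ME2", "ME3")
disease  <- "Disease"
effect   <- "Effect"   # placeholder name for the effect variable

nodes <- c(features, disease, effect)

## use.Disease = TRUE: no arc may point into Disease.
blDisease <- data.frame(from = setdiff(nodes, disease), to = disease,
                        stringsAsFactors = FALSE)
## use.Effect = TRUE: no arc may leave the effect node.
blEffect  <- data.frame(from = effect, to = setdiff(nodes, effect),
                        stringsAsFactors = FALSE)

## Effect -> Disease appears in both parts, so duplicated rows are dropped.
blacklist <- unique(rbind(blDisease, blEffect))
print(blacklist)
```

A data frame of this shape can be passed as the blacklist argument of bnlearn learning functions such as hc.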

dummies

A vector of numeric values in the range 0-1. Dummy random variables will be added to the Bayesian network to check whether the learning process is effective. For development purposes only.

tasks

A character vector that is a subset of c("learn","harvest","consensus","graph") and identifies the tasks to be done. This is useful if part of the analysis was done previously; otherwise, set it to "All".

onCluster

A logical value that should be FALSE if the learning is not done on a computer cluster. By default, it is TRUE unless which.cluster() reports the cluster as "local".

inds

The indices of the jobs that are included in the analysis.

perJob

The number of individual networks that are learnt by each job.
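The number of jobs follows from the default inds = 1:ceiling(bnNum/perJob) in the usage. The base-R lines below evaluate that expression for a few settings; the helper nJobs is hypothetical and only wraps the default.

```r
## Hypothetical helper: number of jobs implied by the default inds = 1:ceiling(bnNum/perJob).
nJobs <- function(bnNum, perJob = 2) length(1:ceiling(bnNum / perJob))

nJobs(100)                # 50 jobs with the default bnNum and perJob
nJobs(25)                 # 13 jobs, because 25/2 is rounded up
nJobs(1000, perJob = 5)   # 200 jobs
```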

maxSeconds

An integer limiting the computation time (in seconds) of each training job that runs locally, i.e., when onCluster=FALSE.

timeJob

The time in "hh:mm:ss" format requested for each job if they are running on a computer cluster.

bnCalculationJob

A script used to submit jobs to the cluster. Set to NULL if not using a cluster.

seed

The random seed that can be set to an integer to reproduce the same results.

verbose

Integer level of verbosity. 0 means silent and higher values produce more details of computation.

naTolerance

Upper threshold on the fraction of entries per gene that can be missing. Genes with a larger fraction of missing entries are ignored. For genes with a smaller fraction of NA entries, the missing values are imputed from their average expression in the other samples. See check.pigengene.input.
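The rule described above can be sketched in base R: compute the fraction of missing entries for each gene (column), drop the genes above the tolerance, and replace the remaining NA values by the mean of the observed values of that gene. This illustrates the described behavior and is not the code of check.pigengene.input; the toy matrix is made up, and keeping a gene whose fraction exactly equals naTolerance is an assumption.

```r
## Hypothetical expression matrix: 5 samples (rows) x 3 genes (columns).
expr <- cbind(g1 = c(1.0, 2.0, NA, 4.0, 5.0),
              g2 = c(NA, NA, 3.0, 1.0, 2.0),
              g3 = c(0.5, 0.7, 0.9, 1.1, 1.3))
naTolerance <- 0.25

## Fraction of missing entries per gene.
naFrac <- colMeans(is.na(expr))
## Ignore genes whose fraction of missing entries exceeds the tolerance.
kept <- expr[, naFrac <= naTolerance, drop = FALSE]
## Impute the remaining NAs by the gene's average over the other samples.
for (g in colnames(kept)) {
  miss <- is.na(kept[, g])
  kept[miss, g] <- mean(kept[!miss, g])
}
print(naFrac)
print(kept)
```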

Details

For learning a Bayesian network with tens of nodes (eigengenes), bnNum=1000 or higher is recommended. Increasing consensusThresh generally results in a network with fewer arcs. Nagarajan et al. proposed a principled approach that determines this hyper-parameter from the background noise using non-parametric bootstrapping, which is not yet implemented in this package.

The default values for the rest of the hyper-parameters should be fine for most applications.

Value

A list of:

consensusThresh

The vector of thresholds as described in the arguments.

indvPath

The path where the individual networks were saved.

moduleFile

The file containing the data in a format appropriate for the bnlearn package, and the blacklisted arcs.

scoreFile

The file containing the record of the successful jobs and the scores of the corresponding individual networks.

consensusFile

The file containing the consensus network and its BDe and BIC scores.

bnModuleRes

The result of the bn.module function. Useful mostly for development.

runs

A list containing the record of successful jobs.

scores

The list saved in scoreFile.

consensusThreshRes

The full output of consensus.thresh() function.

consensus1

The consensus Bayesian network corresponding to the first threshold. It is the output of the consensus function, and consensus1$BN is an object of bn-class.

scorePlot

The output of the plot.scores function, containing the scores of the individual networks.

graphs

The output of the plot.graphS function, containing the BDe score of the consensus network.

timeTaken

An object of difftime-class recording the learning wall-time.

use.Disease, use.Effect, use.Hartemink

Some of the input arguments.

Note

Running the jobs on a cluster requires the bnCalculationJob script, which is NOT yet included in the package.

Author(s)

Amir Foroushani, Habil Zare, and Rupesh Agrahari

References

Hartemink A (2001). Principled Computational Methods for the Validation and Discovery of Genetic Regulatory Networks. Ph.D. thesis, School of Electrical Engineering and Computer Science, Massachusetts Institute of Technology.

Nagarajan R, et al. (2010). Functional relationships between genes associated with differentiation potential of aged myogenic progenitors. Frontiers in Physiology, 1.

See Also

bnlearn-package, Pigengene-package, compute.pigengene

Examples

library(Pigengene) ## provides learn.bn, draw.bn, and the eigengenes33 data
data(eigengenes33)
ms <- 10:20 ## A subset of modules for quick demonstration
amlE <- eigengenes33$aml[,ms]
mdsE <- eigengenes33$mds[,ms]
eigengenes <- rbind(amlE,mdsE)
Labels <- c(rep("AML",nrow(amlE)),rep("MDS",nrow(mdsE)))
names(Labels) <- rownames(eigengenes)
learnt <- learn.bn(Data=eigengenes, Labels=Labels, 
  bnPath="bnExample", bnNum=10, seed=1)
bn <- learnt$consensus1$BN

## Visualize:
d1 <- draw.bn(BN=bn,nodeFontSize=14)

## What are the children of the Disease node?
childrenD <- bnlearn::children(x=bn, node="Disease")
print(childrenD)

## Fit the parameters of the Bayesian network:
fit <- bnlearn::bn.fit(x=bn, data=learnt$consensus1$Data, method="bayes",iss=10)

## The conditional probability table for a child of the Disease node:
fit[[childrenD[1]]]

## The fitted Bayesian network can be used for predicting the labels
## (i.e., values of the Disease node).
l2 <- predict(object=fit, node="Disease", data=learnt$consensus1$Data, method="bayes-lw")
table(Labels, l2)

Pigengene documentation built on Nov. 8, 2020, 6:47 p.m.