bnlearn-package: Bayesian network structure learning, parameter learning and inference

Description

Bayesian network structure learning (via constraint-based, score-based and hybrid algorithms), parameter learning (via ML and Bayesian estimators) and inference.

Details

Package: bnlearn
Type: Package
Version: 3.4-20131101
Date: 2013-11-01
License: GPLv2 or later

This package implements constraint-based, score-based and hybrid algorithms for learning the structure of Bayesian networks.

Constraint-based algorithms, also known as conditional independence learners, are all optimized derivatives of the Inductive Causation algorithm (Verma and Pearl, 1991). These algorithms use conditional independence tests to detect the Markov blankets of the variables, which in turn are used to compute the structure of the Bayesian network.

Score-based learning algorithms are general-purpose heuristic optimization algorithms that rank network structures with respect to a goodness-of-fit score.

Hybrid algorithms combine aspects of both constraint-based and score-based algorithms, as they use conditional independence tests (usually to reduce the search space) and network scores (to find the optimal network in the reduced space) at the same time.

Several functions for parameter estimation, parametric inference, bootstrap, cross-validation and stochastic simulation are available. Furthermore, advanced plotting capabilities are implemented on top of the Rgraphviz and lattice packages.
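
As a minimal sketch of this workflow (the bn.fit and rbn functions and the arguments shown here are assumptions about the API, not a definitive usage pattern):

library(bnlearn)
data(learning.test)

# learn a structure, then fit the parameters by maximum likelihood.
dag = hc(learning.test)
fitted = bn.fit(dag, learning.test, method = "mle")
# draw 100 new observations from the fitted network.
sim = rbn(fitted, n = 100)
head(sim)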

Available constraint-based learning algorithms

This package includes three implementations of each algorithm: an optimized implementation (used by default), an unoptimized implementation (used when the optimized argument is set to FALSE) that is better at uncovering erratic behaviour of the statistical tests, and a cluster-aware implementation for parallel execution (illustrated below).

The computational complexity of these algorithms is polynomial in the number of tests, usually O(N^2) (O(N^4) in the worst case scenario), where N is the number of variables. Execution time scales linearly with the size of the data set.
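
A hedged sketch of the three implementations (that the cluster argument accepts a cluster created with the parallel package is an assumption here):

library(bnlearn)
library(parallel)
data(learning.test)

# optimized implementation, used by default.
res = gs(learning.test)
# unoptimized implementation, useful for debugging.
res.raw = gs(learning.test, optimized = FALSE)
# cluster-aware implementation: the independence tests are
# distributed over the nodes of the cluster.
cl = makeCluster(2)
res.par = gs(learning.test, cluster = cl)
stopCluster(cl)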

Available score-based learning algorithms

Random restart with a configurable number of perturbing operations is implemented for both hill-climbing (hc) and tabu search (tabu), as sketched below.
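
A minimal sketch of random restarts with hill-climbing (the restart and perturb arguments of hc are assumed here):

library(bnlearn)
data(learning.test)

# hill-climbing with 5 random restarts, each perturbing the
# current network with 10 random arc changes.
res = hc(learning.test, restart = 5, perturb = 10)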

Available hybrid learning algorithms

Max-Min Hill-Climbing (mmhc) and Generic 2-Phase Restricted Maximization (rsmax2) are implemented; see the sketch below.
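
A minimal sketch of both (the restrict and maximize arguments of rsmax2 are assumptions based on its two-phase design):

library(bnlearn)
data(learning.test)

# Max-Min Hill-Climbing: MMPC restricts the search space,
# hill-climbing finds the best network within it.
res = mmhc(learning.test)
# the same scheme with the two phases chosen explicitly.
res2 = rsmax2(learning.test, restrict = "mmpc", maximize = "hc")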

Other (constraint-based) local discovery algorithms

These algorithms, such as ARACNE (aracne) and Chow-Liu (chow.liu), learn the structure of the undirected graph underlying the Bayesian network, which is known as the skeleton of the network or the (partial) correlation graph. Therefore all the arcs are undirected, and no attempt is made to detect their orientation. They are often used in hybrid learning algorithms.

Like the other constraint-based algorithms, all of these have three implementations (unoptimized, optimized and cluster-aware).
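
A minimal sketch (assuming aracne and chow.liu take a data frame as their first argument):

library(bnlearn)
data(learning.test)

# learn the skeleton with ARACNE and Chow-Liu; both return
# undirected graphs, so no arc directions are learned.
res.aracne = aracne(learning.test)
res.cl = chow.liu(learning.test)
plot(res.aracne)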

Bayesian Network classifiers

These algorithms, such as naive Bayes (naive.bayes), are aimed at classification, and favour predictive power over the ability to recover the correct network structure. The implementation in bnlearn assumes that all variables, including the class variable, are discrete.
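
A minimal sketch (the training argument of naive.bayes and the predict method for the fitted classifier are assumptions here):

library(bnlearn)
data(learning.test)

# a naive Bayes classifier predicting A from the other variables.
bn = naive.bayes(learning.test, training = "A")
fitted = bn.fit(bn, learning.test)
# predicted class labels, cross-tabulated with the true ones.
pred = predict(fitted, learning.test)
table(pred, learning.test$A)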

Available (conditional) independence tests

The conditional independence tests used by constraint-based algorithms are, in practice, statistical tests on the data set. The available tests (and the respective labels) are documented in the manual pages of the individual learning functions, and are selected with the test argument.
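
For example (assuming "mi" and "cor" are among the available test labels for discrete and Gaussian data, respectively):

library(bnlearn)
data(learning.test)
data(gaussian.test)

# mutual information test for discrete data.
res = gs(learning.test, test = "mi")
# exact t test for Pearson's correlation on Gaussian data.
res2 = gs(gaussian.test, test = "cor")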

Available network scores

The available scores (and the respective labels) are documented in the manual pages of the score-based learning functions, and are selected with the score argument; the score function computes them directly for a given network.
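
For example (assuming "bde" and "bic" are among the available score labels, and that iss sets the imaginary sample size of the BDe score):

library(bnlearn)
data(learning.test)

# hill-climbing with the BDe score and imaginary sample size 10.
res = hc(learning.test, score = "bde", iss = 10)
# the BIC score of the resulting network.
score(res, learning.test, type = "bic")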

Whitelist and blacklist support

All learning algorithms support arc whitelisting and blacklisting: whitelisted arcs are guaranteed to be present in the learned network, while blacklisted arcs are never included.

Any arc whitelisted and blacklisted at the same time is assumed to be whitelisted, and is thus removed from the blacklist.

In algorithms that learn undirected graphs, such as ARACNE and Chow-Liu, an arc must be blacklisted in both directions to blacklist the underlying undirected arc.
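
A minimal sketch of blacklisting the undirected arc A - B in Chow-Liu by listing both directions:

library(bnlearn)
data(learning.test)

# blacklist both directions to remove the undirected arc A - B.
bl = data.frame(from = c("A", "B"), to = c("B", "A"))
res = chow.liu(learning.test, blacklist = bl)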

Error detection and correction: the strict mode

Optimized implementations of constraint-based algorithms rely heavily on backtracking to reduce the number of tests needed by the learning algorithm. This approach may sometimes hide errors either in the Markov blanket or the neighbourhood detection steps, such as when hidden variables are present or there are external (logical) constraints on the interactions between the variables.

On the other hand, in the unoptimized implementations of constraint-based algorithms the learning of the Markov blanket and the neighbourhood of each node is completely independent of the rest of the learning process. Thus it may happen that the Markov blankets or the neighbourhoods are not symmetric (e.g. A is in the Markov blanket of B but not vice versa), or that some arc directions conflict with each other.

The strict argument controls how such inconsistencies are handled: when strict is set to TRUE they stop the learning process with an error, while when strict is set to FALSE the algorithm attempts to fix them on the fly (emitting a warning instead), which may help to retrieve a good model when the learning process would otherwise fail.

Author(s)

Marco Scutari
UCL Genetics Institute (UGI)
University College London

Maintainer: Marco Scutari marco.scutari@gmail.com

References

(a BibTeX file with all the references cited throughout this manual is present in the ‘bibtex’ directory of this package)

Nagarajan R, Scutari M, Lebre S (2013). "Bayesian Networks in R with Applications in Systems Biology". Springer.

Scutari M (2010). "Learning Bayesian Networks with the bnlearn R Package". Journal of Statistical Software, 35(3), 1-22. URL http://www.jstatsoft.org/v35/i03/.

Koller D, Friedman N (2009). "Probabilistic Graphical Models: Principles and Techniques". MIT Press.

Korb K, Nicholson AE (2010). "Bayesian Artificial Intelligence". Chapman & Hall/CRC, 2nd edition.

Pearl J (1988). "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference". Morgan Kaufmann.

Examples

library(bnlearn)
data(learning.test)

## Simple learning
# first try the Grow-Shrink algorithm
res = gs(learning.test)
# plot the network structure.
plot(res)
# now try the Incremental Association algorithm.
res2 = iamb(learning.test)
# plot the new network structure.
plot(res2)
# the network structures seem to be identical, don't they?
all.equal(res, res2)
# how many tests did each of the two algorithms use?
ntests(res)
ntests(res2)
# and the unoptimized implementation of these algorithms?
## Not run: ntests(gs(learning.test, optimized = FALSE))
## Not run: ntests(iamb(learning.test, optimized = FALSE))

## Greedy search
res = hc(learning.test)
plot(res)

## Another simple example (Gaussian data)
data(gaussian.test)
# first try the Grow-Shrink algorithm
res = gs(gaussian.test)
plot(res)

## Blacklist and whitelist use
# suppose the arc B - F should not be there.
blacklist = data.frame(from = c("B", "F"), to = c("F", "B"))
blacklist
res3 = gs(learning.test, blacklist = blacklist)
plot(res3)
# force E - F direction (E -> F).
whitelist = data.frame(from = c("E"), to = c("F"))
whitelist
res4 = gs(learning.test, whitelist = whitelist)
plot(res4)
# use both blacklist and whitelist.
res5 = gs(learning.test, whitelist = whitelist, blacklist = blacklist)
plot(res5)

## Debugging
# use the debugging mode to see the learning algorithms
# in action.
res = gs(learning.test, debug = TRUE)
res = hc(learning.test, debug = TRUE)
# log the learning process for future reference.
## Not run: 
sink(file = "learning-log.txt")
res = gs(learning.test, debug = TRUE)
sink()
# if something seems wrong, try the unoptimized version
# in strict mode (inconsistencies trigger errors):
res = gs(learning.test, optimized = FALSE, strict = TRUE, debug = TRUE)
# or disable strict mode to let the algorithm fix errors on the fly:
res = gs(learning.test, optimized = FALSE, strict = FALSE, debug = TRUE)

## End(Not run)
