Description Details Available ConstraintBased Learning Algorithms Available Scorebased Learning Algorithms Available Hybrid Learning Algorithms Other (ConstraintBased) Local Discovery Algorithms Bayesian Network Classifiers Available (Conditional) Independence Tests Available Network Scores Whitelist and Blacklist Support Error Detection and Correction: the Strict Mode Author(s) References Examples
Bayesian network structure learning (via constraintbased, scorebased and hybrid algorithms), parameter learning (via ML and Bayesian estimators) and inference.
Package:  bnlearn 
Type:  Package 
Version:  4.3 
Date:  20170115 
License:  GPLv2 or later 
This package implements some algorithms for learning the structure of Bayesian networks.
Constraintbased algorithms, also known as conditional independence learners, are all optimized derivatives of the Inductive Causation algorithm (Verma and Pearl, 1991). These algorithms use conditional independence tests to detect the Markov blankets of the variables, which in turn are used to compute the structure of the Bayesian network.
Scorebased learning algorithms are general purpose heuristic optimization algorithms which rank network structures with respect to a goodnessoffit score.
Hybrid algorithms combine aspects of both constraintbased and scorebased algorithms, as they use conditional independence tests (usually to reduce the search space) and network scores (to find the optimal network in the reduced space) at the same time.
Several functions for parameter estimation, parametric inference, bootstrap, crossvalidation and stochastic simulation are available. Furthermore, advanced plotting capabilities are implemented on top of the Rgraphviz and lattice packages.
PC (pc.stable
), a modern implementation of the
first practical constraintbased structure learning algorithm.
GrowShrink (gs
): based on the GrowShrink
Markov Blanket, the first (and simplest) Markov blanket detection
algorithm used in a structure learning algorithm.
Incremental Association (iamb
): based on the
Markov blanket detection algorithm of the same name, which is based on
a twophase selection scheme (a forward selection followed by an attempt
to remove false positives).
Fast Incremental Association (fast.iamb
): a
variant of IAMB which uses speculative stepwise forward selection to
reduce the number of conditional independence tests.
Interleaved Incremental Association (inter.iamb
):
another variant of IAMB which uses forward stepwise selection to avoid
false positives in the Markov blanket detection phase.
This package includes three implementations of each algorithm:
an optimized implementation (used when the optimized
argument
is set to TRUE
), which uses backtracking to initialize the
learning process of each node.
an unoptimized implementation (used when the optimized
argument is set to FALSE
) which is better at uncovering possible
erratic behaviour of the statistical tests.
a clusteraware implementation, which requires a running cluster set
up with the makeCluster
function from the parallel package.
The computational complexity of these algorithms is polynomial in the number of tests, usually O(N^2) (but superexponential in the worst case scenario), where N is the number of variables. Execution time scales linearly with the size of the data set.
HillClimbing (hc
): a hill climbing
greedy search on the space of the directed graphs. The optimized
implementation uses score caching, score decomposability and score
equivalence to reduce the number of duplicated tests.
Tabu Search (tabu
): a modified hillclimbing
able to escape local optima by selecting a network that minimally
decreases the score function.
Random restart with a configurable number of perturbing operations is implemented for both algorithms.
MaxMin HillClimbing (mmhc
): a hybrid algorithm
which combines the MaxMin Parents and Children algorithm (to restrict the
search space) and the HillClimbing algorithm (to find the optimal network
structure in the restricted space).
Restricted Maximization (rsmax2
): a more general
implementation of the MaxMin HillClimbing, which can use any combination
of constraintbased and scorebased algorithms.
These algorithms learn the structure of the undirected graph underlying the Bayesian network, which is known as the skeleton of the network or the (partial) correlation graph. Therefore all the arcs are undirected, and no attempt is made to detect their orientation. They are often used in hybrid learning algorithms.
MaxMin Parents and Children (mmpc
): a forward
selection technique for neighbourhood detection based on the maximization
of the minimum association measure observed with any subset of the nodes
selected in the previous iterations.
Hiton Parents and Children (si.hiton.pc
): a fast
forward selection technique for neighbourhood detection designed to
exclude nodes early based on the marginal association. The implementation
follows the SemiInterleaved variant of the algorithm.
ChowLiu (chow.liu
): an application of the
minimumweight spanning tree and the information inequality. It learns
the tree structure closest to the true one in the probability space.
ARACNE (aracne
): an improved version of the
ChowLiu algorithm that is able to learn polytrees.
All these algorithms have three implementations (unoptimized, optimized and clusteraware) like other constraintbased algorithms.
The algorithms are aimed at classification, and favour predictive power over the ability to recover the correct network structure. The implementation in bnlearn assumes that all variables, including the classifiers, are discrete.
Naive Bayes (naive.bayes
): a very simple
algorithm assuming that all classifiers are independent and using the
posterior probability of the target variable for classification.
TreeAugmented Naive Bayes (tree.bayes
): an
improvement over naive Bayes, this algorithms uses ChowLiu to approximate
the dependence structure of the classifiers.
The conditional independence tests used in constraintbased algorithms in practice are statistical tests on the data set. Available tests (and the respective labels) are:
discrete case (categorical variables)
mutual information: an informationtheoretic distance
measure. It's proportional to the loglikelihood ratio (they differ
by a 2n factor) and is related to the deviance of the
tested models. The asymptotic chisquare test
(mi
and miadf
, with adjusted degrees of freedom), the
Monte Carlo permutation test (mcmi
), the sequential Monte
Carlo permutation test (smcmi
), and the semiparametric test
(spmi
) are implemented.
shrinkage estimator for the mutual information
(mish
): an improved asymptotic chisquare test
based on the JamesStein estimator for the mutual information.
Pearson's X^2: the classical Pearson's
X^2 test for contingency tables. The asymptotic
chisquare test (x2
and x2adf
, with
adjusted degrees of freedom), the Monte Carlo permutation test
(mcx2
), the sequential Monte Carlo permutation test
(smcx2
) and semiparametric test (spx2
) are
implemented.
discrete case (ordered factors)
JonckheereTerpstra: a trend test for ordinal variables.
The asymptotic normal test (jt
), the Monte Carlo permutation
test (mcjt
) and the sequential Monte Carlo permutation test
(smcjt
) are implemented.
continuous case (normal variables)
linear correlation: Pearson's linear correlation. The exact
Student's t test (cor
), the Monte Carlo permutation test
(mccor
) and the sequential Monte Carlo permutation test
(smccor
) are implemented.
Fisher's Z: a transformation of the linear correlation
with asymptotic normal distribution. Used by commercial software
(such as TETRAD II) for the PC algorithm (an R implementation is
present in the pcalg
package on CRAN). The asymptotic normal
test (zf
), the Monte Carlo permutation test (mczf
) and
the sequential Monte Carlo permutation test (smczf
) are
implemented.
mutual information: an informationtheoretic distance
measure. Again it is proportional to the loglikelihood ratio (they
differ by a 2n factor). The asymptotic
chisquare test (mig
), the Monte Carlo
permutation test (mcmig
) and the sequential Monte Carlo
permutation test (smcmig
) are implemented.
shrinkage estimator for the mutual information
(migsh
): an improved asymptotic chisquare test
based on the JamesStein estimator for the mutual information.
hybrid case (mixed discrete and normal variables)
mutual information: an informationtheoretic distance
measure. Again it is proportional to the loglikelihood ratio (they
differ by a 2n factor). Only the asymptotic
chisquare test (micg
) is implemented.
Available scores (and the respective labels) are:
discrete case (categorical variables)
the multinomial loglikelihood (loglik
) score, which
is equivalent to the entropy measure used in Weka.
the Akaike Information Criterion score (aic
).
the Bayesian Information Criterion score (bic
),
which is equivalent to the Minimum Description Length (MDL)
and is also known as Schwarz Information Criterion.
the logarithm of the Bayesian Dirichlet equivalent score
(bde
), a score equivalent Dirichlet posterior density.
the logarithm of the Bayesian Dirichlet sparse score
(bds
), a sparsityinducing Dirichlet posterior density
(not score equivalent).
the logarithm of the Bayesian Dirichlet score with Jeffrey's prior (not score equivalent).
the logarithm of the modified Bayesian Dirichlet equivalent
score (mbde
) for mixtures of experimental and observational
data (not score equivalent).
the logarithm of the locally averaged Bayesian Dirichlet
score (bdla
, not score equivalent).
the logarithm of the K2 score (k2
), a Dirichlet
posterior density (not score equivalent).
continuous case (normal variables)
the multivariate Gaussian loglikelihood (loglikg
)
score.
the corresponding Akaike Information Criterion score
(aicg
).
the corresponding Bayesian Information Criterion score
(bicg
).
a score equivalent Gaussian posterior density (bge
).
hybrid case (mixed discrete and normal variables)
the conditional linear Gaussian loglikelihood
(loglikcg
) score.
the corresponding Akaike Information Criterion score
(aiccg
).
the corresponding Bayesian Information Criterion score
(biccg
).
All learning algorithms support arc whitelisting and blacklisting:
blacklisted arcs are never present in the graph.
arcs whitelisted in one direction only (i.e. A > B is whitelisted but B > A is not) have the respective reverse arcs blacklisted, and are always present in the graph.
arcs whitelisted in both directions (i.e. both A > B and B > A are whitelisted) are present in the graph, but their direction is set by the learning algorithm.
Any arc whitelisted and blacklisted at the same time is assumed to be whitelisted, and is thus removed from the blacklist.
In algorithms that learn undirected graphs, such as ARACNE and ChowLiu, an arc must be blacklisted in both directions to blacklist the underlying undirected arc.
Optimized implementations of constraintbased algorithms rely heavily on backtracking to reduce the number of tests needed by the learning algorithm. This approach may sometimes hide errors either in the Markov blanket or the neighbourhood detection steps, such as when hidden variables are present or there are external (logical) constraints on the interactions between the variables.
On the other hand, in the unoptimized implementations of constraintbased algorithms the learning of the Markov blanket and neighbourhood of each node is completely independent from the rest of the learning process. Thus it may happen that the Markov blanket or the neighbourhoods are not symmetric (i.e. A is in the Markov blanket of B but not vice versa), or that some arc directions conflict with each other.
The strict
argument enables some measure of error correction for such
inconsistencies, which may help to retrieve a good model when the learning
process would otherwise fail:
if strict
is set to TRUE
, every error stops the learning
process and results in an error message.
if strict
is set to FALSE
:
vstructures are applied to the network structure in lowestpvalue order; if any arc is already oriented in the opposite direction, the vstructure is discarded.
nodes which cause asymmetries in any Markov blanket are removed from that Markov blanket; they are treated as false positives.
nodes which cause asymmetries in any neighbourhood are removed from that neighbourhood; again they are treated as false positives (see Tsamardinos, Brown and Aliferis, 2006).
Each correction results in a warning.
Marco Scutari
Department of Statistics
University of Oxford
Maintainer: Marco Scutari [email protected]
Nagarajan R, Scutari M, Lebre S (2013). "Bayesian Networks in R with Applications in Systems Biology". Springer.
Scutari M (2010). "Learning Bayesian Networks with the bnlearn R Package". Journal of Statistical Software, 35(3), 122. URL http://www.jstatsoft.org/v35/i03/.
Scutari M (20107). "Bayesian Network ConstraintBased Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package". Journal of Statistical Software, 77(2), 120. URL http://www.jstatsoft.org/v77/i02/.
Koller D, Friedman N (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.
Korb K, Nicholson AE (2010). Bayesian Artificial Intelligence. Chapman & Hall/CRC, 2nd edition.
Pearl J (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63  library(bnlearn)
data(learning.test)
## Simple learning
# first try the GrowShrink algorithm
res = gs(learning.test)
# plot the network structure.
plot(res)
# now try the Incremental Association algorithm.
res2 = iamb(learning.test)
# plot the new network structure.
plot(res2)
# the network structures seem to be identical, don't they?
all.equal(res, res2)
# how many tests each of the two algorithms used?
ntests(res)
ntests(res2)
# and the unoptimized implementation of these algorithms?
## Not run: ntests(gs(learning.test, optimized = FALSE))
## Not run: ntests(iamb(learning.test, optimized = FALSE))
## Greedy search
res = hc(learning.test)
plot(res)
## Another simple example (Gaussian data)
data(gaussian.test)
# first try the GrowShrink algorithm
res = gs(gaussian.test)
plot(res)
## Blacklist and whitelist use
# the arc B  F should not be there?
blacklist = data.frame(from = c("B", "F"), to = c("F", "B"))
blacklist
res3 = gs(learning.test, blacklist = blacklist)
plot(res3)
# force E  F direction (E > F).
whitelist = data.frame(from = c("E"), to = c("F"))
whitelist
res4 = gs(learning.test, whitelist = whitelist)
plot(res4)
# use both blacklist and whitelist.
res5 = gs(learning.test, whitelist = whitelist, blacklist = blacklist)
plot(res5)
## Debugging
# use the debugging mode to see the learning algorithms
# in action.
res = gs(learning.test, debug = TRUE)
res = hc(learning.test, debug = TRUE)
# log the learning process for future reference.
## Not run:
sink(file = "learninglog.txt")
res = gs(learning.test, debug = TRUE)
sink()
# if something seems wrong, try the unoptimized version
# in strict mode (inconsistencies trigger errors):
res = gs(learning.test, optimized = FALSE, strict = TRUE, debug = TRUE)
# or disable strict mode to let the algorithm fix errors on the fly:
res = gs(learning.test, optimized = FALSE, strict = FALSE, debug = TRUE)
## End(Not run)

Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.