[Introduction]
[Example  Pulmonary Embolism]
[Creating "Skeleton" HydeNetwork Objects]
[Creating HydeNetwork Objects With a Training Dataset]
[Creating HydeNetwork Objects With a List of Models]
 [A Note on Factor Conversion]
[Specifying Distributions for Individual Nodes]
 [Univariate Distributions for Root Nodes]
> [Binary Root Nodes]
> [Normallydistributed Root Nodes]
> [Multicategory Root Nodes]
> [Other Univariate Distributions]
[Regression Equations]
 [Ordinary Least Squares (OLS)]
 [Logistic Regression]
[Using R
Model Objects]
 [Warning About Limited Scope of writeJagsModel
Methods]
 [Conditional Probability Tables (CPTs)]
[Deterministic Nodes]
[Modifying the Graph Structure]
Setting up Bayesian network models with HydeNet
generally involves two components  specifying the network structure and specifying the (conditional) probability distribution of each node (given any parent nodes). Network structure is specified within the HydeNetwork()
function, while node distributions can be set using either HydeNetwork()
, setNode()
, or setNodeModels()
. Generally, HydeNetwork()
would be used to simultaneously define the distributions for all the nodes in the network in a single function call, while setNode()
and setNodeModels()
are used to define the distribution of a specific node in an existing HydeNetwork
object. Also, HydeNetwork()
offers a relatively limited set of options in terms of the nature of the specified distributions, while the other two functions offer more flexibility.
HydeNetwork()
can be called in three different ways. The first involves explicit specification by the user of the network structure (according to the formula syntax implemented in gRbase::dag()
) but no specification of a training dataset or models to populate node distributions. This results in a "skeleton" HydeNetwork
object. The second technique involves the same explicit specification of the network structure, but also passing a training dataset. In this case, conditional probability distributions for all the nodes in the network are estimated, using frequency tabulation, linear regression, logistic regression, or multinomial logistic regression, depending on the classes (and number of levels, for factors) of the variables in the data frame and the userspecified network structure.
The third way to invoke HydeNetwork()
is to pass a "bag of models", or more specifically a list argument containing one or more model objects as elements. In this method, the network structure is automatically built using the names of the response and explanatory variables within each of the models included in the list argument. Permissible model classes include xtabs
, cpt
, lm
, glm
, and multinom
. Note that, in the HydeNet
package, we have included the cpt
model class. This stands for conditional probablity table, and is intended to facilitate the specification of categorical node distributions for which all parent nodes are also categorical. See help("cpt")
for details and see below for examples. Note also that at this time the glm
class only works with family="binomial"
; defining a node's distribution using other families is possible, however, using setNode()
.
In any of the above three cases, but especially in the first case (i.e., when HydeNetwork()
is used with neither a training data nor a list of model objects to populate node distributions), the distributions for each node in the network can be manually specified, onebyone. This is accomplished with either the setNode()
function or the setNodeModels()
function. As we discuss below, we have implemented a multitude of techniques for specifying node distributions with these functions.
We start by loading the package:
install.packages("HydeNet") library(HydeNet) options(Hyde_fitModel = FALSE)
In the above output, the required packages for HydeNet
are listed. In addition, this package uses JAGS, which is standalone software for implementing Markov Chain Monte Carlo simulation. JAGS is called from R
via a package called rjags
. See help("rjagspackage")
for details.
The network we will study involves the diagnosis and treatment of pulmonary embolism, or PE (node pe). PE is a condition where the arteries carrying blood to the lungs get blocked, typically by a blood clot that dislodged from a vein in the leg. There are two commonlyused tests for diagnosing PE. One is a blood test called Ddimer (node d.dimer), and the other is pulmonary angiography (node angio). For each, the probability of positive and negative test values depends on the status of PE. In other words, the conditional distribution function for each test node can be defined using the sensitivity and specificity of each test. The Ddimer test also is affected by pregnancy (node pregnant), with higher false positive rates. Clinicians prior beliefs about the likelihood of PE are captured in a score (node wells). Since PE cannot directly be observed, the likelihood of a patient receiving treatment (node treat) depends on the test results. And the likelihood of survival through hospital discharge (node death) depends on both the status of the disease and whether or not the patient received treatment.
A graphical representation of the PE network can be constructed based on an unpopulated HydeNetwork object (i.e., a "base" object for which node distributions have not yet been specified):
net < HydeNetwork(~ wells + pe  wells + d.dimer  pregnant*pe + angio  pe + treat  d.dimer*angio + death  pe*treat) plot(net)
The HydeNetwork object we created, called net
, is worth exploring:
net
Since we haven't given HydeNetwork()
information on conditional probability distributions for the nodes in the network, we have a skeleton object where each node is distributed as normal given its parent nodes. The parameters mu
and tau
are to this point unspecified (note: in JAGS, the mean and precision are specified for the normal distribution  the precision parameter $\tau$ is equal to $1/\sigma^2$).
Using HydeNetwork()
with a training dataset implements the following default nodespecific model classes, depending on the class of the node (in other words, the class of the variable in the training dataset), whether or not the node has parent nodes, and if so, the classes of the parent nodes:
Node Class Parents Model Type Function 
::::
factor
with any number of levelsNone Tabulation stats::xtabs()

factor
with any number of levelsAll of class factor
Conditional Probabilility Table HydeNet::cpt()

Binary factor
At least 1 numeric
or integer
Logistic Regression stats::glm(..., family="binomial")

factor
with 3+ levels At least 1 numeric
or integer
Multinomial Logistic Regressionnnet::multinom()

numeric
or integer
Ordinary Least Squares stats::lm()

The syntax for building a Bayesian network using training data is rather simple:
data(PE, package='HydeNet') autoNet < HydeNetwork(~ wells + pe  wells + d.dimer  pregnant*pe + angio  pe + treat  d.dimer*angio + death  pe*treat, data = PE) writeNetworkModel(autoNet, pretty=TRUE)
We can see by the output that the models have all been populated, and verify that these are indeed the coefficients we obtain from the functions in the above table:
glm(treat ~ d.dimer+angio, data=PE, family="binomial")$coef
xtabs(~PE$pregnant) / nrow(PE)
The same network can be constructed by feeding HydeNetwork()
a list of model objects:
g1 < lm(wells ~ 1, data=PE) g2 < glm(pe ~ wells, data=PE, family="binomial") g3 < lm(d.dimer ~ pe + pregnant, data=PE) g4 < xtabs(~ pregnant, data=PE) g5 < cpt(angio ~ pe, data=PE) g6 < glm(treat ~ d.dimer + angio, data=PE, family="binomial") g7 < cpt(death ~ pe + treat, data=PE) bagOfModels < list(g1,g2,g3,g4,g5,g6,g7) bagNet < HydeNetwork(bagOfModels) writeNetworkModel(bagNet, pretty=TRUE)
The advantage of this approach is that it allows for somewhat increased flexibility in specifying the model parameterization for each node (e.g., inclusion of nonlinear effects and/or interactions). However, we caution that all these models ultimately get translated to JAGS code, and this translation is relatively limited in terms of the types of model parameterizations supported. We discuss this issue in greater detail below, under the heading "[Warning About Limited Scope of writeJagsModel
Methods]".
JAGS uses integers to represent levels of factors. Levels of factors are retained as a list element (called factorRef
) in the output of compileJagsModel()
. In the function bindPosterior()
, we have facilitated the process of converting posterior MCMC samples into a single data frame with an option to relabel factors. This process is demonstrated in our 'Getting Started with HydeNet' vignette.
Below, we describe the usage of setNode()
and setNodeModels()
.
The most straightforward way to specify distributions for root nodes, or nodes without parents is by using setNode
with specific distributions and parameters. For example, returning to our original unpopulated network (object net
), we can define a Bernoulli distribution for node pregnant:
net < setNode(network = net, node = pregnant, nodeType = "dbern", prob=.4) net
In the code above, we can see that setNode
works by returning a modified HydeNet object. In the output, node pregnant is now Bernoulli with probability of 0.4.
Univariate normal distributions are specified using nodeType = "dnorm"
. We will specify a normal distribution with a $\mu = 5$ and $\sigma = 1.5$ for node wells:
net < setNode(net, wells, nodeType = "dnorm", mean = 5, sd = 1.5) net$nodeType$wells net$nodeParams$wells
Suppose instead that the Wells score was categorical in nature, with three values (e.g., low, medium and high). We can specify categorical distributions as follows:
net < setNode(net, wells, nodeType = "dcat", pi = vectorProbs(p = c(.3, .6, .1), wells) ) net$nodeType$wells net$nodeParams$wells
Note here that we have overwritten the node distribution within the object net
to be categorical in nature.
The vectorProbs()
function converts a probability vector into JAGS code, as seen above in the list element net$nodeParams$wells
. This function will by default normalize probability vectors, so that counts can be directly fed into the model:
net < setNode(net, wells, nodeType = "dcat", pi = vectorProbs(p = c(37, 162, 48), wells) ) net$nodeType$wells net$nodeParams$wells
We could have achieved the same by directly inserting the JAGS code into the pi parameter:
net < setNode(net, wells, nodeType = "dcat", pi = "pi.wells[1] < 0.15; pi.wells[2] < 0.66; pi.wells[3] < 0.19")
HydeNet
supports all the statistical distributions supported by JAGS. A table of these distributions is stored in the jagsDists
dataset:
data(jagsDists, package='HydeNet') jagsDists[,c(1:3, 6:8)]
knitr::kable(jagsDists[, c(1:3, 6:8)])
So, to assign a Weibull distribution to a node XYZ, we would use the following code:
net < setNode(net, XYZ, nodeType = "dweib", shape=2, scale=5)
Finally, note that there is builtin error handling when parameters are outside allowable limits:
net < setNode(net, d.dimer, nodeType = "dpois", lambda=10)
For OLS models, nodeType="dnorm"
can be used. We use a regression equation to characterize the dependency of the node on its parents. We note again that normal distributions are specified using the mean and precision parameters, where the precision parameter is the inverse of the variance.
setNode()
supports the use of formula syntax to define a regression equation for a given node. This is achieved using the fromFormula()
function with the nodeFormula parameter, as follows:
net < setNode(net, d.dimer, nodeType="dnorm", mean=fromFormula(), sd=sqrt(30), #sigma^2 = 30 nodeFormula = d.dimer ~ 210 + 29*pregnant + 68*pe) net$nodeType$d.dimer net$nodeParams$d.dimer net$nodeFormula$d.dimer
Or, alternatively, one may directly specify JAGS code for the parameters as character strings. Below, we do this for mu
:
net < setNode(net, d.dimer, nodeType="dnorm", mean="210 + 29*pregnant + 68*pe", sd = sqrt(30))
However, the model syntax is flexible, allowing for alternative distributions to be used if desired. For example, maybe the distribution of the residuals has heavy tails; here, the (nonstandardized) Student's t distribution could be used:
net < setNode(net, d.dimer, nodeType="dt", mean="210 + 29*pregnant + 68*pe", sd=sqrt(20), df=2)
The decision of whether to give an R
style formula or JAGS code is a matter of preference. But when using R
code, one needs to ensure that any functions used in the formula can be translated to JAGS code. A list of functions that can be translated between R
and JAGS can be viewed by calling
data(jagsFunctions, package='HydeNet') jagsFunctions
knitr::kable(jagsFunctions)
If the intercept and slope coefficients of a logistic regression model are known, one may define a Bernoullidistributed node using the ilogit
function in JAGS (inverse logit):
equation < "6.3 + 0.02*d.dimer + 2.9*angio  0.005*d.dimer*angio" net < setNode(net, treat, nodeType="dbern", prob=paste("ilogit(", equation, ")"), validate=FALSE)
R
Model ObjectsAbove, we showed how HydeNetwork()
can be used with a list of model objects to populate both the graph and the corresponding node distributions. In a similar fashion, certain R
model classes can be used to populate the distribution for individual nodes in an existing HydeNetwork
object. This is achieved using the setNodeModels()
function. Currently, setNodeModels()
is compatible with the following model classes: xtabs
, cpt
, lm
, glm
, (family="binomial"
only) and multinom
.
Above, we constructed a HydeNetwork
object called bagNet
for the PE network by passing a list of model objects. Suppose we wanted to modify one of the models and repopulate the network, e.g., by introducing an interaction term. This is achieved with the following code:
bagNet$nodeType$d.dimer bagNet$nodeParams$d.dimer bagNet$nodeFormula$d.dimer
new.DDimer.Model < lm(d.dimer ~ pe * pregnant, data=PE) bagNet < setNodeModels(bagNet, new.DDimer.Model) writeNetworkModel(bagNet, pretty=TRUE)
writeJagsModel
MethodsPassing model objects to HydeNetwork
objects, either using HydeNetwork.list()
or setNodeModels()
, is handled by invoking the writeJagsModel()
methods. These methods accept the model object (e.g., an lm
object) as input and populate a variety of list elements within the HydeNetwork
object (e.g., $nodeFormula
, $nodeFitter
, $nodeFitterArgs
, $nodeParams
, etc.). The core functionality of these methods is to use the R
model object to write JAGS code implementing the probability distribution described by the model. This is a difficult feature to standardize.
Currently, only a limited set of model parameterizations are supported by the convenience functions HydeNetwork.list()
and setNodeModels()
. In situations where more complex model equations are to be specified for certain node(s), setNode()
should be used instead of these functions as it allows more flexibility. Future versions of the package will allow for more flexibility in directly passing R
model objects.
The supported parameterizations include the following:
When a given node as well as all of its parent nodes are categorical (or binary) in nature, the conditional distribution of that node is also fully categorical. We have included two functions  cpt()
and inputCPT()
 which facilitate the process of populating the conditional distributions for such nodes.
Each of these two functions produce an object of class cpt
, which is a kdimensional array (with k equal to the number of parent nodes) with a specific structure: the last dimension corresponds to the child node and the array, when summed across this dimension, is equal to a (k1)dimensional array of ones. It therefore is an array containing conditional distributions of the child node for each combination of parent nodes.
The function cpt()
will compute this array given an input dataset and a formula which represents the conditional probability structure. In the code below, the variable death
is the child node and the variables pe
and treat
are the parent nodes.
h < cpt(death ~ pe + treat, data=PE)
inputCPT()
is similar, although instead of using an input dataset to estimate the conditional distributions, it runs through a dialogue to manually specify the conditional densities. This can be useful for small conditional probability tables, such as the conditional probability of a diagnostic test being positive given disease status:
h < inputCPT(test ~ disease)
 Enter Factor Levels for node 'test': If this is a binary variable, enter '<yn>' as a shortcut. When finished, enter '<z>'. To repeat entry of the last inputted factor level, enter '<b>'. To start over entirely, enter '<s>'  Level 1 of 'test':  Level 2 of 'test': +++ Level 3 of 'test': <z>  Enter Factor Levels for node 'disease': If this is a binary variable, enter '<yn>' as a shortcut. When finished, enter '<z>'. To repeat entry of the last inputted factor level, enter '<b>'. To start over entirely, enter '<s>'  Level 1 of 'disease': Healthy Level 2 of 'disease': Diseased Level 3 of 'disease': <z>  NOTE: parameter 'reduce' is set to TRUE in inputCPT(). Conditional probabilities Pr(test=  disease) will be calculated as the complement of the inputted probabilities Pr(test !=   disease).  Enter the following conditional probabilities: Use '<q>' to halt execution. To go back one step and reenter, enter '<b>'.  Pr(test=+++  Healthy ): 0.23 Pr(test=+++  Diseased): 0.85
print(h)
test disease  +++ Healthy 0.77 0.23 Diseased 0.15 0.85 attr(,"model") disease test wt 1 Healthy +++ 0.23 2 Diseased +++ 0.85 3 Healthy  0.77 4 Diseased  0.15 attr(,"class") [1] "cpt" "array"
In many cases, the user may desire to specify nodes that are nonrandom in nature. For example, we might construct a network for the first roll of dice within a game of craps. In craps, if the "shooter" (the person rolling the dice) rolls a 2, 3, or 12, you immediately lose. If the "shooter" rolls a 7 or 11, you immediately win. Anything else and the "point" gets set (and then the shooter rolls again).
craps < HydeNetwork(~ d1 + d2 + diceSum  d1*d2 + firstRollOutcome  diceSum) craps < setNode(craps, d1, nodeType="dcat", pi = vectorProbs(p = rep(1/6,6), d1), validate = FALSE) craps < setNode(craps, d2, nodeType="dcat", pi = vectorProbs(p = rep(1/6,6), d2), validate = FALSE) craps < setNode(craps, diceSum, nodeType = "determ", define = fromFormula(), nodeFormula = diceSum ~ di1 + di2) craps < setNode(craps, firstRollOutcome, nodeType = "determ", define = fromFormula(), nodeFormula = firstRollOutcome ~ ifelse(diceSum < 4  diceSum > 11, 1, ifelse(diceSum == 7  diceSum == 11, 1,0))) plot(craps)
writeNetworkModel(craps, pretty=TRUE)
The formulas follow the same rules as described above in the [Regression Equations] section.
Nodes and/or links may be added or removed from an existing HydeNetwork
object, using an update
method we have implemented for HydeNetwork
objects. Syntactically, this function acts in a similar fashion to update.lm()
, in that you add or subtract terms from the model equation. Suppose that a new diagnostic test for PE was invented and we wish to incorporate it into the PE network. We can achieve this by the following code:
net2 < update(net, . ~ . + newTest  pe + treat  newTest  pregnant) plot(net2)
The update()
method for HydeNetwork
objects processes terms in the given model equation sequentially. In the above example, the original object net
did not contain a node called newTest
. But there were nodes called pregnant
, pe
, and treat
. The first term within the model equation (+ newTest  pe
) specifies the addition of the node newTest
as a child of node pe
. The second term (+ treat  newTest
) specifies the addition of a link from the nowexisting node newTest
into node treat
. The third term ( pregnant
) specifies the removal of node pregnant
. Examining the network object, two important points are worth mentioning:
net2
First, while the graph has changed  and now node treat
has three parents  the model for node treat
has not changed. The user must specify a new model (with either setNode()
or setNodeModels()
) to account for this new dependency if desired.
Second, a warning message indicates that a parent node (pregnant
) has been removed. Since that node was involved in characterizing the distribution of its child node(s) (d.dimer
), the function by default removes the distribution from all child nodes still existing in the network. The user then is required to use either setNode()
or setNodeModels()
to repopulate the distribution(s) for the affected node(s).
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.