knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = 'center' )
Consider the asia
dataset available from bnlearn
. Details of the variables can be found in the bnlearn
documentation.
library(bnlearn) data(asia) summary(asia)
Consider two candidate BN models (Lauritzen and Spiegelhalter, 1988), which only differ in the edge from S
to D
.
``` {r out.width="50%"}
asia_bn <- bnlearn::model2network("[A][S][T|A][L|S][B|S][D|B:E][E|T:L][X|E]")
bnlearn::graphviz.plot(asia_bn)
asia_bn_alt <- bnlearn::model2network("[A][S][T|A][L|S][B|S][E|T:L][X|E][D|B:E:S]")
bnlearn::graphviz.plot(asia_bn_alt)
### Global monitor <!-- The global monitor is equivalent to the Bayes Factor and offers a good assessment of the model as a whole. --> <!-- It is primarily useful for differentiating between candidate models as the global monitor pinpoints the nodes with different contributions for two candidate models. --> <!-- We can assess the log likelihood contribution from each of the nodes and determine which nodes contribute the most to the global monitor. --> <!-- global.monitor.graph provides a quick visual for the entire network. --> <!-- The darker the color, the more substantial the contribution of that node to the log likelihood. --> <!-- The global monitor assesses the probability of observing the data given the model and as such does not particularly fit the prequential view. --> <!-- However, it is a quick and useful first diagnostic assessment. --> A first useful diagnostic is the `global_monitor`, reporting the contribution of each vertex to the log-likelihood of the model. ``` {r} library(bnmonitor) asia <- asia[sample(1:nrow(asia),50),] glob_asia <- node_monitor(dag = asia_bn, df = asia, alpha = 3) glob_asia_alt <- node_monitor(dag = asia_bn_alt, df = asia, alpha = 3) glob_asia glob_asia_alt
In the alternative model, Dysnopea contributes more to the log-likelihood.
Global monitors can be plotted giving a quick view of the decomposition of the log-likelihood. The darker the color, the more substantial the contribution of a vertex.
plot(glob_asia)
There are two variants of node monitors.
The marginal node monitor computes the probability of the $i$th observation in the data set in turn after passing the evidence of the $i-1$th cases in the data set.
The conditional node monitor computes the probability of the $i$th observation in the data set after passing evidence of the $i-1$th cases in the data set, and the evidence for all nodes in the $i$th case except the node of interest.
As a quick survey of the nodes, the node_monitor
command computes the marginal and conditional monitors for the final observation in the data set.
``` {r echo=TRUE, message = FALSE, warning = FALSE} node_asia <- final_node_monitor(dag = asia_bn, df = asia) node_asia
The scores indicate a poor fit of the probability distributions specified for the Smoking, Bronchitis, and Dysnopea nodes, since these are larger than 1.96 in absolute value. Plots can also be created to give a visual counterpart of the node monitors. ```r plot(node_asia, which = "marginal")
As an illustration we investigate further the fit of the variable Dysnopea.
The sequential marginal monitor seq_marg_monitor
gives us a closer look at which particular forecasts in the data set might cause this poor fit. We examine the sequential monitor for both candidate models.
``` {r warning=FALSE} seq_asia <- seq_marg_monitor(dag = asia_bn, df = asia, node.name = "D") seq_asia_alt <- seq_marg_monitor(dag = asia_bn_alt, df = asia, node.name = "D") seq_asia seq_asia_alt
```r plot(seq_asia) plot(seq_asia_alt)
Both monitors indicate that for some observations there is a poor fit (score above 1.96 in absolute value). In particular for the alternative models the marginal monitor has larger values in absolute value.
A similar analysis can be conducted with seq_marg_monitor
, which would show that the model fits well (not reported here).
Once a vertex has been identified as a poor fit, further investigation can be carried out to check for which values of the parents the model provides a bad representation. This can be achieved with the seq_pa_ch_monitor
function.
As an illustration consider the asia_bn
BN, the vertex D
(Dysnopea), the parent variable B
(Bronchitis) which can take values yes
and no
.
``` {r warning=FALSE, out.width="50%"} asia_pa_ch1 <- seq_pa_ch_monitor(dag = asia_bn, df = asia, node.name = "D", pa.names = "B", pa.val = "yes", alpha = 3) asia_pa_ch1 asia_pa_ch2 <- seq_pa_ch_monitor(dag = asia_bn, df = asia, node.name = "D", pa.names = "B", pa.val = "no", alpha = 3) asia_pa_ch2 plot(asia_pa_ch1) plot(asia_pa_ch2)
For this model, Dysnopea is adequately modeled for both values of Bronchitis, since most scores largely fall in the recommended range. ### Influential observations The last robustness tool is the absolute value of the log-likelihood ratio between a model learnt without one observation and the one learnt with the full dataset. Larger values are associated to atomic events which influence the structural learning. ```r influence <- influential_obs(dag = asia_bn, data = asia, alpha = 3) head(influence) plot(influence)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.