```r
knitr::opts_chunk$set(echo = TRUE)
```
This is a full example of the basic usage of the package, from learning the structure of a dynamic Bayesian network (DBN) to forecasting with it.
We will use the sample of the 'motor' dataset included in the package for the example, consisting of 3000 instances and 12 variables. Training times vary greatly depending on the size of the networks trained and the size of the dataset. This dataset is only intended for running some tests with the package and showing its functionality; we expect to model it poorly, given that it is only a minimal part of the original dataset. The original dataset can be found at https://www.kaggle.com/wkirgsn/electric-motor-temperature.
```r
library(dbnR)
library(data.table)

data(motor)
summary(motor)
```
To learn the structure of a DBN, we first have to fix its size and set some parameters for the learning algorithm. If the 'dmmhc' method is selected, further parameters can be given to the algorithm that will be passed down to the rsmax2 function from the bnlearn package. Three especially useful parameters are 'maxp' inside 'maximize.args', which sets the maximum number of parents that a node can have; 'blacklist', which allows us to forbid arcs in the static structure of the network; and 'whitelist', which allows us to force specific arcs in the static structure of the network. Two additional parameters, 'blacklist_tr' and 'whitelist_tr', allow us to forbid or force specific inter-slice arcs in the transition structure.
```r
size <- 3
dt_train <- motor[1:2800]
dt_val <- motor[2801:3000]

blacklist <- c("motor_speed_t_0", "i_d_t_0")
blacklist <- rbind(blacklist, c("motor_speed_t_0", "i_q_t_0"))
blacklist_tr <- c("stator_tooth_t_1", "coolant_t_0")
blacklist_tr <- rbind(blacklist_tr, c("stator_tooth_t_2", "coolant_t_1"))
whitelist_tr <- c("coolant_t_2", "stator_yoke_t_0")

net <- dbnR::learn_dbn_struc(dt_train, size, method = "dmmhc",
                             blacklist = blacklist,
                             blacklist_tr = blacklist_tr,
                             whitelist_tr = whitelist_tr,
                             restrict = "mmpc", maximize = "hc",
                             restrict.args = list(test = "cor"),
                             maximize.args = list(score = "bic-g", maxp = 10))
#net <- dbnR::learn_dbn_struc(dt_train, size, method = "psoho")
#net <- dbnR::learn_dbn_struc(dt_train, size, method = "natPsoho")
```
The 'blacklist' parameter is a 2-column matrix where the first column is the 'from' node and the second is the 'to' node. It is important to append "_t_0" to the names of the nodes, because they are renamed to that format inside the algorithm. A 'whitelist' parameter with the same format as the blacklist can also be provided to force some arcs to be present, but it can sometimes increase the execution time greatly. Only include arcs in t_0 in both the blacklist and the whitelist, given that these parameters apply to the static network learning, not the transition one. For inter-slice arcs of the transition network, use the 'blacklist_tr' parameter. It has the same syntax as the other blacklist, but here you can exclude arcs that go from one time slice to another. Note that the blacklist and whitelist parameters are only available in the dmmhc algorithm, not in the others.
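For illustration, a minimal sketch of building the same blacklist as above directly as an explicit 2-column matrix, which can be clearer than growing it with rbind:

```r
# The same two forbidden arcs as before, as an explicit 'from'/'to' matrix.
# The column names are optional and only added here for readability:
blacklist <- matrix(c("motor_speed_t_0", "i_d_t_0",
                      "motor_speed_t_0", "i_q_t_0"),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(NULL, c("from", "to")))
```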
Another interesting parameter is 'f_dt'. It allows the user to provide an already folded dataset, mainly so that specific instances can be removed beforehand. We can also avoid learning intra-slice arcs with the 'intra' parameter: if TRUE, intra-slice arcs will be learnt as normal, but if set to FALSE the learning of the static structure of the DBN is skipped.
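As a minimal sketch of these two parameters, assuming the folded dataset in 'f_dt' replaces the internal folding of the provided data and with an arbitrary row filter for illustration:

```r
# Fold the training set manually so that some instances can be removed,
# then learn only the transition structure (intra = FALSE):
f_dt_pre <- fold_dt(dt_train, size)
f_dt_pre <- f_dt_pre[-(1:10)]  # arbitrary example: drop the first 10 folded rows
net_tr <- dbnR::learn_dbn_struc(dt_train, size, method = "dmmhc",
                                f_dt = f_dt_pre, intra = FALSE)
```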
The particle swarm optimization for higher-order DBNs (psoho) algorithm can also be used to learn the DBN structure. Parameters such as the number of particles and the number of iterations can be tuned. It returns a specific type of structure where no intra-slice arcs are present and the only arcs allowed are those that end in t_0. With the default parameters, it takes considerably longer than the dmmhc algorithm to learn the structure of networks with a low Markovian order, but the results are sometimes more interesting. On high-order networks, the psoho algorithm scales much better than the dmmhc.
The optimized order-invariant natPsoho algorithm is also available. Like the psoho algorithm, it employs particle swarm optimization to search the space of possible transition networks, but it is faster, finds better solutions and scales better with network size thanks to its encoding.
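As a sketch of tuning the swarm-based methods, assuming the number of particles and the number of iterations are controlled by the 'n_inds' and 'n_it' parameters:

```r
# Illustrative parameter values; larger swarms and more iterations usually
# find better structures at the cost of longer training times:
net_pso <- dbnR::learn_dbn_struc(dt_train, size, method = "psoho",
                                 n_inds = 100, n_it = 50)
net_nat <- dbnR::learn_dbn_struc(dt_train, size, method = "natPsoho",
                                 n_inds = 100, n_it = 50)
```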
Once we have the structure of the DBN, we have to fold our dataset and fit the parameters of the DBN:
```r
f_dt_train <- fold_dt(dt_train, size)
f_dt_val <- fold_dt(dt_val, size)

fit <- dbnR::fit_dbn_params(net, f_dt_train, method = "mle-g")
```
We have implemented a visualization tool for the DBN structures. It plots an interactive HTML graph via the 'visNetwork' package. Each time slice is displayed in a different color, and the nodes can be selected and moved around to see the local structures better. Both the network and the fit objects can be plotted.
```r
dbnR::plot_dynamic_network(fit)
```
After fitting the model, we can perform inference over it. One of the particularities of BNs is that any variable can be the objective of the inference, so we have to decide which variables will be the objective ones. There are two types of inference available: point-wise inference and forecasting. Point-wise inference gives a prediction for the objective variables in each row of the folded dataset. Forecasting gives the predicted values of the objective variables over a span of time from a starting point.
```r
obj_var <- c("stator_winding_t_0")
res <- dbnR::predict_dt(fit, f_dt_val, obj_var)
```
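As a quick numeric check, assuming 'res' holds one predicted value per row of the folded validation set, we can compute the mean absolute error of the point-wise predictions:

```r
# Mean absolute error of the point-wise predictions against the real values:
mae <- mean(abs(unlist(res) - f_dt_val$stator_winding_t_0))
print(paste0("Point-wise MAE: ", round(mae, 4)))
```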
One should be careful when performing point-wise prediction with time-series models, because a very good prediction of the next time step is simply the previous value. This means that a time-series model can sometimes just be passing the values of the variables forward from the previous step, resulting in very good accuracy when the forecasting horizon is 1 instant but performing very poorly as the horizon increases. I have seen this behaviour with multiple time-series models, so one should be wary of it. In our example, the predictions look deceptively good, but in reality the model does not forecast properly, because we extracted only a minimal part of the original dataset for testing purposes, not for forecasting accuracy.
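To illustrate this caveat, a minimal sketch comparing against the naive persistence baseline that simply repeats the previous value of the variable, which is already available in the folded dataset:

```r
# The naive baseline predicts stator_winding_t_0 with its value one step back,
# stored in the folded dataset as stator_winding_t_1:
mae_naive <- mean(abs(f_dt_val$stator_winding_t_1 - f_dt_val$stator_winding_t_0))
print(paste0("Naive persistence MAE: ", round(mae_naive, 4)))
```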
Next, we will forecast some of the variables in the validation dataset:
```r
res_fore <- suppressWarnings(dbnR::forecast_ts(f_dt_val, fit,
                             obj_vars = c("pm_t_0", "stator_winding_t_0"),
                             ini = 100, len = 70))

set.seed(42)
res_fore <- suppressWarnings(dbnR::forecast_ts(f_dt_val, fit,
                             obj_vars = c("pm_t_0"), ini = 100, len = 70,
                             mode = "approx", num_p = 150, rep = 3))
```
With DBNs, all variables in the next time step are predicted from the values in the previous slices. If no evidence is provided, all the variables in t_0 will be forecasted and moved to t_1 in the next prediction step. The 'obj_vars' parameter is used to return the forecasting results of the variables we are interested in. The forecasting length and starting point can be modified as shown in the example.
There are two forecasting modes: exact and approximate inference. Exact inference is faster and more reliable than approximate particle inference in the scenario with only continuous variables, so it should be prioritized. It is also important to note that exact inference returns the mean values of the nodes in each predicted step, so some variance is expected that is not shown in the plot. This is why the predictions look so smooth; if approximate inference is used, this variance can be seen by running the inference several times in a row. It remains future work to show the expected variance of the forecast in the plot.
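To evaluate the exact forecast numerically rather than visually, a sketch that assumes the object returned by forecast_ts is a list holding the original and the predicted values of the objective variables:

```r
# Assuming the result contains the original and predicted series, compute the
# mean absolute error over the whole forecasting horizon:
res_fore <- suppressWarnings(dbnR::forecast_ts(f_dt_val, fit,
                             obj_vars = c("pm_t_0"), ini = 100, len = 70))
mae_fore <- mean(abs(unlist(res_fore$orig) - unlist(res_fore$pred)))
print(paste0("Forecast MAE over the horizon: ", round(mae_fore, 4)))
```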
We can also perform evidenced forecasting with our models. We call 'evidenced forecasting' the special case when we already know the values that some variables will take in the future, so we set those values and predict the rest. This is very useful if we want our models to serve as simulators, fixing some variables to certain values to see the effect that they have on the forecasting.
```r
res_fore <- suppressWarnings(dbnR::forecast_ts(f_dt_val, fit,
                             obj_vars = c("pm_t_0"), ini = 100, len = 70,
                             prov_ev = c("stator_winding_t_0", "u_q_t_0")))
```
Several variables can be provided as evidence. In each prediction step, the values in t_0 of these variables will be fixed to the ones provided in f_dt_val. If we want to set an intervention and see its effect, we can do it as follows:
```r
inter_dt <- copy(f_dt_val)
inter_dt[, stator_winding_t_0 := 0.32]

res_fore <- suppressWarnings(dbnR::forecast_ts(inter_dt, fit,
                             obj_vars = c("pm_t_0"), ini = 100, len = 70,
                             prov_ev = c("stator_winding_t_0", "u_q_t_0")))
```
There is no need to change 'stator_winding_t_1' and 'stator_winding_t_2', because in the next steps they will take the values fixed for t_0. This way, we can intervene on multiple variables as we see fit and observe the effects. The values are fixed in each time step, so the interventions can be gradual increases, decreases or whatever we need them to be, as shown in the sketch below. As a simulation tool, this is very powerful. We have made some prototypes with shiny, and in the future we intend to provide the capability to automatically generate a visual simulation interface with it.
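For instance, a minimal sketch of a gradual intervention, fixing a ramp of illustrative values instead of a constant:

```r
# Instead of a constant intervention, fix a linearly increasing ramp so that
# the evidence variable grows a little at each prediction step:
inter_dt <- copy(f_dt_val)
inter_dt[, stator_winding_t_0 := seq(0.2, 0.5, length.out = .N)]
res_fore <- suppressWarnings(dbnR::forecast_ts(inter_dt, fit,
                             obj_vars = c("pm_t_0"), ini = 100, len = 70,
                             prov_ev = c("stator_winding_t_0")))
```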
Another type of inference that can be performed with dynamic Bayesian networks is smoothing. In this scenario, we have a point in time and we want to see how we got to that point in previous instants. It is similar to forecasting, but going backwards in time. With this, we can try to find a possible explanation for the values observed in previous instants, for example. Smoothing is a fairly common operation, although not as widely used as forecasting.
To perform smoothing, we follow almost the same procedure as for forecasting with the network, but in this case all our objective variables are in the oldest time slice, not in t_0. Take care with the 'ini' and 'len' parameters, because we are now going backwards. Should you start smoothing at ini = 50 with len = 20, you would end up predicting back to instance 30, for example.
```r
res_smooth <- suppressWarnings(dbnR::smooth_ts(f_dt_val, fit,
                               obj_vars = c("stator_winding_t_2"),
                               ini = 70, len = 20))
```
In this case, only exact inference is implemented. Evidenced smoothing is also supported, and it works in the same way as in the forecasting case. Again, take care to name the evidence variables according to the oldest time slice, which depends on the size used to train the model.
```r
res_smooth <- suppressWarnings(dbnR::smooth_ts(f_dt_val, fit,
                               obj_vars = c("stator_winding_t_2"),
                               ini = 70, len = 20,
                               prov_ev = c("stator_tooth_t_2")))
```
Some functions of the package remain exported in case they are useful elsewhere. Examples of these functions are 'fold_dt', 'calc_mu' and 'calc_sigma'. The last two are useful if you want to work with the multivariate Gaussian representation of the network instead of the graph structure. The following example learns a network from data with some Gaussian noise added to the variables and prints the determinant of the sigma matrix:
```r
add_gauss_noise <- function(x, mu = NULL, sd = NULL){
  n <- length(x)
  if(is.null(mu))
    mu <- mean(x)
  if(is.null(sd))
    sd <- sd(x)
  noise <- rnorm(n, mu, sd)

  return(x + noise)
}

noisy_dt_train <- as.data.table(dt_train[, sapply(.SD, add_gauss_noise, mu = 0, sd = 2)])
net <- dbnR::learn_dbn_struc(noisy_dt_train, size,
                             restrict = "mmpc", maximize = "hc",
                             restrict.args = list(test = "mi-g"),
                             maximize.args = list(score = "bge", maxp = 8))
noisy_f_dt_train <- fold_dt(noisy_dt_train, size)  # fold the noisy data, not the original
fit <- dbnR::fit_dbn_params(net, noisy_f_dt_train, method = "mle-g")

print(paste0("The determinant of the sigma matrix of the noisy data is: ",
             det(calc_sigma(fit))))
```
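Similarly, 'calc_mu' recovers the mean vector of the equivalent multivariate Gaussian, which together with the covariance matrix from 'calc_sigma' fully describes the joint distribution of the network:

```r
# The mean vector of the multivariate Gaussian representation of the network:
mu <- calc_mu(fit)
head(mu)
```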