knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(BLNN)
library(nnet) #be sure to install if you wish to run the entire RMD
set.seed(2048)

As an example of linear modeling, we will use the mtcars dataset from the datasets package to predict mpg using a small number of variables.
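
For reference, here is a quick look at the three mtcars variables involved (mpg is our target; wt and disp are the covariates):

head(mtcars[, c("mpg", "wt", "disp")])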

Our first aim is to build our BLNN object. We limit ourselves to two covariates, wt and disp, with three hidden units in the hidden layer. Since we only wish to predict mpg, the network needs just a single output.

We initialize the hyperparameters with arbitrary (pseudo-random) starting values; these will be re-estimated through the evidence procedure later in training.

LinearNet<-BLNN_Build(ncov=2, nout=1, hlayer_size = 3,
                      actF = "tanh", costF = "MSE", outF = "linear",
                      hp.Err = 10, hp.W1 = .5, hp.W2 = .5,
                      hp.B1 = .5, hp.B2 = .5)

Next we organize our data into covariates and target values. In most cases it is recommended to scale your data, where possible, to avoid excessively large network weights.

data<-cbind(mtcars$wt, mtcars$disp)
data<-scale(data)
targ<-data.matrix(mtcars$mpg)
targ<-scale(targ)
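
As a quick sanity check, the scaled covariates should now have mean zero (up to floating point error) and unit standard deviation:

round(colMeans(data), 10)
apply(data, 2, sd)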

Our next step is to train the network. We use the popular nnet package as a baseline and explore each of our four Bayesian training methods. Because the sampling methods differ, it may be necessary to adjust one or more elements of the control list in each training call.

#frequentist baseline: single hidden layer with three hidden units
nnetBaseline<-nnet(data, targ, size=3)
nnetPredictions<-predict(nnetBaseline)
LinearHMC <- BLNN_Train(NET = LinearNet,
                          x = data,
                          y = targ,
                          iter = 10000,
                          chains = 1,
                          algorithm = "HMC",
                          display = 0, control = list(adapt_delta = 0.65,
                                                      Lambda = 0.005,
                                                      stepsize=2,
                                                      gamma=3)
                        )
LinearNUTS <- BLNN_Train(NET = LinearNet,
                          x = data,
                          y = targ,
                          iter = 10000,
                          chains = 1,
                          algorithm = "NUTS",
                          display = 0, control = list(adapt_delta = 0.7,
                                                      lambda=.005,
                                                      stepsize=2,
                                                      gamma=5,
                                                      max_treedepth=20)

                        )
LinearHMCwithEVE <- BLNN_Train(NET = LinearNet,
                          x = data,
                          y = targ,
                          iter = 10000,
                          chains = 1,
                          algorithm = "HMC",
                          evidence = TRUE,
                          display = 0, control = list(adapt_delta = 0.65,
                                                      Lambda = 0.005,
                                                      stepsize=2,
                                                      gamma=12)
                        )
LinearNUTSwithEVE <- BLNN_Train(NET = LinearNet,
                          x = data,
                          y = targ,
                          iter = 10000,
                          chains = 1,
                          algorithm = "NUTS",
                          evidence = TRUE,
                          display = 0, control = list(adapt_delta = 0.99,
                                                      stepsize=5,
                                                      gamma=7,
                                                      max_treedepth=20,
                                                      adapt_mass=FALSE)

                        )

After we confirm that our samples had an appropriate acceptance ratio and, at the very least, Rhat values close to one (conventionally below 1.1) and reasonably large effective sample sizes (a minimum of 50 each), we can update each of our networks with the newly sampled parameters.
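
A minimal sketch of such a check with the coda package, assuming the training object exposes its posterior draws as an iterations-by-parameters matrix; the samples accessor below is hypothetical, so consult the BLNN documentation for the actual slot name:

library(coda) #be sure to install if you wish to run this check
draws<-as.mcmc(LinearHMC$samples) #hypothetical accessor for the posterior draws
effectiveSize(draws) #effective sample size per parameter
#Rhat (gelman.diag) needs two or more chains; rerun with chains > 1 and
#combine them into an mcmc.list to compute it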

LinearHMC<-BLNN_Update(LinearNet, LinearHMC)
LinearNUTS<-BLNN_Update(LinearNet, LinearNUTS)
LinearHMCwithEVE<-BLNN_Update(LinearNet, LinearHMCwithEVE)
LinearNUTSwithEVE<-BLNN_Update(LinearNet, LinearNUTSwithEVE)

Once we have updated our networks with the sampled weights, and, in the case of the evidence procedure, the re-estimated hyperparameters, we can gather our predictions and examine the overall error.

HMCpred<-BLNN_Predict(LinearHMC, data, targ)
NUTSpred<-BLNN_Predict(LinearNUTS, data, targ)
HMCpredEVE<-BLNN_Predict(LinearHMCwithEVE, data, targ)
NUTSpredEVE<-BLNN_Predict(LinearNUTSwithEVE, data, targ)

With the predictions from each method we can tabulate the network errors and the sum of the absolute differences between predicted and observed values.

errs<-c(HMCpred$Errors$Total,
        NUTSpred$Errors$Total,
        HMCpredEVE$Errors$Total,
        NUTSpredEVE$Errors$Total,
        nnetBaseline$value)

abdiff<-c(sum(abs(HMCpred$Difference)),
          sum(abs(NUTSpred$Difference)),
          sum(abs(HMCpredEVE$Difference)),
          sum(abs(NUTSpredEVE$Difference)),
          sum(abs(targ-nnetPredictions)))

OutTab<-data.frame(errs, abdiff)

rownames(OutTab)<-c("HMC", "NUTS", "EVEHMC", "EVENUTS", "NNET")

OutTab
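
Because we scaled the target, the errors above are on the standardized scale. To report predictions in the original mpg units we can back-transform with the centering and scaling attributes stored by scale(). This sketch assumes Difference holds the observed minus predicted values on the standardized scale, as its use above suggests; check the BLNN_Predict documentation to confirm:

HMCfitScaled<-targ - HMCpred$Difference #recover predictions on the scaled scale
HMCfitMPG<-HMCfitScaled * attr(targ, "scaled:scale") + attr(targ, "scaled:center")
head(cbind(observed=mtcars$mpg, predicted=as.vector(HMCfitMPG)))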

