Usage example

\label{sec:usage}

knitr::opts_chunk$set(cache = FALSE, autodep = TRUE, collapse = TRUE, comment = "", highlight = FALSE
                      # , fig.height = 3, fig.width = 3, 
                      )

The available functionalities can be split into four groups:

  1. Learning network structure and parameters
  2. Analyzing the model
  3. Evaluating the model
  4. Predicting with the model

\noindent We illustrate these functionalities with the synthetic \code{car} data set with six features. We begin with a simple example for each functionality group and then elaborate on the options in the following sections. We first load the package and the dataset,

library(bnclassify)
data(car)

\noindent then learn a naive Bayes structure and its parameters,

nb <- nb('class', car) 
nb <- lp(nb, car, smooth = 0.01) 

\noindent get the number of arcs in the network,

narcs(nb)

\noindent get the 10-fold cross-validation estimate of accuracy,

cv(nb, car, k = 10) 

\noindent and finally classify the entire data set

p <- predict(nb, car)
head(p) 
suppressWarnings(RNGversion("3.5.0"))
set.seed(1)

Learning

The functions for structure learning, shown in \rtbl{algorithms}, correspond to the different algorithms. They all receive the name of the class variable and the data set as their first two arguments, which are then followed by optional arguments. The following runs the CL-ODE algorithm with the AIC score, followed by the FSSJ algorithm to learn another model:

ode_cl_aic <- tan_cl('class', car, score = 'aic')   
suppressWarnings(RNGversion("3.5.0"))
set.seed(3)
fssj <- fssj('class', car, k = 5, epsilon = 0)

The \code{bnc()} function is a shorthand for learning structure and parameters in a single step,

ode_cl_aic <- bnc('tan_cl', 'class', car, smooth = 1, dag_args = list(score = 'aic'))

\noindent where the first argument is the name of the structure learning function while and optional arguments go in \code{dag_args}.

Analyzing

Printing the model, such as the above \code{ode_cl_aic} object, provides basic information about it.

ode_cl_aic

While plotting the network is especially useful for small networks, printing the structure in the \CRANpkg{deal} \citep{deal1237} and \pkg{bnlearn} format may be more useful for larger ones:

ms <- modelstring(ode_cl_aic)
strwrap(ms, width = 60)

We can query the type of structure; \code{params()} lets us access the conditional probability tables (CPTs); while \code{features()} lists the features:

is_ode(ode_cl_aic)
params(nb)$buying
length(features(fssj))

For example, \code{fssj()} has selected five out of six features.

\code{manb_arc_posterior()} provides the MANB posterior probabilities for arcs from the class to each of the features:

manb <- lp(nb, car, smooth = 0.01, manb_prior = 0.5)
round(manb_arc_posterior(manb))

With the posterior probability of 0% for the arc from \code{class} to \code{doors}, and 100\% for all others, MANB renders \code{doors} independent from the class while leaving the other features' parameters unaltered. We can see this by printing out the CPTs:

params(manb)$doors 
all.equal(params(manb)$buying, params(nb)$buying)

For more functions for querying a structure with parameters (\code{"bnc_bn"}) see \code{?inspect_bnc_bn}. For a structure without parameters (\code{"bnc_dag"}), see \code{?inspect_bnc_dag}.

Evaluating

Several scores can be computed:

logLik(ode_cl_aic, car)
AIC(ode_cl_aic, car)

The \code{cv} function estimates the predictive accuracy of one or more models with a single run of stratified cross-validation. In the following we assess the above models produced by NB and CL-ODE algorithms:

suppressWarnings(RNGversion("3.5.0"))
set.seed(0)
cv(list(nb = nb, ode_cl_aic = ode_cl_aic), car, k = 5, dag = TRUE)

\noindent Above, \code{k} is the desired number of folds, and \code{dag = TRUE} evaluates structure and parameter learning, while \code{dag = FALSE} keeps the structure fixed and evaluates just the parameter learning. The output gives 86\% and 93\% accuracy estimates for NB and CL-ODE, respectively. \CRANpkg{mlr} and \CRANpkg{caret} packages provide additional options for evaluating predictive performance, such as different metrics, and \pkg{bnclassify} is integrated with both (see the \invin/).

Predicting

As shown above, we can predict class labels with \code{predict()}. We can also get the class posterior probabilities:

pp <- predict(nb, car, prob = TRUE)
# Show class posterior distributions for the first six instances of car
head(pp)


bmihaljevic/bnclassify documentation built on March 18, 2024, 8:34 a.m.