bnclassify: Learning Discrete Bayesian Network Classifiers from Data

Introduction

\label{sec:introduction} Bayesian network classifiers \citep{Bielza14,Friedman1997} are competitive classifiers \citep[e.g.,][]{Zaidi2013} with the added benefit of interpretability. Their simplest member, the naive Bayes (NB) \citep{Minsky1961}, is well known \citep{Hand2001}. More elaborate models exist, taking advantage of the Bayesian network \citep{Pearl1988,Koller2009} formalism for representing complex probability distributions. The tree augmented naive Bayes \citep{Friedman1997} and the averaged one-dependence estimators (AODE) \citep{Webb2005} are among the most prominent.

A Bayesian network classifier is simply a Bayesian network applied to classification, that is, to predicting the probability $P(c \mid \mathbf{x})$ of some discrete (class) variable $C$ given some features $\mathbf{X}$. The \CRANpkg{bnlearn} \citep{bnlearn43,scutari2009learning} package already provides state-of-the-art algorithms for learning Bayesian networks from data. Yet learning classifiers is a distinct task, as the implicit goal is to estimate $P(c \mid \mathbf{x})$ rather than the joint probability $P(\mathbf{x}, c)$. Thus, specific search algorithms, network scores, and parameter estimation and inference methods have been devised for this setting. In particular, many search algorithms consider a restricted space of structures, such as that of augmented naive Bayes \citep{Friedman1997} models. Unlike with general Bayesian networks, it makes sense to omit a feature $X_i$ from the model as long as the estimation of $P(c \mid \mathbf{x})$ is no better than that of $P(c\mid \mathbf{x} \setminus x_i)$. Discriminative scores, related to the estimation of $P(c \mid \mathbf{x})$ rather than $P(\mathbf{x}, c)$, are used to learn both structure \citep{Keogh2002,grossman2004,pernkopf10,carvalho11} and parameters \citep{Zaidi2013,Zaidi2017}. Some of the prominent classifiers \citep{Webb2005} are ensembles of networks, and there are even heuristics applied at inference time, such as the lazy elimination technique \citep{zheng2006efficient}. Many of these methods \citep[e.g.,][]{Dash2002,Zaidi2013,Keogh2002,Pazzani1996} are at best available in standalone implementations published alongside the original papers.
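
As a brief illustration of the distinction between the class posterior and the joint, consider the naive Bayes model. The identities below are standard and not specific to any package; $n$ denotes the number of features and $c'$ ranges over the class values, both being notation introduced here for the sketch. The posterior is obtained from the joint by normalizing over the class, and under naive Bayes the joint factorizes given the class:
\[
P(c \mid \mathbf{x}) = \frac{P(\mathbf{x}, c)}{\sum_{c'} P(\mathbf{x}, c')},
\qquad
P(\mathbf{x}, c) = P(c) \prod_{i=1}^{n} P(x_i \mid c).
\]
If, for some feature $X_j$, the factor $P(x_j \mid c)$ takes the same value for every class $c$, it cancels between the numerator and the denominator, so that
\[
P(c \mid \mathbf{x}) = \frac{P(c) \prod_{i \neq j} P(x_i \mid c)}{\sum_{c'} P(c') \prod_{i \neq j} P(x_i \mid c')} = P(c \mid \mathbf{x} \setminus x_j).
\]
In this idealized case, omitting $X_j$ changes the modeled joint distribution but leaves the class posterior untouched, which is why feature omission can be reasonable for a classifier even when it would not be for a general Bayesian network.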

The \CRANpkg{bnclassify} package implements state-of-the-art algorithms for learning structure and parameters. The implementation is efficient enough to allow for time-consuming discriminative scores on relatively large data sets. It provides utility functions for prediction and inference, model evaluation with network scores and cross-validated estimation of predictive performance, and model analysis, such as querying structure type or graph plotting via the \BIOpkg{Rgraphviz} package \citep{Rgraphviz2200}. It integrates with the \CRANpkg{caret} \citep{caret6078,Kuhn2008} and \CRANpkg{mlr} \citep{mlr211,Bischl2015} packages for straightforward use in machine learning pipelines. Currently it supports only discrete variables. The functionalities are illustrated in an introductory vignette, while an additional vignette provides details on the implemented methods. The package includes over 200 unit and integration tests, which give a code coverage of 94 percent\footnote{See \url{https://codecov.io/github/bmihaljevic/bnclassify?branch=master}}.