This package provides an online classification method based on Naive Bayes that can handle concept drift. It also comes with an extended Naive Bayes function whose models can be printed, plotted, updated, and used for prediction (see below). The same holds for models built with the NBCD method.
devtools::install_github("aschersleben/NBCD", build_vignettes = TRUE)
library("NBCD")
We use the well-known iris dataset and add a "concept drift":
set.seed(1234)
iris2 <- iris[sample(150), ]
iris2$Sepal.Width <- iris2$Sepal.Width + seq(1, 30, length.out = 150) # <- adding a "concept drift" that grows with the row index
model <- makeNBCDmodel(list(x = iris2[1:120, 1:4], class = iris2[1:120, 5], time = 1:120), model = NULL,
                       discretize = "fixed", discParams = list(Sepal.Length = 4:8),
                       init.obs = 20, max.waiting.time = 20, waiting.time = "auto")
print(model)
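To see what the added drift looks like, here is a small base R sketch that does not use NBCD at all. It rebuilds the shuffled data and compares the mean of Sepal.Width at the start and at the end of the stream. The window size of 30 observations is an arbitrary choice for illustration.

```r
# Recreate the drifting data set from above (base R only).
set.seed(1234)
iris2 <- iris[sample(150), ]
drift <- seq(1, 30, length.out = 150)   # grows linearly with the row index ("time")
iris2$Sepal.Width <- iris2$Sepal.Width + drift

# Compare an early and a late window of the stream.
early <- mean(iris2$Sepal.Width[1:30])
late  <- mean(iris2$Sepal.Width[121:150])
c(early = early, late = late)
```

The late window has a much larger mean than the early one, so a model fitted only on the first observations would see Sepal.Width values far outside its training range later on.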
For plotting, the NBCD package uses ggplot2.
plot(model, ylim = c(25, 35))
plot(model, ylim = c(25, 35), use.lm = TRUE, time = 150)
You can add data directly to the plots; predictions are included automatically:
plot(model, ylim = c(20, 40), use.lm = FALSE,
     data = iris2[140:150, ], class.name = "Species")
plot(model, ylim = c(20, 40), use.lm = TRUE, time = 150,
     data = iris2[140:150, ], class.name = "Species")
Open the vignette with:
vignette("NBCD")
This package also includes nb2(), an extended version of naiveBayes() from e1071.
It can be updated with new observations and includes automated discretization.
At first glance, there is no difference from the e1071 function:
mod <- nb2(iris[, 1:4], iris[, 5])
print(mod)
However, you can not only print the model but also plot it:
plot(mod)
plot(mod, data = iris, class.name = "Species")
Easy discretization (= specifying limits for the categories):
discParam <- list(Sepal.L = 4:8, Sepal.W = 1:5)
mod2 <- nb2(iris[, 1:4], iris[, 5], discretize = "fixed", discParams = discParam)
print(mod2)
plot(mod2, data = iris, class.name = "Species")
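As a rough illustration of what fixed limits mean, the base R sketch below bins Sepal.Length at the limits 4:8 with cut(). It only shows the general idea behind fixed discretization and is an assumption, not the package's code; how nb2() treats boundary values or values outside the limits may differ.

```r
# Bin Sepal.Length into the intervals (4,5], (5,6], (6,7], (7,8].
limits <- 4:8
bins <- cut(iris$Sepal.Length, breaks = limits)

# Class-wise counts per bin: the kind of table a categorical
# Naive Bayes model can work with instead of a density estimate.
tab <- table(iris$Species, bins)
tab
```

All iris values of Sepal.Length lie between 4 and 8, so every observation falls into one of the four bins here.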
Easy updates (= adding new observations to the model without re-computing):
mod.upd <- update(mod, newdata = iris[1:50, 1:4], y = iris$Species[1:50])
print(mod.upd)
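Why an update does not require re-computing can be sketched in base R. The helpers below (new_stats, update_stats, stats_mean and stats_var are hypothetical names, not part of NBCD or e1071) keep only a count, a sum and a sum of squares for one numeric feature within one class. This is a minimal sketch of the general idea and not necessarily how update() is implemented in this package.

```r
# Sufficient statistics of one numeric feature for one class:
# count, sum and sum of squares. (Hypothetical helpers, not part of NBCD.)
new_stats <- function(x) list(n = length(x), s = sum(x), ss = sum(x^2))

# Adding new observations only adds to the stored statistics.
update_stats <- function(stats, x) {
  list(n = stats$n + length(x), s = stats$s + sum(x), ss = stats$ss + sum(x^2))
}

# Mean and unbiased variance recovered from the statistics.
stats_mean <- function(stats) stats$s / stats$n
stats_var  <- function(stats) (stats$ss - stats$s^2 / stats$n) / (stats$n - 1)

x <- iris$Sepal.Length[iris$Species == "setosa"]
st <- new_stats(x[1:30])          # initial fit on 30 observations
st <- update_stats(st, x[31:50])  # add 20 new observations, no refit

all.equal(stats_mean(st), mean(x))
all.equal(stats_var(st), var(x))
```

The updated statistics give the same mean and variance as a fit on all 50 observations at once. For long streams or large values, a numerically more stable scheme than a raw sum of squares would be preferable.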
Easy updates for discretized variables (= no previous, manual discretization necessary):
mod2.upd <- update(mod2, newdata = iris[51:100, 1:4], y = iris$Species[51:100])
print(mod2.upd)
Read about concept drift in Webb et al. (2016, DOI:10.1007/s10618-015-0448-4).