In majkamichal/naivebayes: High Performance Implementation of the Naive Bayes Algorithm

1) Introduction

The naivebayes package offers a comprehensive set of functions that implement specialized versions of the Naïve Bayes classifier. In this article, we explore these functions and provide a detailed overview of their basic usage.

The package provides five key functions that cater to different types of data:

bernoulli_naive_bayes(): Ideal for binary data, this function is specifically tailored to handle situations where the features are binary in nature, taking values of either 0 or 1. It effectively models the relationship between these binary features and the class labels.
multinomial_naive_bayes(): Designed for multinomial data, where each observation is a vector of non-negative counts over a fixed set of categories. A typical example is text classification, where word counts are used as features.
poisson_naive_bayes(): This function is specifically designed for count data. It is well-suited for scenarios where the features represent counts or frequencies. By assuming a Poisson distribution for the features, this function effectively models the relationship between the counts and the class labels.
gaussian_naive_bayes(): Suitable for continuous data, this function assumes that the features follow a Gaussian (normal) distribution. It is commonly used when dealing with continuous numerical features, and it models the relationship between these features and the class labels.
nonparametric_naive_bayes(): This function implements a nonparametric variant of the Naïve Bayes classifier. Unlike the specialized versions above, it makes no assumption about the distribution of the continuous features. Instead, it uses kernel density estimation to estimate the class conditional feature densities. This allows for more flexible modeling and is particularly useful when the data deviate from typical parametric assumptions.

Throughout this article, we will dive into each of these functions providing illustrative examples to demonstrate their practical usage. By the end, you will have a solid understanding of how to leverage these specialized versions of the Naïve Bayes classifier to tackle various types of data and classification problems.

We will also show equivalent calculations using the general naive_bayes() function, allowing you to compare and understand the differences between the specialized and general approaches.
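
All of the classifiers above share the same final step: the log prior of each class is added to the summed log-likelihoods of the features, and the result is normalized into posterior probabilities. The following is a minimal base R sketch of that step only. The helper name posterior_from_log_scores and the numbers are made up for illustration, and this is not the package's internal code.

```r
# Hypothetical helper (not part of naivebayes): convert per-class log scores
# (log prior + summed feature log-likelihoods) into posterior probabilities.
posterior_from_log_scores <- function(log_scores) {
  # log_scores: numeric matrix, one row per observation, one column per class
  row_max <- apply(log_scores, 1, max)   # subtracting the row maximum avoids underflow
  unnorm <- exp(log_scores - row_max)    # the vector recycles down columns, i.e. per row
  unnorm / rowSums(unnorm)
}

# Made-up inputs: two observations, two classes
log_prior <- log(c(classA = 0.3, classB = 0.7))
log_lik <- rbind(c(-2.0, -3.5),
                 c(-4.0, -1.0))
log_scores <- sweep(log_lik, 2, log_prior, "+")  # add each class's log prior to its column
colnames(log_scores) <- names(log_prior)

post <- posterior_from_log_scores(log_scores)
round(post, 3)
```

Working on the log scale matters in practice, because a product of many small likelihoods can underflow to zero.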

2) Bernoulli Naive Bayes

The Bernoulli Naive Bayes classifier is suitable for binary data. To demonstrate its usage, we start by simulating a dataset with binary features. We then train the Bernoulli Naive Bayes model using the bernoulli_naive_bayes() function.

Simulate data:

library(naivebayes)
# The weights in probs do not sum to 1; sample() rescales them internally
cols <- 10 ; rows <- 100 ; probs <- c("0" = 0.4, "1" = 0.1)
M <- matrix(sample(0:1, rows * cols, TRUE, probs), nrow = rows, ncol = cols)
y <- factor(sample(paste0("class", LETTERS[1:2]), rows, TRUE, prob = c(0.3, 0.7)))
colnames(M) <- paste0("V", seq_len(ncol(M)))

Train the Bernoulli Naive Bayes model:

laplace <- 0.5
bnb <- bernoulli_naive_bayes(x = M, y = y, laplace = laplace) # M has to be a matrix
summary(bnb)
head(predict(bnb, newdata = M, type = "prob")) 

# Equivalently
head(bnb %prob% M)

# Visualise marginal distributions
plot(bnb, which = "V1", prob = "marginal")

# Obtain model coefficients
coef(bnb)

Equivalent calculation with the general naive_bayes() function:

# Make sure that the columns are factors with the levels 0 and 1
df <- as.data.frame(lapply(as.data.frame(M), factor, levels = c(0, 1)))
# sapply(df, class)
nb <- naive_bayes(df, y, laplace = laplace)
head(nb %prob% df)
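
To make the role of the laplace argument more concrete, the following base R sketch computes Bernoulli Naive Bayes posteriors by hand. It assumes additive smoothing of the form (count of ones + laplace) / (class size + 2 * laplace); the package's internals may differ in detail, and the simulated data and variable names here are illustrative only.

```r
set.seed(42)
n <- 200; p <- 4
X <- matrix(rbinom(n * p, size = 1, prob = 0.3), nrow = n,
            dimnames = list(NULL, paste0("V", 1:p)))
y <- factor(sample(c("classA", "classB"), n, replace = TRUE, prob = c(0.3, 0.7)))
laplace <- 0.5

n_class <- as.numeric(table(y))   # class sizes, in the order of levels(y)
prior <- n_class / n              # empirical class priors

# Smoothed P(feature = 1 | class): classes in rows, features in columns
ones <- rowsum(X, y)
p1 <- (ones + laplace) / (n_class + 2 * laplace)

# Log-likelihood of every observation under every class (n x classes)
log_lik <- X %*% t(log(p1)) + (1 - X) %*% t(log(1 - p1))

# Add the log priors and normalize on the log scale
log_post <- sweep(log_lik, 2, log(prior), "+")
post <- exp(log_post - apply(log_post, 1, max))
post <- post / rowSums(post)
head(round(post, 3))
```

With laplace > 0, no estimated probability is exactly 0 or 1, so the logarithms above stay finite.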

3) Multinomial Naive Bayes

Next, we explore the Multinomial Naive Bayes classifier. We simulate a dataset with multinomial features and train the model using the multinomial_naive_bayes() function. We discuss the summary of the model, perform classification, compute posterior probabilities, and examine parameter estimates. Note that this specialized model is not available in the general naive_bayes() function.

Simulate data:

set.seed(1)
cols <- 3 # words
rows <- 10000 # all documents
rows_spam <- 100 # spam documents

word_prob_non_spam <- prop.table(runif(cols))
word_prob_spam <- prop.table(runif(cols))

M1 <- t(rmultinom(rows_spam, size = cols, prob = word_prob_spam))
M2 <- t(rmultinom(rows - rows_spam, size = cols, prob = word_prob_non_spam))
M <- rbind(M1, M2)
colnames(M) <- paste0("word", 1:cols) ; rownames(M) <- paste0("doc", 1:rows)
head(M)
y <- c(rep("spam", rows_spam), rep("non-spam", rows - rows_spam))

Train the Multinomial Naive Bayes:

laplace <- 0.5
mnb <- multinomial_naive_bayes(x = M, y = y, laplace = laplace)
summary(mnb)

# Classification
head(predict(mnb, M)) # head(mnb %class% M)

# Posterior probabilities
head(predict(mnb, M, type = "prob")) # head(mnb %prob% M)

# Parameter estimates
coef(mnb)

# Compare estimates to the true probabilities
round(cbind(non_spam = word_prob_non_spam, spam = word_prob_spam), 4)
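
The parameter estimates and the classification rule can also be reproduced by hand. The sketch below is a base R illustration under stated assumptions: the word probabilities are smoothed as (count + laplace) / (total count + laplace * number of words), the multinomial coefficient is dropped because it is the same for every class, and the simulated data (true_p, 20 words per document) are made up for this example.

```r
set.seed(1)
true_p <- list(spam = c(0.6, 0.3, 0.1), non_spam = c(0.2, 0.3, 0.5))
X <- rbind(t(rmultinom(300, size = 20, prob = true_p$spam)),
           t(rmultinom(700, size = 20, prob = true_p$non_spam)))
colnames(X) <- paste0("word", 1:3)
y <- factor(rep(c("spam", "non-spam"), c(300, 700)))
laplace <- 0.5

# Smoothed word probabilities per class (classes in rows, words in columns)
counts <- rowsum(X, y)
theta <- (counts + laplace) / (rowSums(counts) + laplace * ncol(X))
round(theta, 4)

# Log posterior up to a constant: log prior + sum over words of count * log(theta)
prior <- as.numeric(table(y)) / length(y)
log_post <- sweep(X %*% t(log(theta)), 2, log(prior), "+")

pred <- colnames(log_post)[max.col(log_post, ties.method = "first")]
acc <- mean(pred == y)   # training accuracy
acc
```

Each row of theta sums to one, and with enough documents the rows approach the true word probabilities used in the simulation.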

4) Poisson Naive Bayes

The Poisson Naive Bayes classifier is specifically designed for count data. We simulate count data and train the model using the poisson_naive_bayes() function. We analyze the summary of the model, perform prediction, visualize marginal distributions, and obtain model coefficients.

Simulate data:

cols <- 10 ; rows <- 100
M <- matrix(rpois(rows * cols, lambda = 3), nrow = rows, ncol = cols)
# is.integer(M) # [1] TRUE
y <- factor(sample(paste0("class", LETTERS[1:2]), rows, TRUE))
colnames(M) <- paste0("V", seq_len(ncol(M)))

Train the Poisson Naive Bayes:

laplace <- 0
pnb <- poisson_naive_bayes(x = M, y = y, laplace = laplace)
summary(pnb)
head(predict(pnb, newdata = M, type = "prob"))

# Visualise marginal distributions
plot(pnb, which = "V1", prob = "marginal")

# Obtain model coefficients
coef(pnb)

Equivalent calculation with the general naive_bayes() function:

nb2 <- naive_bayes(M, y, usepoisson = TRUE, laplace = laplace)
head(predict(nb2, type = "prob"))
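
For intuition, the Poisson model can be sketched in base R as follows. This is an illustration without smoothing (corresponding to laplace = 0 above), not the package's implementation: each class gets one rate per feature, estimated by the class mean, and dpois() supplies the log-likelihoods. The simulated rates in lambda_true are assumptions chosen so that the classes are separable.

```r
set.seed(7)
n <- 300
y <- factor(sample(c("classA", "classB"), n, replace = TRUE))
lambda_true <- c(classA = 2, classB = 6)
X <- cbind(V1 = rpois(n, lambda_true[as.character(y)]),  # rate depends on the class
           V2 = rpois(n, 3))                             # carries no class information

# Per-class Poisson rates: class means of each feature (classes in rows)
lambda_hat <- rowsum(X, y) / as.numeric(table(y))

prior <- as.numeric(table(y)) / n
log_post <- sapply(levels(y), function(k) {
  # Sum over features of the log Poisson pmf under the rates of class k
  ll <- sapply(colnames(X), function(j) dpois(X[, j], lambda_hat[k, j], log = TRUE))
  rowSums(ll) + log(prior[levels(y) == k])
})

post <- exp(log_post - apply(log_post, 1, max))
post <- post / rowSums(post)
pred <- levels(y)[max.col(post, ties.method = "first")]
acc <- mean(pred == y)   # training accuracy
acc
```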

5) Gaussian Naive Bayes

The Gaussian Naive Bayes classifier is discussed next. We use the famous Iris dataset (https://en.wikipedia.org/wiki/Iris_flower_data_set) and train the Gaussian Naive Bayes model using the gaussian_naive_bayes() function. We summarize the model, visualize class conditional distributions, and obtain parameter estimates.

Data:

data(iris)
y <- iris[[5]]
M <- as.matrix(iris[-5])

Train the Gaussian Naive Bayes:

gnb <- gaussian_naive_bayes(x = M, y = y)
summary(gnb)
head(predict(gnb, newdata = M, type = "prob"))

# Visualise class conditional distributions
plot(gnb, which = "Sepal.Width", prob = "conditional")

# Obtain parameter estimates
coef(gnb)

coef(gnb)[c(TRUE,FALSE)] # Only means

Equivalent calculation with the general naive_bayes() function:

nb3 <- naive_bayes(M, y)
head(predict(nb3, newdata = M, type = "prob"))
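
The Gaussian model reduces to a class mean and a class standard deviation per feature. The base R sketch below shows the calculation by hand on the Iris data; it is an illustration only and assumes the sample standard deviation as computed by sd(), which may not match the package's estimator exactly.

```r
data(iris)
X <- as.matrix(iris[-5])
y <- iris[[5]]

# Per-class means and standard deviations (classes in rows, features in columns)
mu <- rowsum(X, y) / as.numeric(table(y))
sdev <- apply(X, 2, function(col) tapply(col, y, sd))

prior <- as.numeric(table(y)) / length(y)
log_post <- sapply(levels(y), function(k) {
  # Sum over features of the log normal density under the parameters of class k
  ll <- sapply(colnames(X), function(j) dnorm(X[, j], mu[k, j], sdev[k, j], log = TRUE))
  rowSums(ll) + log(prior[levels(y) == k])
})

post <- exp(log_post - apply(log_post, 1, max))
post <- post / rowSums(post)
pred <- levels(y)[max.col(post, ties.method = "first")]
acc <- mean(pred == y)   # training accuracy
acc
```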

6) Non-Parametric Naive Bayes

Lastly, we explore the Non-Parametric Naive Bayes classifier. Again using the Iris dataset (https://en.wikipedia.org/wiki/Iris_flower_data_set), we train the model using the nonparametric_naive_bayes() function. We compute posterior probabilities and visualize the class conditional distributions, so that the results can be compared to the Gaussian Naive Bayes demonstrated in the previous section.

Data:

data(iris)
y <- iris[[5]]
M <- as.matrix(iris[-5])

Train the Non-Parametric Naive Bayes:

nnb <- nonparametric_naive_bayes(x = M, y = y)
# summary(nnb)
head(predict(nnb, newdata = M, type = "prob"))

plot(nnb, which = "Sepal.Width", prob = "conditional")

Equivalent calculation with the general naive_bayes() function:

nb4 <- naive_bayes(M, y, usekernel = TRUE)
head(predict(nb4, newdata = M, type = "prob"))
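
The idea behind the kernel-based variant can be illustrated with a single feature in base R. The sketch below is a simplification under stated assumptions: it fits one stats::density() estimate per class with the default settings, evaluates it at the observed points by linear interpolation with approx(), and treats points outside a class's estimated range as having zero density. The package may handle these details differently.

```r
data(iris)
x <- iris$Sepal.Width
y <- iris$Species

# One kernel density estimate per class (Gaussian kernel, default bandwidth)
kde <- lapply(split(x, y), density)

# Evaluate each class density at every observed point (n x classes)
dens <- sapply(kde, function(d) approx(d$x, d$y, xout = x)$y)
dens[is.na(dens)] <- 0   # outside the grid of that class's estimate

# Posterior is proportional to prior times class conditional density
prior <- as.numeric(table(y)) / length(y)
unnorm <- sweep(dens, 2, prior, "*")
post <- unnorm / rowSums(unnorm)
head(round(post, 3))
```

With several features, the per-feature densities would be multiplied (or their logarithms summed) before normalizing, exactly as in the Gaussian case.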

majkamichal/naivebayes documentation built on March 26, 2024, 8:44 p.m.
