naive_bayes: Naive Bayes

View source: R/deepML.r

naive_bayes R Documentation

Naive Bayes

Description

naive_bayes builds a Naive Bayes classification model from categorical or continuous features; is.naivebayes checks whether an object is of class naivebayes.

Usage

naive_bayes(object, ...)

## S3 method for class 'formula'
naive_bayes(formula, data, ...)

## Default S3 method:
naive_bayes(x, y, laplace = 0, FUN, ...)

is.naivebayes(object)

Arguments

object

R object.

...

Optional arguments.

formula

A model formula.

data

A data frame containing the variables in formula. Neither a matrix nor an array is accepted.

x

A matrix or data frame with feature values.

y

A factor variable with the class labels for the observations in x.

laplace

A value for Laplace smoothing to avoid the zero-probability problem; the default of 0 corresponds to no smoothing.
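
As a sketch of how laplace affects a categorical likelihood: the helper below is purely illustrative (it is not a deepANN function) and assumes the common add-k smoothing rule (count + laplace) / (class count + laplace * number of feature levels).

  # Hypothetical helper illustrating add-k (Laplace) smoothing; not part of deepANN
  smoothed_likelihood <- function(x, y, level, class, laplace = 0) {
    n_class <- sum(y == class)              # observations in class k
    n_match <- sum(x == level & y == class) # thereof with the given feature level
    (n_match + laplace) / (n_class + laplace * nlevels(x))
  }
  x <- factor(c("1", "1", "0")); y <- factor(c("a", "a", "b"))
  smoothed_likelihood(x, y, level = "0", class = "a", laplace = 0) # 0 -> zero-probability problem
  smoothed_likelihood(x, y, level = "0", class = "a", laplace = 1) # 0.25 -> smoothed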

Details

The Naive Bayes model is based on Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
Applied to a classification problem, the equation becomes P(y=k|X) = P(X|y=k) * P(y=k) / P(X), whereby

  • P(y=k|X) is the conditional probability of y=k given a feature set X, also called the posterior probability.

  • P(X|y=k) is the conditional probability of X given a specific category k of y, also called the likelihood.

  • P(y=k) is the probability that y takes the value k, also called the prior probability.

  • P(X) is the probability that the features X take the given values, also called the probability of evidence. It is constant for every value of y and therefore does not affect the ranking of the posterior probabilities; for simplicity, it is ignored in the computation. The resulting values are no longer strictly probabilities, and the class with the largest value is used for prediction (see the sketch after this list).

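As a numerical sketch of the computation above (illustrative R code, not taken from the package), the unnormalized posteriors for one binary feature and a new observation x = 1 can be computed by hand; the class with the largest value wins:

  # Two classes and one binary feature; classify a new case with x = 1
  y <- factor(c("a", "a", "a", "b"))
  x <- factor(c(1, 1, 0, 1))
  prior <- table(y) / length(y)                                # P(y=k): a = 0.75, b = 0.25
  lik <- sapply(levels(y), function(k) mean(x[y == k] == "1")) # P(x=1|y=k): a = 2/3, b = 1
  posterior_unnorm <- lik * prior                              # evidence P(x) is dropped
  names(which.max(posterior_unnorm))                           # predicted class: "a"
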
Value

An object of class naivebayes: a list containing the levels and prior probabilities of y, along with the feature names and the likelihood distribution parameters of x, categorized by the levels of the factor y.

See Also

Other Machine Learning: cross_validation_split(), decision_tree(), k_nearest_neighbors(), moving_average(), naive_forecast(), predict.decisiontree(), predict.kmeans(), predict.naivebayes()

Examples

  # Continuous features
  df <- data.frame(y = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L),
                   x1 = c(3.393533211, 3.110073483, 1.343808831, 3.582294042, 2.280362439, 7.423436942, 5.745051997, 9.172168622, 7.792783481, 7.939820817),
                   x2 = c(2.331273381, 1.781539638, 3.368360954, 4.67917911, 2.866990263, 4.696522875, 3.533989803, 2.511101045, 3.424088941, 0.791637231))

  # Categorical features
  fruit_type <- c("Banana", "Orange", "Other")
  # Banana
  # Shuffle each 0/1 feature vector so the values appear in random order
  Long <- (v <- c(rep(1, 400), rep(0, 100)))[sample(length(v))]
  Sweet <- (v <- c(rep(1, 350), rep(0, 150)))[sample(length(v))]
  Yellow <- (v <- c(rep(1, 450), rep(0, 50)))[sample(length(v))]
  fruit <- data.frame(Type = fruit_type[1L], Long, Sweet, Yellow)
  # Orange
  Type <- rep(fruit_type[2L], 300)
  Long <- rep(0, 300)   # no orange is long
  Sweet <- (v <- c(rep(1, 150), rep(0, 150)))[sample(length(v))]
  Yellow <- rep(1, 300) # every orange is yellow
  fruit <- rbind.data.frame(fruit, cbind.data.frame(Type, Long, Sweet, Yellow))
  # Other
  Type <- rep(fruit_type[3L], 200)
  Long <- (v <- c(rep(1, 100), rep(0, 100)))[sample(length(v))]
  Sweet <- (v <- c(rep(1, 150), rep(0, 50)))[sample(length(v))]
  Yellow <- (v <- c(rep(1, 50), rep(0, 150)))[sample(length(v))]
  fruit <- rbind.data.frame(fruit, cbind.data.frame(Type, Long, Sweet, Yellow))
  fruit <- fruit[sample(NROW(fruit)), ]
  rownames(fruit) <- seq_len(NROW(fruit))
  to_factor <- c("Type", "Long", "Sweet", "Yellow")
  fruit[to_factor] <- lapply(fruit[to_factor], as.factor)
  df <- fruit
  rm(Long, Sweet, Yellow, Type, v, to_factor, fruit_type, fruit)

  x <- df[, -1L]
  y <- as.factor(df[[1L]])
  nb <- naive_bayes(Type ~ ., data = df) # use y ~ . when running the continuous example instead
  yposterior <- predict(nb, x)
  yhat <- levels(y)[apply(yposterior, 1L, which.max)]
  deepANN::accuracy(y, yhat)
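
  # A fitted model can also classify previously unseen observations via
  # predict.naivebayes(). The new data frame below is hypothetical and must
  # use the same factor levels as the training data.
  newx <- data.frame(Long = factor(1, levels = 0:1),
                     Sweet = factor(1, levels = 0:1),
                     Yellow = factor(0, levels = 0:1))
  predict(nb, newx) # posterior values for a long, sweet, non-yellow fruit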

