naive_bayes {deepANN} | R Documentation
Naive Bayes
naive_bayes(object, ...)
## S3 method for class 'formula'
naive_bayes(formula, data, ...)
## Default S3 method:
naive_bayes(x, y, laplace = 0, FUN, ...)
is.naivebayes(object)
object | An R object.
... | Optional arguments.
formula | A model formula of the form response ~ features.
data | A data frame containing the variables in formula.
x | A matrix or data frame with feature values.
y | A factor variable with the categorical values (class labels) for the observations in x.
laplace | A value for Laplace smoothing to avoid the zero probability problem; default 0 (see the sketch after this list).
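With laplace = 0, a feature level that never co-occurs with a class in the training data gets a likelihood of zero, which zeroes out the entire posterior product for that class. Below is a minimal sketch of the standard add-alpha (Laplace) estimate for a categorical likelihood; the exact internals of naive_bayes() are an assumption here.
smoothed_likelihood <- function(feat, cls, level, class, laplace = 0) {
  m <- nlevels(as.factor(feat))             # number of feature levels
  n_k <- sum(cls == class)                  # observations in class k
  n_vk <- sum(feat == level & cls == class) # joint count of level and class
  (n_vk + laplace) / (n_k + laplace * m)
}
feat <- c("red", "red", "green"); cls <- factor(c("a", "a", "b"))
smoothed_likelihood(feat, cls, level = "green", class = "a", laplace = 0) # 0
smoothed_likelihood(feat, cls, level = "green", class = "a", laplace = 1) # 0.25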
The Naive Bayes model is based on Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
Adapted to a classification problem, the equation is: P(y=k|X) = P(X|y=k) * P(y=k) / P(X), where
P(y=k|X) is the conditional probability of y=k given a feature set X. This probability is also called the posterior probability.
P(X|y=k) is the conditional probability of X given a specific category k of y. This probability is also called the likelihood.
P(y=k) is the probability that y takes the value k. This probability is also called the prior probability.
P(X) is the probability that the features X take the given values. This probability is also called the evidence. It is constant for every value of y and therefore does not affect the ranking of the posterior probabilities; for simplification, it is ignored in the computation.
Without the evidence term the result is no longer strictly a probability, but an unnormalized score; the class with the largest score is used for the prediction.
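To make the rule concrete, here is a minimal sketch with hypothetical numbers: two classes, two binary features, and the evidence P(X) dropped.
prior <- c(A = 0.6, B = 0.4)                  # P(y = k)
likelihood <- list(A = c(f1 = 0.8, f2 = 0.3), # P(feature = 1 | y = k)
                   B = c(f1 = 0.1, f2 = 0.7))
x_new <- c(f1 = 1, f2 = 0)                    # observed feature values
score <- sapply(names(prior), function(k) {
  p <- likelihood[[k]]
  prior[[k]] * prod(ifelse(x_new == 1, p, 1 - p))
})
score                   # unnormalized scores; they need not sum to 1
names(which.max(score)) # predicted class: "A" (0.336 vs. 0.012)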
A list of class naivebayes with the levels and prior probabilities of y, and the names and likelihood distribution parameters of x categorized by the levels of the factor y.
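The exact element names of the returned list are implementation-specific; a minimal sketch that inspects them without assuming any, using the built-in iris data:
nb <- naive_bayes(Species ~ ., data = iris)
str(nb, max.level = 1) # levels and priors of y, likelihood parameters of x
is.naivebayes(nb)      # TRUE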
Other Machine Learning: cross_validation_split(), decision_tree(), k_nearest_neighbors(), moving_average(), naive_forecast(), predict.decisiontree(), predict.kmeans(), predict.naivebayes()
# Continuous features
df <- data.frame(y = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L),
x1 = c(3.393533211, 3.110073483, 1.343808831, 3.582294042, 2.280362439, 7.423436942, 5.745051997, 9.172168622, 7.792783481, 7.939820817),
x2 = c(2.331273381, 1.781539638, 3.368360954, 4.67917911, 2.866990263, 4.696522875, 3.533989803, 2.511101045, 3.424088941, 0.791637231))
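# The continuous data can be fitted directly before df is overwritten
# below; numeric features are commonly modeled with Gaussian
# likelihoods (assumed default here).
nb_cont <- naive_bayes(y ~ ., data = df)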
# Categorical features
fruit_type <- c("Banana", "Orange", "Other")
# Banana: 500 fruits; each (v <- ...)[sample(length(v))] fixes the counts and shuffles the order
Long <- (v <- c(rep(1, 400), rep(0, 100)))[sample(length(v))]
Sweet <- (v <- c(rep(1, 350), rep(0, 150)))[sample(length(v))]
Yellow <- (v <- c(rep(1, 450), rep(0, 50)))[sample(length(v))]
fruit <- data.frame(Type = fruit_type[1L], Long, Sweet, Yellow)
# Orange
Type <- rep(fruit_type[2L], 300)
Long <- (v <- c(rep(1, 0), rep(0, 300)))[sample(length(v))]
Sweet <- (v <- c(rep(1, 150), rep(0, 150)))[sample(length(v))]
Yellow <- (v <- c(rep(1, 300), rep(0, 0)))[sample(length(v))]
fruit <- rbind.data.frame(fruit, cbind.data.frame(Type, Long, Sweet, Yellow))
# Other
Type <- rep(fruit_type[3L], 200)
Long <- (v <- c(rep(1, 100), rep(0, 100)))[sample(length(v))]
Sweet <- (v <- c(rep(1, 150), rep(0, 50)))[sample(length(v))]
Yellow <- (v <- c(rep(1, 50), rep(0, 150)))[sample(length(v))]
fruit <- rbind.data.frame(fruit, cbind.data.frame(Type, Long, Sweet, Yellow))
fruit <- fruit[sample(NROW(fruit)), ]
rownames(fruit) <- seq_len(NROW(fruit))
to_factor <- c("Type", "Long", "Sweet", "Yellow")
fruit[to_factor] <- lapply(fruit[to_factor], as.factor)
df <- fruit
rm(Long, Sweet, Yellow, Type, v, to_factor, fruit_type, fruit)
x <- df[, -1L]
y <- as.factor(df[[1L]])
nb <- naive_bayes(Type ~ ., data = df) # the target column of the fruit data is Type, not y
yposterior <- predict(nb, x)
yhat <- levels(y)[apply(yposterior, 1L, which.max)]
deepANN::accuracy(y, yhat)
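Because the fruit counts are fixed by construction, the unnormalized posterior for a long, sweet, yellow fruit can be checked by hand:
# priors: Banana 500/1000, Orange 300/1000, Other 200/1000
(500/1000) * (400/500) * (350/500) * (450/500) # Banana: 0.252
(300/1000) * (  0/300) * (150/300) * (300/300) # Orange: 0 (zero count for Long)
(200/1000) * (100/200) * (150/200) * ( 50/200) # Other:  0.01875
# Banana has the largest score; the zero count for P(Long=1|Orange) is
# exactly the case that laplace > 0 would smooth.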