naiveBayes: Naive Bayes Classifier


View source: R/naiveBayes.R

Description

Computes the conditional a-posteriori probabilities of a categorical class variable given independent predictor variables using the Bayes rule.
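As a worked illustration of the Bayes rule, the base-R sketch below recomputes one posterior by hand from the a-priori and conditional probabilities printed in the Titanic example output further down. Under the naive Bayes assumption, each class's posterior is proportional to its prior times the product of the per-predictor conditional probabilities, normalized so the posteriors sum to 1. The object names (`prior`, `cond`, `posterior`) are illustrative only and are not part of e1071.

```r
# Illustrative hand computation of P(Survived | Class, Sex, Age) under the
# naive Bayes assumption, using values copied from the Titanic output below.
prior <- c(No = 0.676965, Yes = 0.323035)

# Conditional probabilities P(predictor level | Survived) for one passenger:
# Class = "1st", Sex = "Female", Age = "Adult".
cond <- rbind(
  Class = c(No = 0.08187919, Yes = 0.28551336),
  Sex   = c(No = 0.08456376, Yes = 0.48382560),
  Age   = c(No = 0.96510067, Yes = 0.91983122)
)

# Bayes rule: prior times the product of the conditionals, then normalize.
unnormalized <- prior * apply(cond, 2, prod)
posterior <- unnormalized / sum(unnormalized)
print(round(posterior, 4))
```

The posterior clearly favors "Yes" for this passenger, which agrees with the class predicted for the corresponding row of `as.data.frame(Titanic)` in the output below.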

Usage

## S3 method for class 'formula'
naiveBayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)
## Default S3 method:
naiveBayes(x, y, laplace = 0, ...)


## S3 method for class 'naiveBayes'
predict(object, newdata,
  type = c("class", "raw"), threshold = 0.001, eps = 0, ...)

Arguments

x

A numeric matrix, or a data frame of categorical and/or numeric variables.

y

Class vector.

formula

A formula of the form class ~ x1 + x2 + ... . Interactions are not allowed.

data

Either a data frame of predictors (categorical and/or numeric) or a contingency table.

laplace

Positive double controlling Laplace smoothing. The default (0) disables Laplace smoothing.

...

Currently not used.

subset

For data given in a data frame, an index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)

na.action

A function to specify the action to be taken if NAs are found. The default action is not to count them for the computation of the probability factors. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)

object

An object of class "naiveBayes".

newdata

A data frame with new predictors (possibly with fewer columns than the training data). Note that the column names of newdata are matched against those of the training data.

type

If "raw", the conditional a-posteriori probabilities for each class are returned; otherwise, the class with maximal probability is returned.

threshold

Value replacing cells whose probabilities fall within the eps range.

eps

Double specifying an epsilon range for applying Laplace smoothing (that is, zero or near-zero probabilities are replaced by threshold).
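The type, threshold, and eps arguments all act at prediction time. The base-R sketch below illustrates the idea under stated assumptions: per-class log-scores are formed as the log prior plus the sum of log conditional probabilities, any probability at or below `eps` is first replaced by `threshold`, and for `type = "raw"` the log-scores are turned into posteriors with a numerically stable normalization. The helpers `score_classes` and `to_posterior` are hypothetical; this is a sketch of the technique, not the e1071 source.

```r
# Hypothetical helper: combine a prior with per-predictor conditional
# probabilities (one row per predictor, one column per class) into log-scores.
score_classes <- function(prior, cond, threshold = 0.001, eps = 0) {
  # Replace zero or near-zero probabilities (<= eps) by threshold so that
  # a single unseen level does not send the whole log-score to -Inf.
  cond[cond <= eps] <- threshold
  log(prior) + colSums(log(cond))
}

# Convert log-scores to posteriors without exponentiating large negative
# numbers directly: P(k) = 1 / sum_j exp(L_j - L_k).
to_posterior <- function(L) {
  vapply(L, function(lk) 1 / sum(exp(L - lk)), numeric(1))
}

prior <- c(a = 0.5, b = 0.5)
# Two predictors; class "b" has never seen the observed level of predictor 2.
cond <- rbind(c(a = 0.4, b = 0.6),
              c(a = 0.3, b = 0.0))

L <- score_classes(prior, cond)
raw <- to_posterior(L)          # analogous to type = "raw"
cls <- names(which.max(L))      # analogous to type = "class"
print(raw)
print(cls)

# With very negative log-scores, direct exp() underflows to 0 for both
# classes and the naive ratio is NaN; the stable form stays finite.
big <- c(a = -1000, b = -1001)
print(exp(big) / sum(exp(big)))
print(to_posterior(big))
```

Working on the log scale and normalizing by differences of log-scores avoids both the zero-probability problem (handled by threshold) and floating-point underflow when many small probabilities are multiplied.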

Details

The standard naive Bayes classifier (at least this implementation) assumes that the predictor variables are independent given the target class, and that metric predictors follow a Gaussian distribution given the target class. For attributes with missing values, the corresponding table entries are omitted for prediction.
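For metric predictors, the Gaussian assumption means each class contributes a normal density evaluated at the observed value. The base-R sketch below scores one flower using the per-class means and standard deviations printed in the iris example output further down, and skips any predictor whose value is NA, mirroring the statement that missing attributes are omitted for prediction. The function `gaussian_scores` and the `means`/`sds` matrices are illustrative assumptions, not e1071 objects.

```r
# Per-class means and standard deviations copied from the iris output below
# (rows = classes, columns = predictors).
classes <- c("setosa", "versicolor", "virginica")
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
means <- matrix(c(5.006, 3.428, 1.462, 0.246,
                  5.936, 2.770, 4.260, 1.326,
                  6.588, 2.974, 5.552, 2.026),
                nrow = 3, byrow = TRUE, dimnames = list(classes, vars))
sds <- matrix(c(0.3524897, 0.3790644, 0.1736640, 0.1053856,
                0.5161711, 0.3137983, 0.4699110, 0.1977527,
                0.6358796, 0.3224966, 0.5518947, 0.2746501),
              nrow = 3, byrow = TRUE, dimnames = list(classes, vars))
prior <- c(setosa = 1/3, versicolor = 1/3, virginica = 1/3)

# Hypothetical scorer: log prior plus the sum of log normal densities,
# skipping any predictor whose observed value is NA.
gaussian_scores <- function(x) {
  keep <- names(x)[!is.na(x)]
  loglik <- vapply(classes, function(k)
    sum(dnorm(x[keep], means[k, keep], sds[k, keep], log = TRUE)),
    numeric(1))
  log(prior) + loglik
}

# First row of iris: 5.1, 3.5, 1.4, 0.2.
x <- c(Sepal.Length = 5.1, Sepal.Width = 3.5,
       Petal.Length = 1.4, Petal.Width = 0.2)
s_full <- gaussian_scores(x)
s_missing <- gaussian_scores(replace(x, "Petal.Width", NA))
print(names(which.max(s_full)))
print(names(which.max(s_missing)))
```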

Value

An object of class "naiveBayes" including components:

apriori

Class distribution for the dependent variable.

tables

A list of tables, one for each predictor variable. For each categorical variable, a table giving the conditional probabilities of each attribute level given the target class. For each numeric variable, a table giving, for each target class, the mean and standard deviation of the variable within that class.
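The components above can be approximated by hand in base R. The sketch below assumes the usual estimators: apriori is taken as the class counts (whose proportions are the a-priori probabilities), each numeric table holds the per-class mean and standard deviation, and each categorical table holds row-normalized class-by-level counts with additive (Laplace) smoothing, (n + laplace) / (N + laplace * k), where k is the number of levels. The helper `fit_tables` and the toy factor `color` are hypothetical, and e1071's own code may differ in details.

```r
# Hypothetical re-implementation of the fitted components, for illustration.
fit_tables <- function(x, y, laplace = 0) {
  apriori <- table(y)  # class counts; proportions give the a-priori probabilities
  tables <- lapply(x, function(v) {
    if (is.numeric(v)) {
      # One row per class: mean and standard deviation of the predictor.
      cbind(tapply(v, y, mean, na.rm = TRUE),
            tapply(v, y, sd, na.rm = TRUE))
    } else {
      # Class-by-level counts (table() drops NAs by default), smoothed and
      # normalized so that each class row sums to 1.
      tab <- table(y, v)
      (tab + laplace) / (rowSums(tab) + laplace * nlevels(v))
    }
  })
  list(apriori = apriori, tables = tables)
}

# Numeric predictors: per-class means and standard deviations for iris.
fit <- fit_tables(iris[, -5], iris$Species)
print(fit$tables$Sepal.Length)

# Categorical predictor with a level never observed for class "b".
y <- factor(c("a", "a", "a", "b", "b"))
color <- factor(c("red", "blue", "red", "red", "red"),
                levels = c("red", "blue"))
p0 <- fit_tables(data.frame(color), y, laplace = 0)$tables$color
p1 <- fit_tables(data.frame(color), y, laplace = 1)$tables$color
print(p0)  # P(blue | b) is exactly 0 without smoothing
print(p1)  # with laplace = 1 it becomes strictly positive
```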

Author(s)

David Meyer <David.Meyer@R-project.org>. Laplace smoothing enhancement by Jinghao Xue.

Examples

library(e1071)

## Categorical data only:
data(HouseVotes84, package = "mlbench")
model <- naiveBayes(Class ~ ., data = HouseVotes84)
predict(model, HouseVotes84[1:10,])
predict(model, HouseVotes84[1:10,], type = "raw")

pred <- predict(model, HouseVotes84)
table(pred, HouseVotes84$Class)

## using laplace smoothing:
model <- naiveBayes(Class ~ ., data = HouseVotes84, laplace = 3)
pred <- predict(model, HouseVotes84[,-1])
table(pred, HouseVotes84$Class)


## Example of using a contingency table:
data(Titanic)
m <- naiveBayes(Survived ~ ., data = Titanic)
m
predict(m, as.data.frame(Titanic))

## Example with metric predictors:
data(iris)
m <- naiveBayes(Species ~ ., data = iris)
## alternatively:
m <- naiveBayes(iris[,-5], iris[,5])
m
table(predict(m, iris), iris[,5])

Example output

 [1] republican republican republican democrat   democrat   democrat  
 [7] republican republican republican democrat  
Levels: democrat republican
          democrat   republican
 [1,] 1.029209e-07 9.999999e-01
 [2,] 5.820415e-08 9.999999e-01
 [3,] 5.684937e-03 9.943151e-01
 [4,] 9.985798e-01 1.420152e-03
 [5,] 9.666720e-01 3.332802e-02
 [6,] 8.121430e-01 1.878570e-01
 [7,] 1.751512e-04 9.998248e-01
 [8,] 8.300100e-06 9.999917e-01
 [9,] 8.277705e-08 9.999999e-01
[10,] 1.000000e+00 5.029425e-11
            
pred         democrat republican
  democrat        238         13
  republican       29        155
            
pred         democrat republican
  democrat        237         12
  republican       30        156

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.formula(formula = Survived ~ ., data = Titanic)

A-priori probabilities:
Survived
      No      Yes 
0.676965 0.323035 

Conditional probabilities:
        Class
Survived        1st        2nd        3rd       Crew
     No  0.08187919 0.11208054 0.35436242 0.45167785
     Yes 0.28551336 0.16596343 0.25035162 0.29817159

        Sex
Survived       Male     Female
     No  0.91543624 0.08456376
     Yes 0.51617440 0.48382560

        Age
Survived      Child      Adult
     No  0.03489933 0.96510067
     Yes 0.08016878 0.91983122

 [1] Yes No  No  No  Yes Yes Yes Yes No  No  No  No  Yes Yes Yes Yes Yes No  No 
[20] No  Yes Yes Yes Yes No  No  No  No  Yes Yes Yes Yes
Levels: No Yes

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = iris[, -5], y = iris[, 5])

A-priori probabilities:
iris[, 5]
    setosa versicolor  virginica 
 0.3333333  0.3333333  0.3333333 

Conditional probabilities:
            Sepal.Length
iris[, 5]     [,1]      [,2]
  setosa     5.006 0.3524897
  versicolor 5.936 0.5161711
  virginica  6.588 0.6358796

            Sepal.Width
iris[, 5]     [,1]      [,2]
  setosa     3.428 0.3790644
  versicolor 2.770 0.3137983
  virginica  2.974 0.3224966

            Petal.Length
iris[, 5]     [,1]      [,2]
  setosa     1.462 0.1736640
  versicolor 4.260 0.4699110
  virginica  5.552 0.5518947

            Petal.Width
iris[, 5]     [,1]      [,2]
  setosa     0.246 0.1053856
  versicolor 1.326 0.1977527
  virginica  2.026 0.2746501

            
             setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          3        47
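The confusion tables in the output can be summarized as the proportion of correctly classified cases (the diagonal sum divided by the total). The short sketch below re-enters the printed HouseVotes84 and iris tables as plain matrices; the helper name `accuracy` is illustrative. Because these tables are computed on the training data, the resulting figures tend to overstate accuracy on new data.

```r
# Confusion tables copied from the output above (rows = predicted, cols = true).
votes <- matrix(c(238, 29, 13, 155), nrow = 2,
                dimnames = list(pred = c("democrat", "republican"),
                                true = c("democrat", "republican")))
flowers <- matrix(c(50,  0,  0,
                     0, 47,  3,
                     0,  3, 47), nrow = 3, byrow = TRUE,
                  dimnames = list(pred = c("setosa", "versicolor", "virginica"),
                                  true = c("setosa", "versicolor", "virginica")))

# Accuracy = correctly classified cases / all cases.
accuracy <- function(tab) sum(diag(tab)) / sum(tab)
print(round(accuracy(votes), 3))
print(accuracy(flowers))
```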

e1071 documentation built on May 31, 2017, 4:17 a.m.
