NaiveBayes: NaiveBayes

View source: R/NaiveBayes.R

Description

The NaiveBayes package provides an efficient implementation of the popular Naive Bayes classifier, which assumes that the feature variables are independent given the class. The core classification function is written in Rcpp. Numeric variables are modeled with a Gaussian distribution. Use 'NaiveBayes(...)' to fit a model and 'predict(...)' to obtain the corresponding predictions.

Usage

NaiveBayes(x, ...)

## Default S3 method:
NaiveBayes(x, y, laplace = 0, ...)

## S3 method for class 'formula'
NaiveBayes(formula, data, laplace = 0, ...)

## S3 method for class 'NaiveBayes'
print(x, ...)

## S3 method for class 'NaiveBayes'
predict(
  object,
  newdata,
  type = c("class", "raw"),
  threshold = 0.001,
  eps = 0,
  ...
)

Arguments

x

matrix or dataframe with categorical ( character / factor / logical ) or metric ( numeric ) predictors. Make sure the data type of each column is specified correctly. NAs are not allowed.

...

not used

y

class vector ( character / factor / logical )

laplace

value used for Laplace smoothing ( additive smoothing ). Defaults to 0 ( no Laplace smoothing )

formula

a formula of the form "class ~ x1 + x2 + x3 ...", used when the data are supplied via NaiveBayes ( formula, data = ... ). Interactions are not allowed.

data

either a dataframe of predictors ( categorical and/or numeric ) or a contingency table.

object

a fitted object of class "NaiveBayes"

newdata

matrix or dataframe with categorical ( character / factor / logical ) or metric ( numeric ) predictors. Note: features in newdata that were not encountered in the training data are omitted from the prediction.

type

if "class", new data points are classified according to the highest posterior probabilities. If "raw", the posterior probabilities for each class are returned.

threshold

value by which zero probabilities or probabilities within the epsilon-range corresponding to metric variables are replaced ( zero probabilities corresponding to categorical variables can be handled with Laplace ( additive ) smoothing ).

eps

value that specifies an epsilon-range to replace zero or close to zero probabilities by threshold. It applies to metric variables.
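The interaction of laplace, threshold and eps can be illustrated with a small base R sketch. This is not the package's internal code: the helper names smoothed_table() and apply_threshold() are hypothetical, and the rule "replace values at or below eps by threshold" is an assumption about how the epsilon-range is applied.

```r
# Hypothetical helper: conditional probability table P(level | class)
# with additive (Laplace) smoothing. Illustration only.
smoothed_table <- function(feature, class, laplace = 0) {
  counts   <- table(class, feature)   # rows: classes, columns: feature levels
  n_levels <- ncol(counts)
  # Each row i is divided by its own (smoothed) row total
  (counts + laplace) / (rowSums(counts) + laplace * n_levels)
}

# Hypothetical helper: replace densities at or below eps by threshold.
# The "<= eps" rule is an assumption made for this sketch.
apply_threshold <- function(dens, threshold = 0.001, eps = 0) {
  dens[dens <= eps] <- threshold
  dens
}

# Level "c" is never observed, so its unsmoothed probability is zero
feature <- factor(c("a", "a", "b", "a", "b", "b"), levels = c("a", "b", "c"))
class   <- factor(c("yes", "yes", "yes", "no", "no", "no"))

raw_tab      <- smoothed_table(feature, class, laplace = 0)
smoothed_tab <- smoothed_table(feature, class, laplace = 1)

# Zero and near-zero densities of a metric variable are lifted to threshold
dens <- apply_threshold(c(0, 1e-12, 0.2), threshold = 0.001, eps = 1e-6)
```

With laplace = 1 every level receives a nonzero probability in each class, so a single unseen level no longer forces the whole posterior to zero.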

Details

The general function NaiveBayes() detects the class of each feature in the dataset and assumes a possibly different distribution for each feature. The predict function uses a fitted NaiveBayes model and a new data set to produce classifications. The output can be either the raw posterior probabilities generated by the model or the predicted classes themselves.

1. Numeric ( metric ) predictors are handled by assuming that they follow a Gaussian distribution, given the class label. Missing values are not included when constructing the tables. Logical variables are treated as categorical ( binary ) variables.

2. The prediction function computes conditional posterior probabilities for each class label using Bayes' rule under the assumption of independence of predictors.
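The computation described above can be sketched in base R for the all-numeric case. This is a minimal illustration under stated assumptions, not the package's Rcpp implementation: gaussian_nb_posterior() is a hypothetical name, class priors are taken as the observed class frequencies, and no threshold or eps handling is applied.

```r
# Sketch of Gaussian Naive Bayes posteriors (illustration only)
gaussian_nb_posterior <- function(x_train, y_train, x_new) {
  y_train <- as.factor(y_train)
  classes <- levels(y_train)
  x_train <- as.matrix(x_train)
  x_new   <- as.matrix(x_new)

  log_joint <- matrix(0, nrow(x_new), length(classes),
                      dimnames = list(NULL, classes))
  for (cl in classes) {
    rows <- y_train == cl
    # log prior: observed frequency of the class (assumption of this sketch)
    log_joint[, cl] <- log(mean(rows))
    # independence assumption: add per-feature Gaussian log-densities
    for (j in seq_len(ncol(x_new))) {
      xj <- x_train[rows, j]
      log_joint[, cl] <- log_joint[, cl] +
        dnorm(x_new[, j], mean = mean(xj), sd = sd(xj), log = TRUE)
    }
  }

  # Bayes' rule: normalize across classes, in log space for stability
  log_joint <- log_joint - apply(log_joint, 1, max)
  post <- exp(log_joint)
  post / rowSums(post)
}

post <- gaussian_nb_posterior(iris[, 1:4], iris$Species, iris[, 1:4])
# Class with the highest posterior probability for each row
pred <- colnames(post)[max.col(post, ties.method = "first")]
```

Here post plays the role of the "raw" output and pred the role of the "class" output; the package's own results may differ in details such as variance estimation and thresholding.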

Value

An object of class "NaiveBayes", which has five components:

Note

The class "numeric" contains "double" ( double precision floating point numbers ) and "integer". Prior to model fitting, the classes of the columns in the data.frame "data" can be easily checked via:
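One way to run such a check in base R is sketched here, using iris as a stand-in for "data". The supported-type test and the NA check are illustrative additions based on the argument descriptions, not functions provided by the package.

```r
# Class of every column, as a named character vector
col_classes <- sapply(iris, class)
print(col_classes)

# Illustrative check: every column is numeric or categorical
ok <- sapply(iris, function(col) {
  is.numeric(col) || is.character(col) || is.factor(col) || is.logical(col)
})

# Illustrative check: the predictors contain no missing values
has_na <- anyNA(iris)
```

A column that was read in with the wrong type (for example, a categorical code stored as integer) can be converted with as.factor() before fitting.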

Examples

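## Fit on the iris data using the default ( x, y ) interface
mymodel1 <- NaiveBayes(iris[, 1:4], iris[, 5])
## or, equivalently, using the formula interface
mymodel1_f <- NaiveBayes(Species ~ ., data = iris)
## Predicted classes for the training data
predict(mymodel1, iris[, 1:4])

## Simulated count data: two classes with different Poisson means
x1 <- matrix(rpois(100 * 4, 5), ncol = 4)
x2 <- matrix(rpois(50 * 4, 10), ncol = 4)
x <- rbind(x1, x2)
## Class labels as a factor, matching the types listed for y
ina <- factor(c(rep(1, 100), rep(2, 50)))
mymodel2 <- NaiveBayes(x, ina)
predict(mymodel2, x)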

sidiwang/NaiveBayes documentation built on Nov. 26, 2019, 9 a.m.