NaiveBayes: NaiveBayes
In sidiwang/NaiveBayes: Naive Bayes Classification and Prediction



The NaiveBayes package provides an efficient implementation of the popular Naive Bayes classifier, which assumes that the feature variables are independent given the class. The core classification function is written in Rcpp. Numeric variables are modelled with a Gaussian distribution. Use 'NaiveBayes(...)' to fit a model and 'predict(...)' to obtain the corresponding predictions.

NaiveBayes(x, ...)

## Default S3 method:
NaiveBayes(x, y, laplace = 0, ...)

## S3 method for class 'formula'
NaiveBayes(formula, data, laplace = 0, ...)

## S3 method for class 'NaiveBayes'
print(x, ...)

## S3 method for class 'NaiveBayes'
predict(
  object,
  newdata,
  type = c("class", "raw"),
  threshold = 0.001,
  eps = 0,
  ...
)

`x`	matrix or data frame with categorical ( character / factor / logical ) or metric ( numeric ) predictors. Make sure the data type of each column is specified correctly. NA values are not allowed.
`...`	not used
`y`	class vector ( character / factor / logical )
`laplace`	value used for Laplace smoothing ( additive smoothing ). Defaults to 0 ( no Laplace smoothing )
`formula`	a formula of the form "class ~ x1 + x2 + x3 ...", used together with `data` as NaiveBayes ( formula, data = ... ). Interactions are not allowed.
`data`	either a data frame of predictors ( categorical and/or numeric ) or a contingency table.
`object`	a fitted object of class "NaiveBayes"
`newdata`	matrix or data frame with categorical ( character / factor / logical ) or metric ( numeric ) predictors. Note: if newdata contains features that were not encountered in the training data, these are omitted from the prediction.
`type`	if "class", new data points are classified according to the highest posterior probabilities. If "raw", the posterior probabilities for each class are returned.
`threshold`	value by which zero probabilities or probabilities within the epsilon-range corresponding to metric variables are replaced ( zero probabilities corresponding to categorical variables can be handled with Laplace ( additive ) smoothing ).
`eps`	value that specifies an epsilon-range to replace zero or close to zero probabilities by `threshold`. It applies to metric variables.
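
To make the roles of `laplace`, `threshold` and `eps` concrete, here is a minimal base-R sketch. The helpers `smoothed_table()` and `apply_threshold()` are hypothetical names introduced for illustration; they are not part of the package, and the package's Rcpp code may apply these adjustments differently.

```r
# Hypothetical helper (not part of the package): table of P(level | class)
# with additive (Laplace) smoothing.
smoothed_table <- function(feature, class, laplace = 0) {
  counts   <- table(feature, class)   # rows: feature levels, columns: classes
  n_levels <- nrow(counts)
  # Add `laplace` to every cell, then normalise each class column.
  sweep(counts + laplace, 2, colSums(counts) + laplace * n_levels, "/")
}

# Toy data: level "c" never occurs, so its unsmoothed probability is zero.
feature <- factor(c("a", "a", "b", "a", "b", "b"), levels = c("a", "b", "c"))
class   <- factor(c("x", "x", "x", "y", "y", "y"))

unsmoothed <- smoothed_table(feature, class, laplace = 0)
smoothed   <- smoothed_table(feature, class, laplace = 1)

# Hypothetical helper: replace densities at or below `eps` by `threshold`
# (assumed rule for metric variables).
apply_threshold <- function(p, threshold = 0.001, eps = 0) {
  p[p <= eps] <- threshold
  p
}

# dnorm(50) underflows to exactly 0 in double precision.
dens     <- dnorm(c(0, 50), mean = 0, sd = 1)
adjusted <- apply_threshold(dens)
```

With `laplace = 1`, every level receives a non-zero probability and each class column still sums to one. The threshold step keeps a single vanishing density from forcing the whole product of likelihoods to zero.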

The general function NaiveBayes() detects the class of each feature in the dataset and may assume a different distribution for each feature. The predict function takes a fitted NaiveBayes model and a new data set and returns either the posterior probabilities computed by the model ( type = "raw" ) or the predicted classes themselves ( type = "class" ).

1. Numeric ( metric ) predictors are assumed to follow a Gaussian distribution given the class label. Missing values are not included when constructing the tables. Logical variables are treated as categorical ( binary ) variables.
2. The prediction function computes the conditional posterior probability of each class label using Bayes' rule, under the assumption that the predictors are independent.
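
The two points above can be sketched by hand in base R. This is only an illustration of Gaussian naive Bayes on the iris data under the stated assumptions, not the package's Rcpp implementation; all object names ( `prior`, `mu`, `sigma`, `log_joint` ) are chosen here for the example.

```r
# Base-R sketch of Gaussian naive Bayes (illustrative, not the package code).
X <- iris[, 1:4]
y <- iris$Species

# Class priors, plus per-class mean and standard deviation of each feature.
prior <- setNames(as.numeric(table(y)) / length(y), levels(y))
mu    <- sapply(X, function(col) tapply(col, y, mean))  # classes x features
sigma <- sapply(X, function(col) tapply(col, y, sd))    # classes x features

# Log joint score of one observation for every class:
# log prior + sum of per-feature Gaussian log densities (independence).
log_joint <- function(obs) {
  sapply(levels(y), function(k) {
    log(prior[[k]]) + sum(dnorm(obs, mean = mu[k, ], sd = sigma[k, ], log = TRUE))
  })
}

scores <- t(apply(as.matrix(X), 1, log_joint))      # n x classes

# Bayes' rule: normalise over classes (subtract the row maximum for stability).
posterior <- exp(scores - apply(scores, 1, max))
posterior <- posterior / rowSums(posterior)

# Predicted class = class with the highest posterior probability.
predicted <- factor(levels(y)[max.col(posterior, ties.method = "first")],
                    levels = levels(y))
```

Here `posterior` plays the role of the "raw" output and `predicted` the role of the "class" output. Working on the log scale avoids underflow when many small densities are multiplied.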

An object of class "NaiveBayes", which has five components:

apriori Class probabilities for the dependent variable
results A list of tables, one for each predictor variable. For each categorical variable a table giving, for each attribute level, the conditional probabilities given the target class. For each numeric variable, a table giving, for each target class, mean and standard deviation of the variable.
predictors The list of independent variables
call The call that produced this object.
level Levels of the dependent variable
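
As a rough picture of what the `apriori` and `results` components describe, the sketch below builds comparable tables by hand for a made-up toy data set. The data, the object names and the table orientation are assumptions for illustration; the layout of the actual fitted object may differ.

```r
# Toy data, invented for illustration only.
d <- data.frame(
  outlook = factor(c("sunny", "rain", "sunny", "rain", "sunny", "rain")),
  temp    = c(30, 18, 28, 20, 25, 15),
  play    = factor(c("no", "yes", "no", "yes", "yes", "no"))
)

# Class probabilities of the dependent variable.
apriori <- prop.table(table(d$play))

# Categorical predictor: P(level | class); each class row sums to 1.
cat_table <- prop.table(table(d$play, d$outlook), margin = 1)

# Numeric predictor: mean and standard deviation per target class.
num_table <- cbind(
  mean = tapply(d$temp, d$play, mean),
  sd   = tapply(d$temp, d$play, sd)
)
```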

The class "numeric" covers both "double" ( double-precision floating point numbers ) and "integer". Prior to model fitting, the classes of the columns in the data.frame "data" can be checked via:

sapply(data, class)
sapply(data, is.numeric)
sapply(data, is.double)
sapply(data, is.integer)
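
A small, self-contained sketch of such a check follows. The data frame `df` and its columns are made up for the example; the point is that integer codes which really denote categories should be converted to factors before fitting, so that they are not treated as Gaussian.

```r
# Made-up data frame for illustration.
df <- data.frame(
  id_code = c(1L, 2L, 1L, 2L),          # integer codes that are really categories
  height  = c(1.70, 1.82, 1.65, 1.90),  # genuinely numeric
  smoker  = c(TRUE, FALSE, FALSE, TRUE) # logical, treated as categorical
)

classes_before <- sapply(df, class)

# Convert the coded column so it is handled as a categorical predictor.
df$id_code <- factor(df$id_code)

classes_after <- sapply(df, class)

# NA values are not allowed in the predictors, so check for them as well.
has_na <- anyNA(df)
```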

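## Fit on the iris data: four numeric predictors, Species as the class
x = as.matrix(iris[, 1:4])
mymodel1 = NaiveBayes(iris[, 1:4], iris[, 5])
## or, via the formula interface
mymodel1_f = NaiveBayes(Species ~ ., data = iris)
## Predicted classes for the training data
predict(mymodel1, iris[, 1:4])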

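## Simulated count data: two groups with Poisson means 5 and 10
x1 <- matrix( rpois(100 * 4, 5), ncol = 4)
x2 <- matrix( rpois(50 * 4, 10), ncol = 4)
x <- rbind(x1, x2)
## Class labels: 100 observations in group 1, 50 in group 2
ina <- c( rep(1, 100), rep(2, 50) )
mymodel2 = NaiveBayes(x, ina)
predict(mymodel2, x)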