nbc: Naive Bayes Classifier


Description

A Naive Bayes Classifier is a model of a categorical response variable whose primary assumption is that the predictors are conditionally independent of each other given the response.
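Written out, the independence assumption lets the classifier pick the response class c that maximizes the prior probability of c times the product of the per-predictor likelihoods. Here x_1, …, x_p denote the p predictor values of one observation (notation introduced for illustration):

$$
\hat{y} = \arg\max_{c} \; P(Y = c) \prod_{i=1}^{p} P(X_i = x_i \mid Y = c)
$$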

Usage

NBC(data, ..., outputs = result)
NBCMake(predictors, response, histogram.width.factor = 500,
        number.bins = 10)

Arguments

data

an object of class "data".

...

arguments passed on to NBCMake.

outputs

the name of the result. An error is thrown if it is not of length 1.

predictors

expressions to use as predictors in the model.

response

a categorical attribute of data to be used as the response in the model.

range

the number of standard deviations on either side of the mean of each predictor that its histogram covers. See ‘Details’ for more information.

bins

the number of bins in each histogram. See ‘Details’ for more information.

Details

In the traditional Naive Bayes Classifier, the class-conditional probabilities (likelihoods) of numerical predictor variables are calculated by using the training data to fit a model for each predictor per class of the response and then evaluating the corresponding PDF. However, this introduces additional parameters, such as the choice of underlying distribution, and also increases the running time. Instead, the underlying model and its PDF are approximated by a histogram constructed during training. The likelihood of a predictor value is then estimated from how often the training data fell within the same bin as that value. The trade-off for these benefits is a slight loss of accuracy as well as increased memory usage.
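A minimal Python sketch of the histogram approach follows. It is not taken from gtStats: the class `HistogramNB`, its parameters `n_sd` and `n_bins` (standing in for range and bins) and `alpha`, the Laplace smoothing, the bin edges shared by all classes of a predictor, and the choice to skip predictor values that fall outside the histogram interval are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import mean, pstdev


class HistogramNB:
    """Naive Bayes for numeric predictors (illustrative sketch): each
    class-conditional PDF is replaced by smoothed bin frequencies."""

    def __init__(self, n_sd=3.0, n_bins=10, alpha=1.0):
        self.n_sd = n_sd      # interval half-width, in standard deviations
        self.n_bins = n_bins  # number of bins per histogram
        self.alpha = alpha    # assumption: Laplace smoothing so empty bins are not 0

    def _bin(self, x, lo, hi):
        """Index of the equal-width bin containing x, or None if x is outside [lo, hi]."""
        if x < lo or x > hi:
            return None
        if hi == lo:
            return 0
        return min(int((x - lo) / (hi - lo) * self.n_bins), self.n_bins - 1)

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_n = Counter(y)
        self.log_prior = {c: math.log(self.class_n[c] / len(y)) for c in self.classes}
        self.ranges = []  # per predictor: (lo, hi), shared by all classes (assumption)
        self.counts = []  # per predictor: {class: [count in each bin]}
        for j in range(len(X[0])):
            col = [row[j] for row in X]
            mu, sd = mean(col), pstdev(col)
            lo = max(mu - self.n_sd * sd, min(col))  # clip interval to observed data
            hi = min(mu + self.n_sd * sd, max(col))
            counts = {c: [0] * self.n_bins for c in self.classes}
            for v, c in zip(col, y):
                b = self._bin(v, lo, hi)
                if b is not None:  # training outliers are left out of the histogram
                    counts[c][b] += 1
            self.ranges.append((lo, hi))
            self.counts.append(counts)
        return self

    def predict(self, x):
        """Return the class with the largest log prior + sum of log bin frequencies."""
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for j, v in enumerate(x):
                b = self._bin(v, *self.ranges[j])
                if b is None:
                    continue  # assumption: an out-of-range value is ignored
                hits = self.counts[j][c][b]
                score += math.log((hits + self.alpha) /
                                  (self.class_n[c] + self.alpha * self.n_bins))
            if score > best_score:
                best, best_score = c, score
        return best
```

For example, `HistogramNB(n_sd=3, n_bins=4).fit([[1.0], [1.2], [0.9], [1.1], [5.0], [5.2], [4.9], [5.1]], ["a"] * 4 + ["b"] * 4)` puts every "a" value in the first bin and every "b" value in the last, so `predict([1.05])` returns `"a"`. Sharing one set of bin edges per predictor keeps bin widths equal across classes, so raw bin frequencies can be compared between classes without dividing by the bin width.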

The exact shape of each histogram is specified by range and bins. Because of outliers, it may be unrealistic to construct a histogram over the whole of the data; instead, the histogram is constructed on the interval [μ − nσ, μ + nσ], where μ and σ are the mean and standard deviation of the predictor and n is the value of range. The number of sub-intervals in the histogram is given by bins. If this interval extends beyond the observed minimum or maximum of the data, the corresponding endpoint is clipped to that minimum or maximum.
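The interval rule can be sketched in Python as below. This is an illustration, not the gtStats code: the functions `histogram_edges` and `bin_index` are hypothetical, and equal-width bins and the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def histogram_edges(values, n_sd, n_bins):
    """Return n_bins + 1 edges on [mu - n_sd*sigma, mu + n_sd*sigma],
    with each endpoint clipped to the observed min/max of values."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    lo = max(mu - n_sd * sigma, min(values))
    hi = min(mu + n_sd * sigma, max(values))
    width = (hi - lo) / n_bins  # assumption: equal-width bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_index(x, edges):
    """Index of the bin containing x, or None if x lies outside the histogram."""
    lo, hi = edges[0], edges[-1]
    if x < lo or x > hi:
        return None
    if hi == lo:
        return 0
    n_bins = len(edges) - 1
    return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)  # x == hi goes in the last bin
```

With `values = [1, 2, 3, 4, 100]` and `n_sd = 1`, the mean is 22 and σ is about 39, so the lower endpoint is clipped up to the data minimum 1 while the upper endpoint (about 61) stays below 100. The outlier therefore falls outside the histogram and `bin_index(100, edges)` returns `None`.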

Value

An object of class "data" with a single attribute.

AUTO

AUTO is not allowed for any argument.

Author(s)

Jon Claus, <jonterainsights@gmail.com>, Tera Insights LLC


tera-insights/gtStats documentation built on May 31, 2019, 8:36 a.m.