trainNB: Train Naive Bayes

View source: R/trainNB.R

trainNB R Documentation

Train Naive Bayes

Description

Trains a multiclass Naive Bayes classifier

Usage

trainNB(coding, train_matrix, smoothing = c("normalized", "simple",
  "parameterized", "none"), alpha = 2, beta = 10)

Arguments

coding

Numeric vector of training document codings

train_matrix

A quanteda document-feature matrix whose number of rows equals the length of coding

smoothing

Type of Laplacian smoothing for term priors. See 'Details'.

alpha

Smoothing hyperparameter for 'parameterized' smoothing

beta

Smoothing hyperparameter for 'parameterized' smoothing

Details

The smoothing method defaults to 'normalized', which applies the per-class word vector normalization advocated by Frank and Bouckaert (2006).

Using 'simple' employs the simple version of Laplacian smoothing described in Metsis et al. (2006): the prior probability of a term appearing, given a class, is the frequency of the term in the class plus 1, divided by the count of documents in the class plus 2.
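
For illustration only (the variable names below are not part of the package), the 'simple' prior can be computed as:

## Hedged sketch of 'simple' smoothing with illustrative counts
n_jc <- 4    ## documents in class c that contain term j
n_c  <- 20   ## documents in class c
(n_jc + 1) / (n_c + 2)   ## smoothed prior, here 5/22 = 0.227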

Using 'parameterized' applies a version of smoothing mentioned in O'Neil & Schutt (2013) for multiclass Naive Bayes: the prior probability of a term appearing, given a class, is the frequency of the term in the class plus alpha minus 1, divided by the count of documents in the class plus alpha plus beta minus 2.
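
As an illustrative sketch with the default alpha = 2 and beta = 10 (again, the variable names are not part of the package):

## Hedged sketch of 'parameterized' smoothing with the default hyperparameters
n_jc <- 4; n_c <- 20     ## illustrative counts, as above
alpha <- 2; beta <- 10
(n_jc + alpha - 1) / (n_c + alpha + beta - 2)   ## here 5/30 = 0.167

Note that setting alpha = 2 and beta = 2 reproduces the 'simple' rule above.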

Using 'none' is inadvisable. In this case the prior probability of a term appearing, given a class, is simply the frequency of the term in the class divided by the count of documents in the class. This will likely generate zero priors, and the logarithm of a zero prior is undefined (negative infinity), which breaks classification.
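
A minimal illustration of why zero priors are harmful (hypothetical counts, not package internals):

## With no smoothing, a term never seen in a class gets a zero prior ...
n_jc <- 0; n_c <- 20
p <- n_jc / n_c   ## 0
log(p)            ## -Inf: a document containing this term can never be
                  ## assigned to the class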

Value

A list with the elements

w_0c

Constant portion of NB classification probabilities.

w_jc

Portion of NB classification probabilities that varies with test document word appearances.

nc

Frequency of each category in training documents (named numeric vector)

theta_c

Unsmoothed prior class probabilities (named numeric vector)

Author(s)

Matt W. Loftis

References

Frank, E. and Bouckaert, R.R. (2006) Naive Bayes for Text Classification with Unbalanced Classes. Knowledge Discovery in Databases: PKDD 2006, 503-510.

Metsis, V., Androutsopoulos, I. and Paliouras, G. (2006) Spam Filtering with Naive Bayes – Which Naive Bayes? CEAS 2006 – Third Conference on Email and Anti-Spam, July 27-28, 2006, Mountain View, California, USA.

O'Neil, C. and Schutt, R. (2013) Doing Data Science: Straight Talk from the Frontline. O'Reilly.

Examples

## Load data and create document-feature matrices
train_corpus <- quanteda::corpus(x = training_agendas$text)
train_matrix <- quanteda::dfm(train_corpus,
                    language = "danish",
                    stem = TRUE,
                    removeNumbers = FALSE)

test_corpus <- quanteda::corpus(x = test_agendas$text)
test_matrix <- quanteda::dfm(test_corpus,
                   language = "danish",
                   stem = TRUE,
                   removeNumbers = FALSE)

## Convert matrix of frequencies to matrix of indicators
train_matrix@x[train_matrix@x > 1] <- 1
test_matrix@x[test_matrix@x > 1] <- 1

## Dropping training features not in the test set
train_matrix <- train_matrix[, (colnames(train_matrix) %in% colnames(test_matrix))]

est <- trainNB(training_agendas$coding, train_matrix)
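
## Inspect the fitted components described under 'Value'
## (exact values depend on the training data)
str(est)        ## list with w_0c, w_jc, nc, and theta_c
est$nc          ## frequency of each category in the training documents
est$theta_c     ## unsmoothed prior class probabilities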

