textmodel_svmlin: [experimental] Linear SVM classifier for texts


textmodel_svmlin    R Documentation

[experimental] Linear SVM classifier for texts

Description

Fit a fast linear SVM classifier for sparse text matrices, using the svmlin C++ code written by Vikas Sindhwani and S. Sathiya Keerthi. This implements the modified finite Newton L2-SVM method (L2-SVM-MFN) described in Sindhwani and Keerthi (2006). Currently, textmodel_svmlin() only works for two-class problems.
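As a rough orientation, and assuming the weighted squared-hinge formulation usually given for L2-SVM-MFN, the fitted weight vector can be read as the minimiser of the objective below. The symbols are notation introduced here rather than taken from this help page, and the exact scaling used inside the svmlin code may differ.

```latex
\min_{w} \; \frac{\lambda}{2} \lVert w \rVert^{2}
  + \frac{1}{2} \sum_{i=1}^{n} c_i \, \max\bigl(0,\; 1 - y_i \, w^{\top} x_i \bigr)^{2}
```

Here x_i is the feature vector of document i, y_i is its class coded as -1 or +1, lambda is the regularization parameter, and c_i is assumed to equal cp for "positive" documents and cn for "negative" ones.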

Usage

textmodel_svmlin(
  x,
  y,
  intercept = TRUE,
  lambda = 1,
  cp = 1,
  cn = 1,
  scale = FALSE,
  center = FALSE
)

Arguments

x

the dfm on which the model will be fit. Does not need to contain only the training documents.

y

vector of training labels associated with each document in x. (These will be converted to a factor if not already a factor.)

intercept

logical; if TRUE, add an intercept to the data

lambda

numeric; regularization parameter lambda (default 1)

cp

numeric; relative cost for "positive" examples (the second factor level)

cn

numeric; relative cost for "negative" examples (the first factor level)

scale

logical; if TRUE, normalize the feature counts

center

logical; if TRUE, centre the feature counts
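To make the roles of y, cp, cn and lambda more concrete, here is a minimal base-R sketch. Both helper functions are hypothetical and are not part of quanteda.textmodels. The -1/+1 coding and the form of the objective (a class-weighted squared hinge loss plus an L2 penalty) are assumptions about how such a model is typically set up, not a description of the package internals.

```r
# Hypothetical helpers for illustration only; neither function is part of
# quanteda.textmodels, and the -1/+1 coding is an assumption.
encode_labels <- function(y, cp = 1, cn = 1) {
  y <- factor(y)                                 # labels become a factor
  stopifnot(nlevels(y) == 2)                     # two-class problems only
  target <- ifelse(as.integer(y) == 2L, 1, -1)   # second level: +1, first level: -1
  cost <- ifelse(target == 1, cp, cn)            # cp for positives, cn for negatives
  data.frame(label = y, target = target, cost = cost)
}

# Assumed objective: L2 penalty plus class-weighted squared hinge loss
weighted_l2_objective <- function(w, X, target, cost, lambda = 1) {
  margin <- target * as.vector(X %*% w)          # signed margin per document
  slack <- pmax(0, 1 - margin)                   # zero when the margin is at least 1
  0.5 * lambda * sum(w^2) + 0.5 * sum(cost * slack^2)
}

# Toy data: four documents, two features
y <- c("Govt", "Opp", "Opp", "Govt")
enc <- encode_labels(y, cp = 2, cn = 1)
X <- rbind(c(1, 0), c(0, 1), c(0, 2), c(2, 0))
w <- c(-1, 0.5)
obj <- weighted_l2_objective(w, X, enc$target, enc$cost, lambda = 1)
print(enc)
print(obj)
```

Because factor levels are sorted alphabetically by default, "Govt" is the first level (the "negative" class, weighted by cn) and "Opp" the second (the "positive" class, weighted by cp). Raising cp relative to cn penalises margin violations on "Opp" documents more heavily.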

Value

a fitted model object of class textmodel_svmlin

Warning

This function is marked experimental because it does not yet fully work in a way that maps onto the more standard, better-understood SVM parameters. Use with caution, and only after reading Sindhwani and Keerthi (2006).

References

Vikas Sindhwani and S. Sathiya Keerthi (2006). Large Scale Semi-supervised Linear SVMs. Proceedings of ACM SIGIR. August 6–11, 2006, Seattle.

Vikas Sindhwani and S. Sathiya Keerthi (2006). Newton Methods for Fast Solution of Semi-supervised Linear SVMs. Book chapter in Large Scale Kernel Machines, MIT Press.

See Also

predict.textmodel_svmlin()

Examples

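# use Lenihan for govt class and Bruton for opposition
library("quanteda")
library("quanteda.textmodels")  # provides textmodel_svmlin() and data_corpus_irishbudget2010

# label the first two documents; the remaining 12 get NA
docvars(data_corpus_irishbudget2010, "govtopp") <- c("Govt", "Opp", rep(NA, 12))
dfmat <- dfm(tokens(data_corpus_irishbudget2010))

# fit the classifier, then predict classes
tmod <- textmodel_svmlin(dfmat, y = dfmat$govtopp)
predict(tmod)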

quanteda.textmodels documentation built on Sept. 11, 2024, 8:19 p.m.