textmodel_svmlin: [experimental] Linear SVM classifier for texts
In quanteda.textmodels: Scaling Models and Classifiers for Textual Data

textmodel_svmlin

R Documentation

Description

Fit a fast linear SVM classifier for sparse text matrices, using svmlin C++ code written by Vikas Sindhwani and S. Sathiya Keerthi. This implements the modified finite Newton L2-SVM method (L2-SVM-MFN) described in Sindhwani and Keerthi (2006). Currently, textmodel_svmlin() only works for two-class problems.
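
For intuition about what an L2-SVM optimizes, the following base-R sketch evaluates a cost-weighted objective with a squared hinge loss and an L2 penalty. It is an illustration only, not the svmlin C++ code: the helper name `l2svm_objective`, the toy data, and the exact scaling of the two terms are assumptions, and the correspondence between its arguments and the `lambda`, `cp`, and `cn` arguments of textmodel_svmlin() is assumed rather than confirmed by this page.

```r
# Hypothetical helper: cost-weighted L2-SVM objective for labels in {-1, +1}.
# A sketch for intuition, not the svmlin implementation.
l2svm_objective <- function(w, X, y, lambda = 1, cp = 1, cn = 1) {
  margins  <- as.vector(y * (X %*% w))   # y_i * w'x_i for each document
  costs    <- ifelse(y > 0, cp, cn)      # per-example cost, chosen by class
  hinge_sq <- pmax(0, 1 - margins)^2     # squared hinge (L2) loss
  0.5 * lambda * sum(w^2) + 0.5 * sum(costs * hinge_sq)
}

# Toy data: two documents, two features
X <- matrix(c(1, 0,
              0, 1), nrow = 2, byrow = TRUE)
y <- c(-1, 1)

# Both documents sit exactly on the margin, so only the penalty term remains
obj_margin <- l2svm_objective(c(-1, 1), X, y)

# With zero weights every document violates the margin; cp = 2 doubles the
# contribution of the positive document
obj_zero <- l2svm_objective(c(0, 0), X, y, cp = 2)
```

In this sketch, raising `lambda` pulls the weights toward zero, while raising `cp` or `cn` makes margin violations in the corresponding class more expensive.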

Usage

textmodel_svmlin(
  x,
  y,
  intercept = TRUE,
  lambda = 1,
  cp = 1,
  cn = 1,
  scale = FALSE,
  center = FALSE
)

Arguments

`x`	the dfm on which the model will be fit. Does not need to contain only the training documents.
`y`	vector of training labels associated with each document in `x`. (These will be converted to factors if not already factors.)
`intercept`	logical; if `TRUE`, add an intercept to the data
`lambda`	numeric; regularization parameter lambda (default 1)
`cp`	numeric; relative cost for "positive" examples (the second factor level)
`cn`	numeric; relative cost for "negative" examples (the first factor level)
`scale`	logical; if `TRUE`, normalize the feature counts
`center`	logical; if `TRUE`, center the feature counts
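
Because `cp` and `cn` are tied to factor level order, it can help to check which label ends up in which position before fitting. The base-R sketch below uses the same labels as the example on this page; the variable names are illustrative, and whether textmodel_svmlin() keeps a level order that was set explicitly beforehand is an assumption worth verifying on your own data.

```r
# Labels as in the example: two labelled documents, the rest unlabelled
y <- c("Govt", "Opp", rep(NA, 12))
f <- factor(y)

# By default, factor levels are sorted, so "Govt" comes before "Opp"
negative <- levels(f)[1]   # first level: the class weighted by `cn`
positive <- levels(f)[2]   # second level: the class weighted by `cp`

# To treat "Govt" as the positive class instead, set the level order explicitly
f2 <- factor(y, levels = c("Opp", "Govt"))
levels(f2)
```

With the default ordering here, `cn` would apply to "Govt" and `cp` to "Opp".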

Value

a fitted model object of class textmodel_svmlin

Warning

This function is marked experimental because its parameters do not yet map cleanly onto the more standard SVM parameters. Use it with caution, and only after reading Sindhwani and Keerthi (2006).

References

Vikas Sindhwani and S. Sathiya Keerthi (2006). Large Scale Semi-supervised Linear SVMs. Proceedings of ACM SIGIR. August 6–11, 2006, Seattle.

Vikas Sindhwani and S. Sathiya Keerthi (2006). Newton Methods for Fast Solution of Semi-supervised Linear SVMs. Book chapter in Large Scale Kernel Machines, MIT Press.

Examples

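# use Lenihan for govt class and Bruton for opposition
library("quanteda")
library("quanteda.textmodels")  # provides textmodel_svmlin() and data_corpus_irishbudget2010
docvars(data_corpus_irishbudget2010, "govtopp") <- c("Govt", "Opp", rep(NA, 12))
dfmat <- dfm(tokens(data_corpus_irishbudget2010))

# only the first two documents carry training labels; the rest are NA
tmod <- textmodel_svmlin(dfmat, y = dfmat$govtopp)
# predicted classes from the fitted model
predict(tmod)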

quanteda.textmodels documentation built on April 12, 2025, 1:43 a.m.