textmodel_svm: Linear SVM classifier for texts

Description

Fit a fast linear SVM classifier for texts, using the LiblineaR package.

Usage

textmodel_svm(
  x,
  y,
  weight = c("uniform", "docfreq", "termfreq"),
  type = 1,
  ...
)

Arguments

x

the dfm on which the model will be fit. Does not need to contain only the training documents.

y

vector of training labels, one for each document in x. (Labels are converted to a factor if they are not one already.) Documents labeled NA are left out of training, as in the examples below.

weight

class weights for imbalanced training sets, passed to the wi argument of LiblineaR::LiblineaR(). "uniform" uses the LiblineaR default; "docfreq" weights each class by its number of training documents; "termfreq" weights each class by its relative size in the training set, measured as its total length in tokens.
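
The two non-default weighting schemes can be illustrated with a small base R sketch. The toy matrix counts, the factor labels, and the names w_docfreq and w_termfreq are made up for illustration. The sketch only computes the two quantities described above; it does not claim to reproduce any further transformation the package may apply before passing the weights to wi.

```r
# Toy document-feature counts: 5 training documents x 3 features (illustrative data)
counts <- matrix(
  c(2, 0, 1,
    1, 1, 0,
    0, 3, 2,
    4, 1, 1,
    0, 0, 5),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("doc", 1:5), c("tax", "budget", "jobs"))
)
labels <- factor(c("Gov", "Gov", "Gov", "Opp", "Opp"))

# "docfreq"-style quantity: share of training documents in each class
w_docfreq <- prop.table(table(labels))

# "termfreq"-style quantity: share of all training tokens in each class
tokens_per_class <- tapply(rowSums(counts), labels, sum)
w_termfreq <- tokens_per_class / sum(tokens_per_class)

print(w_docfreq)
print(w_termfreq)
```

The two schemes can disagree: here "Gov" has more documents, but "Opp" has slightly more tokens.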

type

passed to the type argument of LiblineaR::LiblineaR(). The default, 1, selects L2-regularized L2-loss support vector classification (dual).

...

additional arguments passed to LiblineaR::LiblineaR()
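
As a sketch of how type and ... might be used together, the call below selects a different solver and forwards a cost value to LiblineaR::LiblineaR(). It assumes that quanteda, quanteda.textmodels, and LiblineaR are installed. The values type = 2 and cost = 0.5 are arbitrary choices for illustration, not recommendations, and y and tmod_primal are illustrative names.

```r
library("quanteda")
library("quanteda.textmodels")

# Same data and labels as in the Examples section; NA marks unlabeled documents
dfmat <- dfm(tokens(data_corpus_irishbudget2010))
y <- c(rep(NA, 4), "Gov", "Opp", NA, "Opp", NA, NA, NA, NA, NA, NA)

# type = 2: L2-regularized L2-loss support vector classification (primal)
# cost is an argument of LiblineaR::LiblineaR(), forwarded through `...`
tmod_primal <- textmodel_svm(dfmat, y = y, type = 2, cost = 0.5)
predict(tmod_primal)
```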

References

R. E. Fan, K. W. Chang, C. J. Hsieh, X. R. Wang, and C. J. Lin. (2008) LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research 9: 1871-1874. https://www.csie.ntu.edu.tw/~cjlin/liblinear/.

See Also

LiblineaR::LiblineaR(), predict.textmodel_svm()

Examples

# use party leaders for govt and opposition classes
library("quanteda")
library("quanteda.textmodels")
docvars(data_corpus_irishbudget2010, "govtopp") <-
    c(rep(NA, 4), "Gov", "Opp", NA, "Opp", NA, NA, NA, NA, NA, NA)
dfmat <- dfm(tokens(data_corpus_irishbudget2010))
tmod <- textmodel_svm(dfmat, y = dfmat$govtopp)
predict(tmod)

# multiclass problem - all party leaders
tmod2 <- textmodel_svm(dfmat,
    y = c(rep(NA, 3), "SF", "FF", "FG", NA, "LAB", NA, NA, "Green", rep(NA, 3)))
predict(tmod2)
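
Because x may contain documents beyond the training set, one possible workflow is to fit on the labeled documents and then predict only the unlabeled ones. The sketch below is self-contained and assumes that predict.textmodel_svm() accepts a newdata argument holding a dfm. The names govtopp, dfmat_new, and pred are illustrative.

```r
library("quanteda")
library("quanteda.textmodels")

dfmat <- dfm(tokens(data_corpus_irishbudget2010))
govtopp <- c(rep(NA, 4), "Gov", "Opp", NA, "Opp", NA, NA, NA, NA, NA, NA)

# fit on the full dfm; documents labeled NA are not used for training
tmod <- textmodel_svm(dfmat, y = govtopp)

# predict classes only for the documents that had no training label
dfmat_new <- dfmat[is.na(govtopp), ]
pred <- predict(tmod, newdata = dfmat_new)
table(pred)
```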

quanteda.textmodels documentation built on Oct. 5, 2022, 1:06 a.m.