textmodel_lr | R Documentation |
Fits a fast penalized maximum likelihood estimator to predict discrete categories from sparse dfm objects. Using the glmnet package, the function computes the regularization path for the lasso or elastic net penalty at a grid of values for the regularization parameter lambda. The value of lambda is chosen automatically by cross-validation, that is, by testing on several folds of the data at estimation time.
textmodel_lr(x, y, ...)
x |
the dfm on which the model will be fit. Does not need to contain only the training documents. |
y |
vector of training labels associated with each document identified
in |
... |
additional arguments passed to |
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software 33(1), 1-22. doi: 10.18637/jss.v033.i01
cv.glmnet(), predict.textmodel_lr(), coef.textmodel_lr()
## Example from 13.1 of _An Introduction to Information Retrieval_
library("quanteda")
library("quanteda.textmodels")  # provides textmodel_lr()

corp <- corpus(c(d1 = "Chinese Beijing Chinese",
                 d2 = "Chinese Chinese Shanghai",
                 d3 = "Chinese Macao",
                 d4 = "Tokyo Japan Chinese",
                 d5 = "London England Chinese",
                 d6 = "Chinese Chinese Chinese Tokyo Japan"),
               docvars = data.frame(train = factor(c("Y", "Y", "Y", "N", "N", NA))))
dfmat <- dfm(tokens(corp), tolower = FALSE)

## simulate bigger sample as classification on small samples is problematic
set.seed(1)
dfmat <- dfm_sample(dfmat, 50, replace = TRUE)

## train model
(tmod1 <- textmodel_lr(dfmat, docvars(dfmat, "train")))
summary(tmod1)
coef(tmod1)

## predict probability and classes
predict(tmod1, type = "prob")
predict(tmod1)
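The cross-validated regularization path mentioned in the description can also be sketched directly with glmnet. This is only an illustration on simulated data, not the internals of textmodel_lr(): the sparse matrix x, the labels y, and the settings alpha = 1 (lasso) and nfolds = 5 are assumptions chosen for the example.

```r
## Sketch: lasso-penalized logistic regression with lambda chosen by
## cross-validation. All data here are simulated for illustration.
library("Matrix")
library("glmnet")

set.seed(1)
n <- 200  # number of simulated "documents"
p <- 20   # number of simulated "features"

## sparse binary feature matrix standing in for a dfm
x <- Matrix(rbinom(n * p, 1, 0.2), nrow = n, sparse = TRUE)

## labels depend on the first three features plus noise
score <- x[, 1] + x[, 2] - x[, 3] + rnorm(n, sd = 0.5)
y <- factor(ifelse(score > 0.5, "Y", "N"))

## fit the regularization path and cross-validate over 5 folds
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 5)

## lambda with the lowest cross-validated error, and its coefficients
cvfit$lambda.min
coef(cvfit, s = "lambda.min")

## predicted classes at the selected lambda
pred <- predict(cvfit, newx = x, s = "lambda.min", type = "class")
```

Setting alpha between 0 and 1 in this sketch would give an elastic net penalty instead of the lasso.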