reduce_dtm_lognet_cv: Internal Supreme function
In paolofantini/Supreme: Make it easier applying LDA topic models to a corpus of Italian Supreme Court decisions

Description Usage Arguments Details Value Note Examples

reduce_dtm_lognet_cv reduces the number of terms (columns) of a labeled document-term matrix. It is called by the reduce_dtm function.

reduce_dtm_lognet_cv(dtm, classes, lambda = c("lambda.min", "lambda.1se"), SEED, c_normalize = TRUE, parallel = TRUE, export = FALSE)

`dtm`	a document-term matrix in term frequency format.
`classes`	factor, the labeling variable.
`lambda`	a string giving the rule for selecting the optimal fit, either `"lambda.min"` or `"lambda.1se"`.
`SEED`	integer, the random seed for selecting train and test sets.
`c_normalize`	logical. If `TRUE`, `dtm` entries are (cosine) normalized. Default is `TRUE`.
`parallel`	logical. If `TRUE` parallel cross-validation is performed. Default is `TRUE`.
`export`	logical. If `TRUE` export the discarded terms, the vocabulary and the returned object to the built-in directory `data/dtm`. Default is `FALSE`.
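
To illustrate what the `c_normalize` option refers to, here is a minimal base R sketch of cosine (unit length) normalization of the rows of a toy document-term matrix. The helper `cosine_normalize` and the matrix `dtm_toy` are hypothetical names introduced for this illustration; the package itself delegates normalization to another function, so this is only an approximation of the idea, not its implementation.

```r
# Toy document-term matrix: 3 documents (rows) x 4 terms (columns).
dtm_toy <- matrix(c(3, 0, 4, 0,
                    0, 2, 0, 0,
                    1, 1, 1, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("doc", 1:3), paste0("term", 1:4)))

# Hypothetical helper: divide each row by its Euclidean (L2) norm.
cosine_normalize <- function(m) {
  norms <- sqrt(rowSums(m^2))
  norms[norms == 0] <- 1  # leave all-zero rows unchanged
  m / norms               # the vector is recycled down columns, so row i is divided by norms[i]
}

dtm_norm <- cosine_normalize(dtm_toy)
rowSums(dtm_norm^2)  # squared length of each normalized row
```

After normalization, long and short documents contribute on the same scale, which is usually the reason for normalizing before fitting a penalized model.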

This function fits a logistic classification model via penalized maximum likelihood by calling the lognet function from the glmnet package. The regularization path is computed for the lasso penalty only, over a grid of values of the regularization parameter lambda. If c_normalize = TRUE (the default), the dtm is passed to the wTfIdf function for cosine normalization. The number of terms is reduced by keeping only the columns whose beta coefficients are nonzero in the optimal fit.
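
The selection step described above can be sketched with the public glmnet interface. This is a minimal illustration on simulated data, not the package's own code: it assumes a two-class problem, uses `cv.glmnet` rather than calling the internal fitting routine directly, and skips the train/test split and normalization. All object names (`x`, `y`, `cvfit`, `x_reduced`) are made up for the example.

```r
library(glmnet)

set.seed(123)
n <- 200
p <- 30

# Simulated term counts; only the first three terms carry class signal.
x <- matrix(rpois(n * p, lambda = 1), nrow = n,
            dimnames = list(NULL, paste0("term", seq_len(p))))
eta <- 1.5 * x[, 1] - 1.5 * x[, 2] + x[, 3] - 1
y <- factor(rbinom(n, 1, plogis(eta)), labels = c("A", "B"))

# Lasso-penalized (alpha = 1) logistic regression with cross-validation,
# scored by misclassification error.
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1,
                   type.measure = "class")

# Coefficients at the chosen lambda; the first row is the intercept.
beta <- as.matrix(coef(cvfit, s = "lambda.1se"))[-1, 1]
selected <- names(beta)[beta != 0]

# Keep only the columns with nonzero coefficients.
x_reduced <- x[, selected, drop = FALSE]
dim(x_reduced)
```

Because the lasso penalty sets many coefficients exactly to zero, the nonzero set doubles as a feature selector, which is what makes this a reasonable way to shrink a vocabulary.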

A list with the reduced dtm (in term frequency format), the IDs of the documents belonging to the training set, the glmnet fit object, the position of the best lambda, the selected terms by class, and the train and test misclassification errors err1.train and err1.test. The confusion matrix is also returned.
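
For readers unfamiliar with the last two quantities, the following base R sketch shows how a misclassification error and a confusion matrix are typically computed from predicted and actual labels. The vectors `truth` and `pred` are invented for illustration and are not output of the package.

```r
# Hypothetical actual and predicted class labels for six documents.
truth <- factor(c("A", "A", "B", "B", "B", "A"), levels = c("A", "B"))
pred  <- factor(c("A", "B", "B", "B", "A", "A"), levels = c("A", "B"))

# Confusion matrix: rows are predictions, columns are actual classes.
confusion <- table(predicted = pred, actual = truth)

# Misclassification error: share of documents labeled incorrectly.
err <- mean(pred != truth)

confusion
err
```

Computing the same error on both the training and the held-out documents gives a quick check for overfitting: a test error much higher than the training error suggests the selected terms do not generalize.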

The tuning parameters alpha and lambda are set in the optimal fit to 1 (the default) and to one of lambda.min or lambda.1se, respectively. The former follows from the "minimum training error rule" and the latter from the more conservative "one standard error rule". Full details are given in "The Elements of Statistical Learning" (T. Hastie, R. Tibshirani, J. Friedman), 2nd edition, p. 61.
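
The two selection rules can be made concrete with a small base R sketch. In glmnet, lambda.min is the value of lambda with the minimum mean cross-validated error, and lambda.1se is the largest lambda whose error is within one standard error of that minimum. The numbers below are invented for illustration; only the selection logic is the point.

```r
# Hypothetical cross-validation summary over a decreasing lambda grid.
lambda <- c(0.50, 0.20, 0.10, 0.05, 0.02, 0.01)
cvm    <- c(0.40, 0.28, 0.22, 0.20, 0.21, 0.25)  # mean CV error
cvsd   <- c(0.03, 0.03, 0.03, 0.03, 0.03, 0.03)  # standard error of cvm

# Minimum error rule: lambda with the lowest mean CV error.
i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]

# One standard error rule: largest lambda whose error is within
# one standard error of the minimum.
threshold  <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])

c(lambda_min = lambda_min, lambda_1se = lambda_1se)
```

A larger lambda means a stronger penalty, so the one standard error rule generally keeps fewer terms than the minimum error rule at a small, statistically indistinguishable cost in error.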

## Not run: 
library(Supreme)
data("dtm")
data("classes")
dtm.lognet.cv <- reduce_dtm_lognet_cv(dtm, classes, lambda = "lambda.1se", SEED = 123)

## End(Not run)

paolofantini/Supreme documentation built on May 24, 2019, 6:14 p.m.
