reduce_dtm_lognet: Internal Supreme function


Description

reduce_dtm_lognet reduces the number of terms (columns) of a labeled document-term matrix. It is called internally by the reduce_dtm function.

Usage

reduce_dtm_lognet(dtm, classes, SEED, c_normalize = TRUE, export = FALSE)

Arguments

dtm

a document-term matrix in term frequency format.

classes

factor, the labeling variable: the class of each document (row) of dtm.

SEED

integer, the random seed used to split the documents into training and test sets.

c_normalize

logical. If TRUE, the dtm entries are cosine normalized. Default is TRUE.

export

logical. If TRUE, the discarded terms, the vocabulary and the returned object are exported to the built-in directory data/dtm. Default is FALSE.
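The SEED argument makes the split of documents into training and test sets reproducible. The following is a minimal sketch in R of such a seeded split, using a hypothetical helper split_train_test and an assumed 70/30 proportion; the proportion actually used by reduce_dtm_lognet is not stated on this page.

```r
# Hypothetical helper: split document (row) indices into training and test
# sets reproducibly. The 70/30 proportion is an assumption for illustration.
split_train_test <- function(n_docs, SEED, train_frac = 0.7) {
  set.seed(SEED)                                  # same SEED -> same split
  train <- sort(sample(seq_len(n_docs), size = floor(train_frac * n_docs)))
  test <- setdiff(seq_len(n_docs), train)         # the remaining documents
  list(train = train, test = test)
}

s1 <- split_train_test(10, SEED = 123)
s2 <- split_train_test(10, SEED = 123)
identical(s1, s2)   # the same seed reproduces the same split
```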

Details

This function applies the lognet method, the penalized logistic regression classifier from package glmnet, to a labeled document-term matrix. If c_normalize = TRUE (default), the input dtm is first passed to the wTfIdf function for cosine normalization. The number of terms is then reduced by keeping only the columns whose beta coefficients are nonzero in the optimal fit.
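The two steps described in Details, cosine normalization and selection of the terms with nonzero coefficients, can be sketched in base R as follows. The helpers cosine_normalize and select_terms and the coefficient values in beta are hypothetical; cosine_normalize is a simplified stand-in and is not the package's wTfIdf function.

```r
# Cosine (L2) normalization: divide each document (row) by its Euclidean norm.
# Simplified stand-in, NOT the package's wTfIdf function.
cosine_normalize <- function(m) {
  norms <- sqrt(rowSums(m^2))
  norms[norms == 0] <- 1          # leave empty documents as all-zero rows
  m / norms                       # norms recycles down each column, row-wise
}

# Keep only the terms whose coefficients are nonzero in the chosen fit.
# 'beta' is a hypothetical named coefficient vector (intercept excluded).
select_terms <- function(dtm, beta) {
  keep <- names(beta)[beta != 0]
  dtm[, keep, drop = FALSE]
}

dtm <- matrix(c(1, 0, 2,
                0, 3, 4), nrow = 2, byrow = TRUE,
              dimnames = list(c("doc1", "doc2"), c("tax", "law", "court")))
dtm_n <- cosine_normalize(dtm)
rowSums(dtm_n^2)                  # each document now has unit length
beta <- c(tax = 0.8, law = 0, court = -1.2)
select_terms(dtm, beta)           # keeps columns "tax" and "court"
```

The selection is applied to the unnormalized matrix, in line with the Value section, which returns the reduced dtm in term frequency format. With a glmnet fit, coefficients at a given lambda can be extracted with coef(fit, s = lambda); for a multinomial model this gives one sparse matrix per class, and keeping a term when it is nonzero for any class is an assumption about how the package combines them.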

Value

a list containing the reduced dtm (in term frequency format), the training and test misclassification errors err0.train and err0.test, and the confusion matrix.
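The misclassification error is the fraction of documents whose predicted class differs from the true class, and the confusion matrix cross-tabulates true against predicted classes. A minimal sketch with a hypothetical helper misclass_summary and made-up labels; applied to training or test documents it yields a quantity analogous to err0.train or err0.test.

```r
# Hypothetical helper: misclassification error and confusion matrix
# computed from true and predicted class labels.
misclass_summary <- function(actual, predicted) {
  actual <- factor(actual)
  predicted <- factor(predicted, levels = levels(actual))  # align levels
  list(error = mean(predicted != actual),
       confusion = table(actual = actual, predicted = predicted))
}

truth <- factor(c("A", "A", "B", "B", "B"))
pred  <- factor(c("A", "B", "B", "B", "A"))
res <- misclass_summary(truth, pred)
res$error       # 2 of 5 documents misclassified
res$confusion
```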

Note

alpha and lambda are the tuning parameters of the lognet method. alpha is set to 1 (the glmnet default, i.e. the lasso penalty, which sets some coefficients exactly to zero), and the best lambda, which defines the optimal fit, is the one with the minimum training error.
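Choosing lambda by minimum training error amounts to computing the error at each lambda on the path and taking the smallest. A minimal sketch, assuming a character matrix of predicted classes with one column per lambda (the shape that, for instance, predict(fit, newx, type = "class") returns for a glmnet classification fit); best_lambda and the toy data are hypothetical.

```r
# Hypothetical helper: pick the lambda with the smallest training error.
# pred_classes: character matrix, one row per document, one column per lambda.
best_lambda <- function(pred_classes, actual, lambda) {
  # compare each column with the true labels; the vector recycles by column
  errors <- colMeans(pred_classes != as.character(actual))
  i <- which.min(errors)          # first minimum in case of ties
  list(lambda = lambda[i], error = errors[[i]])
}

actual <- c("A", "A", "B", "B")
lambda <- c(0.5, 0.1, 0.01)       # decreasing, as on a glmnet path
pred <- cbind(c("A", "A", "A", "A"),   # error 0.50
              c("A", "A", "B", "A"),   # error 0.25
              c("A", "A", "B", "B"))   # error 0.00
b <- best_lambda(pred, actual, lambda)
b$lambda
```

Because glmnet orders lambda from largest to smallest, which.min returns the largest lambda among tied minima, i.e. the sparsest model with that error; whether reduce_dtm_lognet breaks ties the same way is not documented here.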

Examples

## Not run: 
library(Supreme)
data("dtm")
data("classes")
dtm.lognet <- reduce_dtm_lognet(dtm, classes, SEED = 123)

## End(Not run)

paolofantini/Supreme documentation built on May 24, 2019, 6:14 p.m.