reduce_dtm_lognet_cv reduces the number of terms (columns) of a labeled document-term matrix.
reduce_dtm_lognet_cv is called by the reduce_dtm function.
dtm | a document-term matrix in term frequency format.
classes | factor, the labeling variable.
lambda | a string with the selection rule of the optimal fit.
SEED | integer, the random seed for selecting train and test sets.
c_normalize | logical. If
parallel | logical. If
export | logical. If
This function fits a logistic classification model via penalized maximum likelihood by calling the lognet function from package glmnet.
The regularization path is computed only for the lasso penalty, at a grid of values for the regularization parameter lambda.
If c_normalize = TRUE (default), the dtm is passed to the wTfIdf function for cosine normalization.
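As an illustration of what cosine normalization means, the base R sketch below scales every document (row) of a toy term-frequency matrix to unit Euclidean length. It is not the wTfIdf implementation, whose weighting scheme is not shown on this page; the matrix tf and the helper cosine_normalize are hypothetical names introduced for the example.

```r
# Toy term-frequency matrix: 3 documents (rows) x 4 terms (columns).
# All names here are hypothetical; this is not the wTfIdf function.
tf <- matrix(c(1, 0, 2, 0,
               0, 3, 0, 4,
               2, 2, 1, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("doc", 1:3), paste0("term", 1:4)))

# Cosine (L2) normalization: divide every row by its Euclidean norm.
cosine_normalize <- function(m) {
  norms <- sqrt(rowSums(m^2))
  norms[norms == 0] <- 1  # leave empty documents as they are (no division by zero)
  m / norms               # the vector is recycled down columns, so row i is divided by norms[i]
}

tf_norm <- cosine_normalize(tf)
rowSums(tf_norm^2)        # squared length of each normalized document
```

After this step, documents of different lengths contribute on a comparable scale, which is presumably why normalization is offered before the penalized fit.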
The number of terms is reduced by keeping only the columns that correspond to the nonzero beta coefficients in the optimal fit.
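The selection step can be sketched with the public glmnet interface. This is a minimal illustration, not the body of reduce_dtm_lognet_cv: it assumes a two-class problem, uses simulated counts in place of a real dtm, and calls cv.glmnet rather than the internal lognet routine. The objects x, classes, cv_fit, and x_reduced are hypothetical.

```r
library(glmnet)

set.seed(1)
n <- 100; p <- 20
# Simulated term counts standing in for a document-term matrix (assumption).
x <- matrix(rpois(n * p, lambda = 1), nrow = n,
            dimnames = list(NULL, paste0("term", seq_len(p))))
# Two-class labels driven mainly by the first two terms, plus noise.
classes <- factor(ifelse(x[, 1] - x[, 2] + rnorm(n, sd = 0.5) > 0, "pos", "neg"))

# Lasso-penalized logistic regression (alpha = 1), with lambda chosen by
# cross-validated misclassification error.
cv_fit <- cv.glmnet(x, classes, family = "binomial", alpha = 1,
                    type.measure = "class")

# Coefficients at the chosen lambda; the first row is the intercept.
beta <- as.matrix(coef(cv_fit, s = "lambda.1se"))[-1, 1]

# Keep only the columns whose coefficient is nonzero.
nonzero <- which(beta != 0)
x_reduced <- x[, nonzero, drop = FALSE]
colnames(x_reduced)
```

Replacing s = "lambda.1se" with s = "lambda.min" typically keeps more terms, since the penalty at lambda.min is never larger.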
A list with the reduced dtm (in term frequency format), the IDs of the documents belonging to the training set, the glmnet fit object, the position of the best lambda, the selected terms by class, and the train and test misclassification errors err1.train and err1.test.
The confusion matrix is also returned.
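For readers unfamiliar with these quantities, here is a small base R sketch of how a misclassification error and a confusion matrix are commonly computed from predicted and true labels. The vectors actual and predicted are made-up toy data, and the layout of the confusion matrix returned by this function may differ.

```r
# Toy true and predicted labels (hypothetical data).
actual    <- factor(c("neg", "neg", "pos", "pos", "pos", "neg"))
predicted <- factor(c("neg", "pos", "pos", "pos", "neg", "neg"),
                    levels = levels(actual))

# Confusion matrix: rows are predicted classes, columns are true classes.
confusion <- table(predicted = predicted, actual = actual)

# Misclassification error: share of documents whose predicted label is wrong.
err <- mean(predicted != actual)

confusion
err
```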
In the optimal fit, the tuning parameters alpha and lambda are set, respectively, to 1 (default) and to one of lambda.min or lambda.1se.
The former follows from the "minimum training error rule" and the latter from the more conservative approach of the "one standard error rule".
Full details are given in "The Elements of Statistical Learning" (T. Hastie, R. Tibshirani, J. Friedman), 2nd edition, p. 61.
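The two selection rules can be illustrated in base R on a made-up error curve. The sketch follows the usual glmnet definitions: lambda.min minimizes the mean cross-validated error, and lambda.1se is the largest lambda whose error is within one standard error of that minimum. The vectors lambda, cvm, and cvsd are invented numbers, not output of this function.

```r
# Hypothetical cross-validation summary along a regularization path.
lambda <- c(0.50, 0.20, 0.10, 0.05, 0.01)  # penalty values, decreasing
cvm    <- c(0.40, 0.25, 0.21, 0.20, 0.24)  # mean error at each lambda
cvsd   <- c(0.03, 0.03, 0.02, 0.02, 0.03)  # standard error of that mean

# Minimum error rule: the lambda with the smallest mean error.
i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]

# One standard error rule: the largest (most regularizing) lambda whose
# mean error does not exceed the minimum plus one standard error.
threshold  <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])

c(lambda.min = lambda_min, lambda.1se = lambda_1se)
```

Because a larger lambda means a stronger lasso penalty, the one standard error rule generally yields a sparser model, that is, fewer retained terms.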