compClass: Internal Supreme function

Description Usage Arguments Details Value Note Examples

Description

compClass fits a logistic classification model (from packages glmnet and caret) to the posterior topic compositions of each document, as estimated by Latent Dirichlet Allocation. The classes variable is used as the classification target. compClass is called by the mcLDA function.

Usage

compClass(predictors, classes, inTraining, train.glmnet = FALSE,
  cv.parallel = FALSE, train.parallel = FALSE)

Arguments

predictors

the matrix of predictors, i.e., the posterior topic compositions of each document.

classes

factor, the classification variable.

inTraining

the numeric ids of documents belonging to the training set.

train.glmnet

logical. If TRUE, the train.glmnet function from package caret is run. Default is FALSE.

cv.parallel

logical. If TRUE, parallel computation is used in Method1 with the maximum number of available cores. Default is FALSE.

train.parallel

logical. If TRUE, parallel computation is used in Method2 with the maximum number of available cores. Default is FALSE.

Details

This function recognizes the compositional nature of the predictors and applies the principle of working on coordinates when dealing with compositional data. Isometric log-ratio transformed versions of the predictors (obtained via the ilr function from package compositions) are provided as input to the classification model.
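The ilr step described above can be sketched as follows. This is a minimal illustration, not the package's internal code; it assumes package compositions is installed and that topic.posteriors is a documents-by-topics matrix of posterior compositions (rows strictly positive and summing to 1), as produced in the Examples section.

```r
library(compositions)

# Treat each row as a composition on the simplex; with k topics,
# ilr() maps each row to k - 1 unconstrained real coordinates.
X.ilr <- ilr(acomp(topic.posteriors))

# The ilr coordinates can then be used as an ordinary numeric
# predictor matrix by the classification model.
X <- as.matrix(X.ilr)
```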

We considered three different methods.

Method0 and Method1 are built on the functions glmnet and cv.glmnet, respectively, from package glmnet. Method2 refers to the function train.glmnet from package caret. Method0 tends to overfit the training set; Method1 and Method2 mitigate overfitting through cross-validation. Method2 uses repeated cross-validation and is more stable than Method1, but much more time-consuming (parallel computation is allowed).
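The three methods correspond, roughly, to the following calls. This is a hedged sketch of the underlying package functions, not compClass's exact internals; it assumes X is the ilr-transformed predictor matrix, classes is a two-level factor, and packages glmnet and caret are installed.

```r
library(glmnet)
library(caret)

# Method0: a plain glmnet fit (lasso path, alpha = 1 by default);
# prone to overfitting the training set.
fit0 <- glmnet(X, classes, family = "binomial")

# Method1: cross-validated glmnet; lambda is chosen by CV.
fit1 <- cv.glmnet(X, classes, family = "binomial")

# Method2: caret's train() with method = "glmnet" and repeated CV;
# tunes both alpha and lambda, slower but more stable.
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
fit2 <- train(x = X, y = classes, method = "glmnet", trControl = ctrl)
```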

Value

err: a list of misclassification errors (error = 1 - Accuracy) and confusion matrices (from package caret):

e0.train

train error from method "glmnet"

e1.train

train error from method "cv.glmnet"

e2.train

train error from method "train.glmnet"

e0.test

test error from method "predict.glmnet"

e1.test

test error from method "predict.cv.glmnet"

e2.test

test error from method "predict.train.glmnet"

cm0

confusion matrix for method "glmnet"

cm1

confusion matrix for method "cv.glmnet"

cm2

confusion matrix for method "train.glmnet"

Note

Tuning parameters are alpha and lambda. Method0 and Method1 do not tune alpha, which stays at its default value alpha = 1. Method2 selects values for both alpha and lambda over the tuning parameter grid defined by expand.grid(alpha = seq(0.1, 1, 0.1), lambda = glmnetFit0$lambda).

In Method1 the best model is selected using the "one standard error rule": the default value of the penalty parameter lambda is s = "lambda.1se", stored in the cv.glmnet object. This rule takes a conservative approach; alternatively, s = "lambda.min" can be used. Full details are given in "The Elements of Statistical Learning" (T. Hastie, R. Tibshirani, J. Friedman), 2nd edition, p. 61. Insights on compositions and their use in R can be found in "Analyzing Compositional Data with R" (K. Gerald van den Boogaart, Raimon Tolosana-Delgado), Springer-Verlag, 2013.
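The choice between the two lambda values can be illustrated as follows. This is a sketch, assuming fit1 is a fitted cv.glmnet object and X.test is a matrix of ilr-transformed test predictors (both hypothetical names).

```r
library(glmnet)

# "One standard error rule" (the default used here): the largest lambda
# whose CV error is within one SE of the minimum -- more regularized,
# more conservative.
pred.1se <- predict(fit1, newx = X.test, s = "lambda.1se", type = "class")

# Alternative: the lambda that minimizes the cross-validated error.
pred.min <- predict(fit1, newx = X.test, s = "lambda.min", type = "class")
```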

Examples

## Not run: 
library(Supreme)
library(topicmodels)

# Input data.
data("dtm")
data("classes")

# Reduced dtm.lognet
dtm.lognet <- reduce_dtm(dtm, method = "lognet", classes = classes, export = TRUE)

# Run a 35-topic model over the reduced dtm.lognet and compute the topic posteriors.
ldaVEM.mod <- LDA(dtm.lognet$reduced, k = 35, method = "VEM", control = list(seed = 2014))
topic.posteriors <- posterior(ldaVEM.mod)$topics

# Misclassification errors.
set.seed(2010)  # for inTraining reproducibility
inTraining <- caret::createDataPartition(as.factor(classes), p = 0.75, list = FALSE)  # for balancing the size of target classes in training set
mis.error <- compClass(topic.posteriors, classes, inTraining)

## End(Not run)

paolofantini/Supreme documentation built on May 24, 2019, 6:14 p.m.