Tools for Statistical Content Analysis created at TU Dortmund University.
tosca is a framework for statistical methods in content analysis. It offers a pipeline for preprocessing and modeling text corpora, with a link to the implementation of Latent Dirichlet Allocation from the lda package. Plot routines are provided for the descriptive analysis of both text corpora (before modeling) and topic models (after modeling). The package also implements Chang's intruder words and intruder topics, as well as reasoned sampling of text ids to obtain effective sets of texts for human labeling/coding, with regard to the accuracy of estimating precision and recall.
See the vignette for examples of how to use tosca.
For a BibTeX entry, please use citation(package = "tosca").
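As a minimal sketch of getting started in an R session, the following installs and loads the package, opens its vignette, and prints the BibTeX entry. It assumes tosca is available from CRAN and uses only base R and utils functions; none of tosca's own functions are called here.

```r
# Assumption: tosca is available from CRAN; skip this line if it is already installed.
install.packages("tosca")

# Attach the package to the current session
library(tosca)

# Open the package's vignette(s) in a browser
browseVignettes(package = "tosca")

# Retrieve the citation and convert it to a BibTeX entry
toBibtex(citation(package = "tosca"))
```
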
This R package is licensed under the GPLv3. For wishes, issues, and bugs please use the issue tracker.