
tosca

Tools for Statistical Content Analysis created at TU Dortmund University.

tosca is a framework for statistical methods in content analysis. We offer a pipeline for preprocessing and modeling text corpora, using a link to the implementation of Latent Dirichlet Allocation from the lda package. Plot routines are provided for the descriptive analysis of both text corpora (before modeling) and topic models (after modeling). In addition, the package implements Chang's intruder words and intruder topics, as well as reasoned sampling of text ids to obtain effective sets of texts for human labeling/coding, with regard to the accuracy of estimating precision and recall.

See the vignette for examples of how to use tosca.

For a BibTeX entry please use citation(package = "tosca").
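
As a minimal sketch of the citation call above, assuming tosca is already installed (for example via `install.packages("tosca")`), the entry can be printed and converted to BibTeX with base R's `citation()` and `toBibtex()`. The variable name `cit` is only illustrative.

```r
# Assumption: the tosca package is installed in the current R library.
# citation() and toBibtex() both come from the utils package shipped with R.
cit <- citation(package = "tosca")

# Print the human-readable citation.
print(cit)

# Print the same citation as a BibTeX entry.
toBibtex(cit)
```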

This R package is licensed under the GPLv3. For feature requests, issues, and bug reports, please use the issue tracker.