ldaPrototype-package    R Documentation

ldaPrototype: Prototype of Multiple Latent Dirichlet Allocation Runs

Description

Determine a prototype from a number of runs of Latent Dirichlet Allocation (LDA) by measuring their similarities with S-CLOP (Similarity of multiple sets by Clustering with Local Pruning): the procedure selects the LDA run with the highest mean pairwise similarity to all other runs. Each LDA run is specified by its assignments, which lead to estimators for the distribution parameters. Repeated runs lead to different results, which we counter by choosing the most representative LDA run as the prototype.
For bug reports and feature requests please use the issue tracker: https://github.com/JonasRieger/ldaPrototype/issues. Also have a look at the (detailed) example at https://github.com/JonasRieger/ldaPrototype.
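
The selection rule described above can be illustrated with a minimal base R sketch. The similarity values are made up for illustration, and the names `sclop`, `mean_sim` and `prototype` are hypothetical. The package computes the pairwise S-CLOP values itself, so this only shows the final "highest mean pairwise similarity" step.

```r
# Hypothetical pairwise similarity values between four LDA runs (made-up numbers).
sclop <- matrix(c(
  NA,   0.80, 0.75, 0.70,
  0.80, NA,   0.85, 0.78,
  0.75, 0.85, NA,   0.72,
  0.70, 0.78, 0.72, NA
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("run", 1:4), paste0("run", 1:4)))

# Mean similarity of each run to all other runs (the diagonal is ignored).
mean_sim <- rowMeans(sclop, na.rm = TRUE)

# The prototype is the run with the highest mean pairwise similarity.
prototype <- names(which.max(mean_sim))
prototype  # "run2"
```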

Data

reuters Example Dataset (91 articles from Reuters) for testing.
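
A short sketch of loading the example data, assuming (as in the package's README) that the dataset is provided as the two objects `reuters_docs` and `reuters_vocab`:

```r
library(ldaPrototype)

# Load the example documents and the corresponding vocabulary.
data(reuters_docs)
data(reuters_vocab)

# Number of articles in the example corpus.
length(reuters_docs)

# First few vocabulary entries.
head(reuters_vocab)
```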

Constructor

LDA            LDA objects used in this package.
as.LDARep      LDARep objects.
as.LDABatch    LDABatch objects.

Getter

getTopics        Getter for LDA objects.
getJob           Getter for LDARep and LDABatch objects.
getSimilarity    Getter for TopicSimilarity objects.
getSCLOP         Getter for PrototypeLDA objects.
getPrototype     Determine the Prototype LDA.

Performing multiple LDAs

LDARep      Performing multiple LDAs locally (using parallelization).
LDABatch    Performing multiple LDAs on Batch Systems.

Calculation Steps (Workflow) to determine the Prototype LDA

mergeTopics      Merge topic matrices from multiple LDAs.
jaccardTopics    Calculate topic similarities using the Jaccard coefficient (see Similarity Measures for other possible measures).
dendTopics       Create a dendrogram from topic similarities.
SCLOP            Determine various S-CLOP values.
pruneSCLOP       Prune TopicDendrogram objects.
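
The calculation steps above can be chained as in the following sketch, which follows the step-by-step example in the package's README. The settings (`n = 4`, `K = 10`, `seeds = 1:4`) are small illustrative values rather than recommendations. `SCLOP.pairwise` is assumed here to be the variant documented under `SCLOP` that returns the pairwise S-CLOP values between runs.

```r
library(ldaPrototype)
data(reuters_docs)
data(reuters_vocab)

# Small illustrative settings: 4 runs with 10 topics each.
reps <- LDARep(docs = reuters_docs, vocab = reuters_vocab,
               n = 4, K = 10, seeds = 1:4)

# Merge the topic matrices of all runs into one matrix.
topics <- mergeTopics(reps, vocab = reuters_vocab)

# Pairwise topic similarities based on the Jaccard coefficient.
jacc <- jaccardTopics(topics)

# Dendrogram of all topics, followed by local pruning.
dend <- dendTopics(jacc)
pruned <- pruneSCLOP(dend)

# Pairwise S-CLOP values between the runs.
sclop <- SCLOP.pairwise(jacc)

# Select the run with the highest mean pairwise S-CLOP as the prototype.
proto <- getPrototype(reps, sclop = sclop)
```

The dendrogram and pruning steps are shown for inspection; in this sketch the prototype itself is determined from the pairwise S-CLOP matrix.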

Similarity Measures

cosineTopics     Cosine Similarity.
jaccardTopics    Jaccard Coefficient.
jsTopics         Jensen-Shannon Divergence.
rboTopics        Rank-Biased Overlap.
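
To illustrate the idea behind the Jaccard coefficient, here is a minimal base R sketch on two made-up sets of topic words. The helper `jaccard` and the word sets are hypothetical; the sketch does not reproduce how `jaccardTopics` decides which words of a topic are considered relevant.

```r
# Jaccard coefficient of two sets: size of the intersection over size of the union.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Two hypothetical topics, each represented by its most relevant words.
topic_a <- c("oil", "price", "barrel", "opec")
topic_b <- c("oil", "price", "market", "trade")

# Two shared words out of six distinct words.
jaccard(topic_a, topic_b)  # 2 / 6
```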

Shortcuts

getPrototype    Shortcut which includes all calculation steps.
LDAPrototype    Shortcut which performs multiple LDAs and determines their Prototype.
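
A hedged sketch of the one-call shortcut, based on the example in the package's README. The argument values are small illustrative settings, and the example data objects `reuters_docs` and `reuters_vocab` are assumed to be available as in that README.

```r
library(ldaPrototype)
data(reuters_docs)
data(reuters_vocab)

# Perform 4 LDA runs with 10 topics each and determine their prototype in one call.
res <- LDAPrototype(docs = reuters_docs, vocabLDA = reuters_vocab,
                    n = 4, K = 10, seeds = 1:4)

# Inspect the S-CLOP values stored in the resulting PrototypeLDA object.
getSCLOP(res)
```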

Author(s)

Maintainer: Jonas Rieger <jonas.rieger@tu-dortmund.de> (ORCID)

References

Rieger, Jonas (2020). "ldaPrototype: A method in R to get a Prototype of multiple Latent Dirichlet Allocations". Journal of Open Source Software, 5(51), 2181, doi: 10.21105/joss.02181.

Rieger, Jonas, Jörg Rahnenführer and Carsten Jentsch (2020). "Improving Latent Dirichlet Allocation: On Reliability of the Novel Method LDAPrototype". In: Natural Language Processing and Information Systems, NLDB 2020. LNCS 12089, pp. 118–125, doi: 10.1007/978-3-030-51310-8_11.

Rieger, Jonas, Carsten Jentsch and Jörg Rahnenführer (2022). "LDAPrototype: A Model Selection Algorithm to Improve Reliability of Latent Dirichlet Allocation". Preprint on Research Square, doi: 10.21203/rs.3.rs-1486359/v1.

See Also

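Useful links:

https://github.com/JonasRieger/ldaPrototype
Report bugs at https://github.com/JonasRieger/ldaPrototype/issues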
JonasRieger/ldaPrototype documentation built on Feb. 5, 2023, 6:45 p.m.