mallet-package: An R Wrapper for the Java Mallet Topic Modeling Toolkit

Description

An R interface for the Java Machine Learning for Language Toolkit (mallet) <http://mallet.cs.umass.edu/> for estimating probabilistic topic models such as Latent Dirichlet Allocation. With this package you can read textual data into mallet from R objects, run the Java implementation of mallet directly from R, and extract the results as R objects. The Mallet toolkit provides many functions; this wrapper focuses on the topic modeling sub-package written by David Mimno. The package uses the rJava package to connect to a Java Virtual Machine (JVM).
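The read, train, extract workflow described above can be sketched as follows. This is a minimal illustration, not an official package example. It assumes mallet version 1.0 or later (where mallet.import() accepts text.array without a stoplist argument), a working Java installation for rJava, and a made-up four-document corpus. The object names (docs, instances, topic.model, doc.topics, topic.words, vocabulary) are hypothetical and chosen only for this sketch.

```r
library(mallet)

# Hypothetical toy corpus; in practice text.array holds your real documents.
docs <- data.frame(
  id = c("d1", "d2", "d3", "d4"),
  text = c(
    "cats and dogs are popular pets that need food and care",
    "dogs chase cats while pets sleep after food",
    "stocks and bonds moved as markets reacted to interest rates",
    "investors watched markets stocks bonds and interest rates"
  ),
  stringsAsFactors = FALSE
)

# Step 1: read the R character vectors into a Mallet instance list.
# Assumes mallet >= 1.0, where no stoplist is required.
instances <- mallet.import(id.array = docs$id, text.array = docs$text)

# Step 2: create a 2-topic LDA model (alpha.sum and beta are the Dirichlet
# priors), load the documents, and run the Java Gibbs sampler.
topic.model <- MalletLDA(num.topics = 2, alpha.sum = 1, beta = 0.1)
topic.model$loadDocuments(instances)
# Optimize hyperparameters every 20 iterations after 50 burn-in iterations.
topic.model$setAlphaOptimization(20, 50)
topic.model$train(200)
# A few final iterations that pick the single best topic for each token.
topic.model$maximize(10)

# Step 3: pull the results back into R as ordinary matrices.
doc.topics <- mallet.doc.topics(topic.model, smoothed = TRUE, normalized = TRUE)
topic.words <- mallet.topic.words(topic.model, smoothed = TRUE, normalized = TRUE)
vocabulary <- topic.model$getVocabulary()

# Highest-weighted words for the first topic.
print(mallet.top.words(topic.model, word.weights = topic.words[1, ], num.top.words = 5))
```

In this sketch doc.topics has one row per document and one column per topic, and topic.words has one row per topic and one column per vocabulary entry, in the order returned by getVocabulary(). With a corpus this small the topics are not meaningful; the example only shows how data moves between R and the JVM.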

References

The model, Latent Dirichlet allocation (LDA): David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003.

The Java toolkit: Andrew Kachites McCallum. MALLET: A Machine Learning for Language Toolkit. 2002.

Details of the fast sparse Gibbs sampling algorithm: Limin Yao, David Mimno, Andrew McCallum. Efficient Methods for Topic Model Inference on Streaming Document Collections. KDD, 2009.

Hyperparameter optimization: Hanna Wallach, David Mimno, Andrew McCallum. Rethinking LDA: Why Priors Matter. NIPS, 2009.


mallet documentation built on July 20, 2022, 5:08 p.m.