An R wrapper for the Mallet topic modeling package


This package provides an interface to the Java implementation of latent Dirichlet allocation in the Mallet machine learning package. Mallet has many functions; this wrapper focuses on the topic modeling sub-package written by David Mimno. The package uses the rJava package to connect to a JVM.
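Because the model runs inside a JVM started by rJava, the Java heap size is fixed when that JVM starts. The following is a minimal sketch of raising it, assuming the standard rJava java.parameters option; the 2 GB value is an arbitrary example, not a recommendation from this package.

```r
# Must run before rJava (and therefore mallet) is loaded in this R session;
# the JVM reads this option only once, at startup.
options(java.parameters = "-Xmx2g")  # example heap size, adjust to your corpus

library(mallet)
```

If the JVM has already been started in the session, changing the option has no effect until R is restarted.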


Package: mallet
Type: Package
Version: 1.0
Date: 2013-08-08
License: MIT

Create a topic model trainer: MalletLDA

Load documents from disk and import them: mallet.import

Get info about word frequencies: mallet.word.freqs

Get trained model parameters: mallet.doc.topics, mallet.topic.words, mallet.subset.topic.words

Reports on topic words: mallet.topic.labels

Clustering of topics: mallet.topic.hclust
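The functions listed above are typically used in sequence. The sketch below shows one possible workflow on a toy corpus; the documents, the stoplist contents, the number of topics, and the iteration count are all made up for illustration, and argument names may differ between package versions.

```r
library(mallet)

# Toy corpus (hypothetical data, for illustration only)
docs <- data.frame(
  id = c("d1", "d2", "d3", "d4"),
  text = c("the cat sat on the mat with another cat",
           "dogs and cats are common household pets",
           "the stock market fell as investors sold shares",
           "investors bought shares when the market rose"),
  stringsAsFactors = FALSE
)

# Write a small stoplist file, one stopword per line
stoplist.file <- tempfile(fileext = ".txt")
writeLines(c("the", "on", "with", "and", "are", "as", "when"), stoplist.file)

# Import the documents into Mallet's instance format
instances <- mallet.import(docs$id, docs$text, stoplist.file)

# Create a trainer with 3 topics and attach the documents
topic.model <- MalletLDA(num.topics = 3)
topic.model$loadDocuments(instances)

# Vocabulary-level word frequencies
word.freqs <- mallet.word.freqs(topic.model)

# Run 100 Gibbs sampling iterations (far too few for a real corpus)
topic.model$train(100)

# Trained parameters: documents x topics and topics x words
doc.topics <- mallet.doc.topics(topic.model, smoothed = TRUE, normalized = TRUE)
topic.words <- mallet.topic.words(topic.model, smoothed = TRUE, normalized = TRUE)

# Short labels built from the top 3 words of each topic
topic.labels <- mallet.topic.labels(topic.model, topic.words, num.top.words = 3)

# Hierarchical clustering of topics
topic.clusters <- mallet.topic.hclust(doc.topics, topic.words, balance = 0.3)
```

Sampling is randomized, so the topics and labels will vary from run to run. Passing labels = topic.labels to plot(topic.clusters) would draw the dendrogram with readable topic names.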


Maintainer: David Mimno


The model, latent Dirichlet allocation (LDA): David M. Blei, Andrew Ng, Michael Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003.

The Java toolkit: Andrew Kachites McCallum. The Mallet Toolkit. 2002.

Details of the fast sparse Gibbs sampling algorithm: Limin Yao, David Mimno, Andrew McCallum. Efficient Methods for Topic Model Inference on Streaming Document Collections. KDD, 2009.

Hyperparameter optimization: Hanna Wallach, David Mimno, Andrew McCallum. Rethinking LDA: Why Priors Matter. NIPS, 2010.
