An R wrapper for the Mallet topic modeling package

Description

This package provides an interface to the Java implementation of latent Dirichlet allocation in the Mallet machine learning package. Mallet has many functions; this wrapper focuses on the topic modeling sub-package written by David Mimno. The package uses the rJava package to connect to a JVM.
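Because the model runs inside a JVM started through rJava, JVM options such as the maximum heap size generally need to be set before the package is loaded. A minimal sketch, assuming the default heap is too small for your corpus (the 2 GB figure is purely illustrative):

```r
# Assumption: "-Xmx2g" is an illustrative heap size, not a recommendation
# from the package. rJava reads this option when the JVM is initialized,
# so it should be set before library(mallet) is called.
options(java.parameters = "-Xmx2g")
library(mallet)
```

Changing the option after the JVM has started usually has no effect in the same R session.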

Details

Package: mallet
Type: Package
Version: 1.0
Date: 2013-08-08
License: MIT

Create a topic model trainer: MalletLDA

Load documents from disk and import them: mallet.read.dir, mallet.import

Get info about word frequencies: mallet.word.freqs

Get trained model parameters: mallet.doc.topics, mallet.topic.words, mallet.subset.topic.words

Reports on topic words: mallet.top.words, mallet.topic.labels

Clustering of topics: mallet.topic.hclust
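The functions listed above are typically used in sequence. The following is a minimal sketch of that workflow, not a recommended configuration: the toy documents, the stop word list, the number of topics, and the iteration counts are all made up for illustration, and a real corpus would normally be loaded with mallet.read.dir.

```r
library(mallet)

# Toy corpus (illustrative only; not part of the package).
docs <- data.frame(
  id = c("d1", "d2", "d3", "d4"),
  text = c(
    "the cat chased the mouse around the barn and the garden",
    "a dog and a cat slept in the garden near the barn",
    "the senate passed the budget bill after a long debate",
    "voters followed the debate on the budget and the election"
  ),
  stringsAsFactors = FALSE
)

# Write a small stop word file, one word per line, and pass its path
# to mallet.import().
stoplist.file <- tempfile(fileext = ".txt")
writeLines(c("the", "a", "and", "in", "on", "after", "near", "around"),
           stoplist.file)

# Import the documents into Mallet's instance format.
instances <- mallet.import(docs$id, docs$text, stoplist.file)

# Create a trainer and load the documents.
topic.model <- MalletLDA(num.topics = 3)
topic.model$loadDocuments(instances)

# Inspect the vocabulary and word frequencies before training.
vocabulary <- topic.model$getVocabulary()
word.freqs <- mallet.word.freqs(topic.model)

# Optimize hyperparameters every 20 iterations after 50 burn-in
# iterations, then run 200 Gibbs sampling iterations.
topic.model$setAlphaOptimization(20, 50)
topic.model$train(200)

# Extract document-topic and topic-word matrices as probabilities.
doc.topics <- mallet.doc.topics(topic.model, smoothed = TRUE, normalized = TRUE)
topic.words <- mallet.topic.words(topic.model, smoothed = TRUE, normalized = TRUE)

# Report the top words of the first topic and short labels for all topics.
top.words <- mallet.top.words(topic.model, topic.words[1, ], num.top.words = 5)
topic.labels <- mallet.topic.labels(topic.model, topic.words, num.top.words = 3)

# Cluster topics using both word and document similarity.
topic.tree <- mallet.topic.hclust(doc.topics, topic.words, balance = 0.3)
```

Results on a corpus this small are not meaningful; the point is only the order of the calls and the shapes of the returned objects (one row per document in doc.topics, one row per topic in topic.words).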

Author(s)

Maintainer: David Mimno

References

The model, latent Dirichlet allocation (LDA): David M. Blei, Andrew Ng, Michael Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003.

The Java toolkit: Andrew Kachites McCallum. The Mallet Toolkit. 2002.

Details of the fast sparse Gibbs sampling algorithm: Limin Yao, David Mimno, Andrew McCallum. Efficient Methods for Topic Model Inference on Streaming Document Collections. KDD, 2009.

Hyperparameter optimization: Hanna Wallach, David Mimno, Andrew McCallum. Rethinking LDA: Why Priors Matter. NIPS, 2009.