FitLdaModel: Fit a topic model using Latent Dirichlet Allocation

Description

A wrapper for two implementations of Latent Dirichlet Allocation (LDA) that returns a consistently formatted topic model. See Details, below.

Usage

FitLdaModel(dtm, k, iterations = NULL, alpha = 0.1, beta = 0.05,
  smooth = TRUE, method = "gibbs", return_all = FALSE, ...)

Arguments

dtm

A document term matrix of class dgCMatrix

k

Number of topics

iterations

The number of Gibbs iterations if method = 'gibbs'

alpha

Dirichlet parameter for the distribution of topics over documents. Defaults to 0.1

beta

Dirichlet parameter for the distribution of words over topics. Defaults to 0.05

smooth

Logical indicating whether to smooth the probabilities in the rows of phi and theta. Defaults to TRUE.

method

Either 'gibbs' for Gibbs sampling or 'vem' for variational expectation maximization. Defaults to 'gibbs'. See Details, below.

return_all

Logical. Should the raw results of the underlying function be returned along with the formatted results? Defaults to FALSE.

...

Other arguments to pass to underlying functions. See details, below.
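
The dtm argument must be a sparse matrix of class dgCMatrix. As a minimal sketch, assuming the Matrix package is installed, a small dense count matrix can be coerced to that class as follows. The documents, vocabulary, and counts are made up for illustration.

```r
library(Matrix)

# Toy corpus: three documents over a four-word vocabulary (made-up counts)
counts <- matrix(c(2, 0, 1, 0,
                   0, 3, 0, 1,
                   1, 1, 0, 2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2", "doc3"),
                                 c("gene", "cell", "model", "data")))

# Coerce the dense matrix to a sparse, column-compressed numeric matrix
toy_dtm <- Matrix(counts, sparse = TRUE)

class(toy_dtm)  # expected to report "dgCMatrix"
dim(toy_dtm)    # documents in rows, terms in columns
```

A matrix this small is only useful for checking the class and orientation of the input; a real model needs far more documents and terms.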

Details

For method = 'gibbs', this function is a wrapper for lda.collapsed.gibbs.sampler from the lda package. Additional arguments can be passed to lda.collapsed.gibbs.sampler through .... However, some arguments cause conflicts if passed through .... The arguments K, alpha, and eta of lda.collapsed.gibbs.sampler are set by k, alpha, and beta, respectively. The arguments documents and vocab of lda.collapsed.gibbs.sampler are derived from dtm and should not be supplied.

For method = 'vem', this function is a wrapper for LDA from the topicmodels package. Arguments to LDA's control argument are passed through .... Note that, by default, LDA estimates alpha and beta as part of the expectation maximization. The values of alpha and beta passed to LDA will therefore change during fitting unless estimate.alpha and estimate.beta are passed through ... and set to FALSE.
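
As a rough sketch of holding alpha and beta fixed under method = 'vem', the call below passes estimate.alpha and estimate.beta through .... It assumes the package providing FitLdaModel and nih_sample_dtm is already loaded, so the call is guarded and skipped otherwise. The name vem_args is a hypothetical helper used only for illustration.

```r
# Arguments for a VEM fit with the Dirichlet parameters held fixed
vem_args <- list(k = 5,
                 method = "vem",
                 alpha = 0.1,
                 beta = 0.05,
                 estimate.alpha = FALSE,  # passed through ... to LDA's control
                 estimate.beta = FALSE)   # passed through ... to LDA's control

# Assumption: FitLdaModel and nih_sample_dtm come from an already-loaded package
if (exists("FitLdaModel")) {
  data(nih_sample_dtm)
  model <- do.call(FitLdaModel,
                   c(list(dtm = nih_sample_dtm[1:20, ]), vem_args))
  str(model)
}
```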

The ... argument can also be used to control the underlying behavior of TmParallelApply, such as the number of cpus.

Value

Returns a list with a minimum of two objects, phi and theta. The rows of phi index topics and the columns index tokens. The rows of theta index documents and the columns index topics.
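
To make the orientation of phi and theta concrete, here is a small hand-built illustration in base R. The values, the topic labels t_1 and t_2, and the vocabulary are hypothetical and are not output from FitLdaModel. The sketch assumes each row is a probability distribution, as the description of smooth suggests.

```r
vocab <- c("gene", "cell", "model", "data")
docs  <- c("doc1", "doc2", "doc3")
topic_names <- c("t_1", "t_2")  # hypothetical labels

# phi: topics in rows, tokens in columns
phi <- matrix(c(0.4, 0.4, 0.1, 0.1,
                0.1, 0.1, 0.4, 0.4),
              nrow = 2, byrow = TRUE,
              dimnames = list(topic_names, vocab))

# theta: documents in rows, topics in columns
theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8,
                  0.5, 0.5),
                nrow = 3, byrow = TRUE,
                dimnames = list(docs, topic_names))

rowSums(phi)    # each topic's distribution over tokens sums to 1
rowSums(theta)  # each document's distribution over topics sums to 1

# Implied distribution of tokens within each document (documents x tokens)
doc_word <- theta %*% phi
dim(doc_word)
```

Because the columns of theta line up with the rows of phi, the product theta %*% phi gives one token distribution per document.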

Examples

# Load a pre-formatted dtm 
data(nih_sample_dtm) 

# Fit an LDA model on a sample of documents
model <- FitLdaModel(dtm = nih_sample_dtm[ sample(1:nrow(nih_sample_dtm), 20), ], 
                     k = 5, iterations = 200)

str(model)

# Fit a model, include likelihoods passed to lda::lda.collapsed.gibbs.sampler
model <- FitLdaModel(dtm = nih_sample_dtm[ sample(1:nrow(nih_sample_dtm), 20), ], 
                     k = 5, iterations = 200, compute.log.likelihood = TRUE)

str(model)

ChengMengli/topic documentation built on May 31, 2019, 8:44 p.m.