A wrapper for two implementations of Latent Dirichlet Allocation that returns a nicely-formatted topic model. See details, below.
dtm | A document term matrix of class
k | Number of topics
iterations | The number of Gibbs iterations if
alpha | Dirichlet parameter for the distribution of topics over documents. Defaults to 0.1
beta | Dirichlet parameter for the distribution of words over topics. Defaults to 0.05
smooth | Logical indicating whether or not you want to smooth the probabilities in the rows of
method | One of either 'gibbs' or 'vem' for either Gibbs sampling or variational expectation maximization. Defaults to 'gibbs'. See details, below.
return_all | Logical. Do you want the raw results of the underlying function returned along with the formatted results? Defaults to
... | Other arguments to pass to underlying functions. See details, below.
For method = 'gibbs' this is a wrapper for lda.collapsed.gibbs.sampler
from the lda package. Additional arguments can be passed to
lda.collapsed.gibbs.sampler through .... However, there are some
arguments that, if passed through ..., can cause conflicts. The
arguments K, alpha, and eta for
lda.collapsed.gibbs.sampler are set with the arguments k,
alpha, and beta, respectively. The arguments documents
and vocab for lda.collapsed.gibbs.sampler are set by dtm
and aren't required.
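To make the relationship between dtm and the documents and vocab arguments concrete, here is a minimal base-R sketch of one way a document term matrix could be turned into the input format that lda.collapsed.gibbs.sampler expects (a list of two-row integer matrices holding 0-indexed term ids and counts). The helper dtm_to_documents and the toy matrix are hypothetical and exist only for illustration; this is not necessarily how the wrapper performs the conversion internally.

```r
# Toy document term matrix: 3 documents (rows) by 4 terms (columns).
# Assumption: a dense base-R matrix stands in for the real dtm.
dtm <- matrix(c(2, 0, 1, 0,
                0, 3, 0, 1,
                1, 1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("doc1", "doc2", "doc3"),
                              c("gene", "cell", "trial", "dose")))

# The vocabulary is simply the column names of the dtm.
vocab <- colnames(dtm)

# dtm_to_documents() is a hypothetical helper, not part of textmineR or lda.
# Each document becomes a 2-row integer matrix:
#   row 1: 0-indexed positions of the terms in vocab
#   row 2: how many times each of those terms occurs in the document
dtm_to_documents <- function(dtm) {
  lapply(seq_len(nrow(dtm)), function(i) {
    present <- which(dtm[i, ] > 0)
    rbind(as.integer(present - 1L),
          as.integer(dtm[i, present]))
  })
}

documents <- dtm_to_documents(dtm)

# doc1 contains "gene" (id 0) twice and "trial" (id 2) once.
documents[[1]]
```

With inputs in this shape, the wrapper's k, alpha, and beta correspond to the sampler's K, alpha, and eta, which is why those three should not also be supplied through ....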
For method = 'vem', this function is a wrapper for LDA from the
topicmodels package. Arguments to LDA's control
argument are passed through .... Note that, by default, LDA
estimates alpha and beta as part of the expectation maximization.
The values of alpha and beta passed to LDA will therefore change
unless estimate.alpha and estimate.beta are passed through ... and
set to FALSE.
The ... argument can also be used to control the underlying behavior of
TmParallelApply, such as the number of cpus.
Returns a list with a minimum of two objects, phi and
theta. The rows of phi index topics and the columns index tokens.
The rows of theta index documents and the columns index topics.
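As a rough illustration of how the two returned matrices are oriented, the following base-R sketch builds small stand-ins for phi and theta and reads off the top tokens per topic and the dominant topic per document. The toy values, the topic labels t_1 and t_2, and the token names are made up for this example, and it assumes each row is a probability distribution that sums to 1.

```r
# Toy phi: 2 topics (rows) by 4 tokens (columns).
# Assumption: each row is a distribution over tokens.
phi <- matrix(c(0.50, 0.30, 0.15, 0.05,
                0.05, 0.10, 0.25, 0.60),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("t_1", "t_2"),
                              c("gene", "cell", "trial", "dose")))

# Toy theta: 3 documents (rows) by 2 topics (columns).
# Assumption: each row is a distribution over topics.
theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8,
                  0.4, 0.6),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("doc1", "doc2", "doc3"),
                                c("t_1", "t_2")))

# Top two tokens for each topic: sort each row of phi in decreasing order.
# The result has one column per topic.
top_terms <- apply(phi, 1, function(p) names(sort(p, decreasing = TRUE))[1:2])

# Most prevalent topic for each document: the column of theta with the
# largest value in that document's row.
main_topic <- colnames(theta)[apply(theta, 1, which.max)]

top_terms
main_topic
```

The same indexing applies to the phi and theta elements of a fitted model, with topics in the rows of phi and documents in the rows of theta.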
# Load a pre-formatted dtm
data(nih_sample_dtm)
# Fit an LDA model on a sample of documents
model <- FitLdaModel(dtm = nih_sample_dtm[ sample(1:nrow(nih_sample_dtm), 20), ],
                     k = 5, iterations = 200)
str(model)
# Fit a model, include likelihoods passed to lda::lda.collapsed.gibbs.sampler
model <- FitLdaModel(dtm = nih_sample_dtm[ sample(1:nrow(nih_sample_dtm), 20), ],
                     k = 5, iterations = 200, compute.log.likelihood = TRUE)
str(model)