LDA: Create a Latent Dirichlet Allocation model

Source: R/models.R



Description

This function initializes a Latent Dirichlet Allocation model.


Usage

LDA(x, K = 5, alpha = 1, beta = 0.01)



Arguments

x: A tokens object containing the texts. A coercion will be attempted if x is not a tokens object.

K: The number of topics.

alpha: The Dirichlet hyperparameter of the per-document topic distribution.

beta: The Dirichlet hyperparameter of the per-topic vocabulary (word) distribution.
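To make the roles of alpha and beta concrete, here is a sketch of the full conditional used by the standard collapsed Gibbs sampler for LDA when resampling the topic of a single token: alpha smooths the document's topic counts and beta smooths each topic's word counts. This is a textbook formulation in base R, not the sentopics internals; the helper gibbs_conditional() and its count arguments are hypothetical, and all counts are assumed to already exclude the token being resampled.

```r
# Hypothetical helper (not part of sentopics): full conditional of the
# textbook collapsed Gibbs sampler for LDA, for one token of one document.
# All counts are assumed to exclude the token currently being resampled.
gibbs_conditional <- function(n_dk, n_kw, n_k, alpha, beta, V) {
  # n_dk: tokens of the current document assigned to each topic (length K)
  # n_kw: occurrences of the current word assigned to each topic (length K)
  # n_k:  total tokens assigned to each topic across the corpus (length K)
  # V:    vocabulary size
  weights <- (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
  weights / sum(weights)  # normalise into probabilities over the K topics
}

# Toy counts for K = 3 topics and V = 4 words, using the LDA() defaults
p <- gibbs_conditional(n_dk = c(5, 0, 1), n_kw = c(2, 0, 0),
                       n_k = c(20, 10, 10), alpha = 1, beta = 0.01, V = 4)
round(p, 3)

# The sampler would then draw the token's new topic from p
new_topic <- sample(seq_along(p), size = 1, prob = p)
```

A larger alpha flattens the document term (n_dk + alpha) across topics, favouring more evenly mixed documents, while a smaller beta lets each topic concentrate its mass on fewer words.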


Details

The rJST.LDA method enables the transition from a previously estimated LDA model to a sentiment-aware rJST model. It retains the previously estimated topics and randomly assigns a sentiment to every word of the corpus. The new model keeps the iteration count of the initial LDA model.
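A hedged sketch of this transition: it estimates a short LDA model on the ECB_press_conferences_tokens data used in the examples below, then converts it with rJST(). It assumes rJST() accepts an LDA model with its default arguments (no sentiment lexicon supplied); see ?rJST for the available options. The 10 iterations only keep the run short.

```r
library(sentopics)

# Estimate a plain LDA model first (few iterations to keep the example quick)
lda <- LDA(ECB_press_conferences_tokens, K = 5)
lda <- grow(lda, 10)

# Convert to a sentiment-aware rJST model; default arguments assumed here
rjst <- rJST(lda)

# The iteration count is carried over from the LDA model
rjst$it
```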


Value

An S3 list containing the model parameters and the estimated mixture. This object corresponds to a Gibbs sampler estimator with zero iterations; further Markov chain Monte Carlo (MCMC) iterations are run with the grow() function. Its elements include:

  • tokens is the tokens object used to create the model

  • vocabulary contains the set of words of the corpus

  • it tracks the number of Gibbs sampling iterations

  • za is the list of topic assignments, aligned to the tokens object with padding removed

  • logLikelihood stores the log-likelihood measured at each iteration, with a breakdown into hierarchical components stored as an attribute
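A short sketch of how these elements evolve, assuming only the element names listed above: a freshly created model reports zero iterations, and grow() advances the counter. The comparison of za with the number of documents reflects the statement that za is aligned to the tokens object.

```r
library(sentopics)

lda <- LDA(ECB_press_conferences_tokens, K = 5)
lda$it                # a freshly created model has not been sampled yet

lda <- grow(lda, 10)  # run 10 Gibbs sampling iterations
lda$it

# Elements documented above
names(lda)
length(lda$za) == length(lda$tokens)  # one assignment vector per document
head(lda$logLikelihood)               # log-likelihood at the first iterations
```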

The topWords() function extracts the most probable words of each topic/sentiment.
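A minimal sketch of extracting top words from a briefly estimated model. It calls topWords() with its default arguments and assumes the default output is tabular; the number of words, ranking method and output format can be changed as described in ?topWords.

```r
library(sentopics)

lda <- LDA(ECB_press_conferences_tokens, K = 5)
lda <- grow(lda, 10)

# Most probable words of each topic, using the default settings
tw <- topWords(lda)
head(tw)
```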


Author(s)

Olivier Delmarcelle


References

Blei, D.M., Ng, A.Y. and Jordan, M.I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.

See Also

Growing a model: grow(), extracting top words: topWords()

Other topic models: JST(), rJST(), sentopicmodel()


Examples

library(sentopics)

# creating a model
LDA(ECB_press_conferences_tokens, K = 5, alpha = 0.1, beta = 0.01)

# estimating an LDA model
lda <- LDA(ECB_press_conferences_tokens)
lda <- grow(lda, 100)  # run 100 Gibbs sampling iterations

sentopics documentation built on May 31, 2023, 8:26 p.m.