model_lda: Latent Dirichlet Allocation Model
In news-r/gensimr: Topic Modelling


Transforms bag-of-words counts into a topic space of lower dimensionality. LDA (also called multinomial PCA) is a probabilistic extension of LSA, so its topics can be interpreted as probability distributions over words. As with LSA, these distributions are inferred automatically from a training corpus, and documents are in turn interpreted as a (soft) mixture of these topics.
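
To make the mixture idea concrete, here is a minimal base R sketch. It does not call gensimr: the vocabulary, the two topics and the document weights are made-up numbers chosen purely for illustration. It shows how a document's word distribution follows from its topic weights and the per-topic word distributions.

```r
# Hypothetical toy numbers, chosen only for illustration (not gensimr output).
vocab <- c("goal", "match", "vote", "party")

# Each row is one topic: a probability distribution over the vocabulary.
topic_word <- rbind(
  sport    = c(0.50, 0.40, 0.05, 0.05),
  politics = c(0.05, 0.05, 0.50, 0.40)
)
colnames(topic_word) <- vocab

# One document expressed as a soft mixture of the two topics.
doc_topic <- c(sport = 0.75, politics = 0.25)

# Word distribution implied for the document:
# p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)
doc_word <- drop(doc_topic %*% topic_word)

print(round(doc_word, 4))
```

Because every topic row and the document weights each sum to 1, the resulting `doc_word` vector is itself a probability distribution over the vocabulary.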

model_lda(corpus, ...)

load_lda(file)

model_ldamc(corpus, ...)

load_ldamc(file)

`corpus`	Corpus as returned by `serialize_mmcorpus`.
`...`	Any other options passed on to the model; see the official gensim documentation of `LdaModel` (for `model_lda`) or `LdaMulticore` (for `model_ldamc`).
`file`	Path to a saved model.

A target dimensionality (`num_topics`) of 200–500 is recommended as a “golden standard”; see https://dl.acm.org/citation.cfm?id=1458105.

`model_lda`: Single-core implementation.
`model_ldamc`: Multi-core implementation.

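library(gensimr)

# `corpus` is assumed here to be the example dataset shipped with gensimr
data(corpus, package = "gensimr")

# preprocess the documents, then build the dictionary and bag-of-words corpus
docs <- prepare_documents(corpus)
dictionary <- corpora_dictionary(docs)
corpora <- doc2bow(dictionary, docs)
corpus_mm <- serialize_mmcorpus(corpora, auto_delete = FALSE)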

# fit model
lda <- model_lda(corpus_mm, id2word = dictionary, num_topics = 2L)
lda_topics <- lda$get_document_topics(corpora)
get_docs_topics(lda_topics)