model_lda: Latent Dirichlet Allocation Model


View source: R/models.R

Description

Transforms bag-of-words counts into a topic space of lower dimensionality. LDA (also called multinomial PCA) is a probabilistic extension of LSA, so LDA’s topics can be interpreted as probability distributions over words. As with LSA, these distributions are inferred automatically from a training corpus, and documents are in turn interpreted as a (soft) mixture of these topics.
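To make the "topics as distributions over words, documents as mixtures of topics" idea concrete, here is a small base-R sketch. It does not call gensimr or fit anything: the vocabulary, the two topic names and all probabilities are hypothetical, hand-picked values. It shows how a document's topic weights (theta) combine with the topic-word distributions (phi) into a word distribution for that document, p(w | d) = sum over k of theta_k * phi_kw.

```r
# Toy illustration of the LDA view of topics and documents (not gensimr's API).
# Hypothetical vocabulary and hand-picked probabilities, for illustration only.
vocab <- c("gene", "dna", "cell", "ball", "goal", "team")

# Topic-word matrix phi: each row is one topic, i.e. a probability
# distribution over the vocabulary (each row sums to 1).
phi <- rbind(
  biology = c(0.40, 0.30, 0.25, 0.02, 0.02, 0.01),
  sports  = c(0.01, 0.01, 0.03, 0.25, 0.40, 0.30)
)
colnames(phi) <- vocab

# Document-topic weights theta: a soft mixture over topics (sums to 1).
theta <- c(biology = 0.7, sports = 0.3)

# Word distribution implied for this document: p(w | d) = sum_k theta_k * phi_kw
p_word_given_doc <- drop(theta %*% phi)

# Three most probable words under each topic (one column per topic)
top_words <- apply(phi, 1, function(p) names(sort(p, decreasing = TRUE))[1:3])

print(round(p_word_given_doc, 3))
print(top_words)
```

Because every row of phi and the vector theta each sum to 1, the resulting p(w | d) is itself a valid probability distribution over the vocabulary. In a fitted model these quantities are estimated from the training corpus rather than chosen by hand.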

Usage

model_lda(corpus, ...)

Arguments

corpus

A corpus, as returned by serialize_mmcorpus.

...

Any other options passed on to the underlying gensim model (for example id2word and num_topics, as in the examples); see the official documentation of model_lda or of model_ldamc.

file

Path to a saved model.

Details

A target dimensionality (num_topics) of 200–500 is recommended as a “golden standard”; see https://dl.acm.org/citation.cfm?id=1458105.

Functions

Examples

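library(gensimr)

# example corpus shipped with the package
data(corpus, package = "gensimr")

# preprocess, build a dictionary and a serialized bag-of-words corpus
docs <- prepare_documents(corpus)
dictionary <- corpora_dictionary(docs)
corpora <- doc2bow(dictionary, docs)
corpus_mm <- serialize_mmcorpus(corpora, auto_delete = FALSE)

# fit model and extract per-document topic weights
lda <- model_lda(corpus_mm, id2word = dictionary, num_topics = 2L)
lda_topics <- lda$get_document_topics(corpora)
get_docs_topics(lda_topics)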

news-r/gensimr documentation built on Jan. 9, 2021, 5:55 a.m.