
Creates a Latent Dirichlet Allocation model. At the moment only 'WarpLDA' is implemented. WarpLDA is an LDA sampler that achieves both O(1) time complexity per token and O(K) scope of random access. According to the WarpLDA authors, their empirical results across a wide range of testing conditions show that WarpLDA is consistently 5-15x faster than the state-of-the-art Metropolis-Hastings based LightLDA, and is comparable to or faster than the sparsity-aware F+LDA.


`R6Class` object.

- `topic_word_distribution`: distribution of words for each topic. Available after model fitting with the `model$fit_transform()` method.
- `components`: unnormalized word counts for each topic-word entry. Available after model fitting with the `model$fit_transform()` method.

For usage details see **Methods, Arguments and Examples** sections.

```
lda = LDA$new(n_topics = 10L, doc_topic_prior = 50 / n_topics, topic_word_prior = 1 / n_topics)
lda$fit_transform(x, n_iter = 1000, convergence_tol = 1e-3, n_check_convergence = 10, progressbar = interactive())
lda$transform(x, n_iter = 1000, convergence_tol = 1e-3, n_check_convergence = 5, progressbar = FALSE)
lda$get_top_words(n = 10, topic_number = 1L:private$n_topics, lambda = 1)
```

- `$new(n_topics, doc_topic_prior = 50 / n_topics, topic_word_prior = 1 / n_topics, method = "WarpLDA")`: constructor for the LDA model. Here `doc_topic_prior` is alpha and `topic_word_prior` is beta. For a description of the arguments see the **Arguments** section.
- `$fit_transform(x, n_iter, convergence_tol = -1, n_check_convergence = 0, progressbar = interactive())`: fits the LDA model to the input matrix `x` and transforms the input documents to topic space. The result is a matrix where each row represents the corresponding document, and the values in a row form a distribution over topics.
- `$transform(x, n_iter, convergence_tol = -1, n_check_convergence = 0, progressbar = FALSE)`: transforms new documents into topic space. The result is a matrix where each row is the distribution of a document over the latent topic space.
- `$get_top_words(n = 10, topic_number = 1L:private$n_topics, lambda = 1)`: returns the "top words" for a given topic (or several topics). Words for each topic can be sorted by the probability of observing the word in the given topic (`lambda = 1`) or by "relevance", which also takes into account the frequency of the word in the corpus (`lambda < 1`). In our experience, setting `0.2 < lambda < 0.4` works well in most cases. See http://nlp.stanford.edu/events/illvi2014/papers/sievert-illvi2014.pdf for details.
- `$plot(lambda.step = 0.1, reorder.topics = FALSE, ...)`: plots the LDA model using the LDAvis package (https://cran.r-project.org/package=LDAvis). `...` is passed to the `LDAvis::createJSON` and `LDAvis::serVis` functions.
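To make the effect of `lambda` in `$get_top_words()` more concrete, here is a minimal base-R sketch of the relevance score defined in the paper linked above: `lambda * log p(w | t) + (1 - lambda) * log(p(w | t) / p(w))`. It is not text2vec's internal code. The toy matrix `phi`, the helper names `relevance()` and `top_words_by_relevance()`, and the choice of equal topic weights for `p(w)` are all assumptions made for illustration.

```r
# Toy topic-word probabilities p(w | t): 2 topics x 4 words (hypothetical numbers).
phi = matrix(
  c(0.50, 0.30, 0.15, 0.05,
    0.50, 0.05, 0.15, 0.30),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"), c("the", "film", "plot", "actor"))
)

# Marginal word probabilities p(w). Assumption: both topics carry equal weight.
p_w = colMeans(phi)

# Relevance as defined by Sievert and Shirley:
# lambda * log p(w | t) + (1 - lambda) * log(p(w | t) / p(w))
relevance = function(phi, p_w, lambda) {
  lift = sweep(phi, 2, p_w, "/")  # p(w | t) / p(w), column-wise division
  lambda * log(phi) + (1 - lambda) * log(lift)
}

# Hypothetical helper: the n highest-scoring words per topic (one column per topic).
top_words_by_relevance = function(phi, p_w, lambda, n = 2) {
  rel = relevance(phi, p_w, lambda)
  apply(rel, 1, function(r) names(sort(r, decreasing = TRUE))[seq_len(n)])
}

# lambda = 1 ranks purely by p(w | t), so the common word "the" leads both topics.
top_words_by_relevance(phi, p_w, lambda = 1)

# lambda = 0.3 rewards words that are specific to a topic ("film", "actor").
top_words_by_relevance(phi, p_w, lambda = 0.3)
```

In this toy example, lowering `lambda` pushes the corpus-wide frequent word down the ranking in favour of words that distinguish one topic from the other.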

- lda: A `LDA` object.
- x: An input document-term matrix (should have column names = terms). A **CSR** `RsparseMatrix` is used internally; other formats will be converted to CSR via an `as()` function call where possible.
- n_topics: `integer` desired number of latent topics. Also known as **K**.
- doc_topic_prior: `numeric` prior for the document-topic multinomial distribution. Also known as **alpha**.
- topic_word_prior: `numeric` prior for the topic-word multinomial distribution. Also known as **eta**.
- n_iter: `integer` number of sampling iterations.
- n_check_convergence: defines how often to calculate the score used to check convergence.
- convergence_tol: `numeric = -1` defines the early stopping strategy. Fitting stops when either of the following two conditions is satisfied: (a) all iterations have been used, or (b) `score_previous_check / score_current < 1 + convergence_tol`.
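The stopping rule for `convergence_tol` can be sketched in base R as follows. This illustrates only the inequality quoted above and is not text2vec's implementation. The helper names `should_stop()` and `iterations_used()`, the synthetic `scores` vector, and the assumption that the score is checked every `n_check_convergence` iterations are all introduced here for illustration.

```r
# Condition (b) from the description of convergence_tol.
should_stop = function(score_previous_check, score_current, convergence_tol) {
  score_previous_check / score_current < 1 + convergence_tol
}

# Hypothetical driver: scores[i] stands in for the log-likelihood after iteration i.
# Assumption: the score is checked every n_check_convergence iterations.
iterations_used = function(scores, n_iter, n_check_convergence, convergence_tol) {
  previous = NULL
  for (i in seq_len(n_iter)) {
    if (n_check_convergence > 0 && i %% n_check_convergence == 0) {
      current = scores[i]
      if (!is.null(previous) && should_stop(previous, current, convergence_tol)) {
        return(i)  # condition (b): the score has stopped improving enough
      }
      previous = current
    }
  }
  n_iter  # condition (a): all iterations were used
}

# Synthetic log-likelihood curve that flattens out over time (made-up numbers).
scores = -250000 - 10000 * exp(-(1:100) / 10)

iterations_used(scores, n_iter = 100, n_check_convergence = 10, convergence_tol = 1e-3)
iterations_used(scores, n_iter = 100, n_check_convergence = 10, convergence_tol = -1)
```

With `convergence_tol = -1` the right-hand side of the inequality is 0, and a ratio of two log-likelihoods with the same sign is never below 0, so in this sketch early stopping never triggers and all `n_iter` iterations run.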

```
library(text2vec)
data("movie_review")
N = 500
# tokenize the first N reviews and build an iterator over them
tokens = word_tokenizer(tolower(movie_review$review[1:N]))
it = itoken(tokens, ids = movie_review$id[1:N])
# build the vocabulary, dropping rare terms and terms found in over 20% of documents
v = create_vocabulary(it)
v = prune_vocabulary(v, term_count_min = 5, doc_proportion_max = 0.2)
# document-term matrix restricted to the pruned vocabulary
dtm = create_dtm(it, vocab_vectorizer(v))
lda_model = LDA$new(n_topics = 10)
# each row of the result is a document's distribution over the 10 topics
doc_topic_distr = lda_model$fit_transform(dtm, n_iter = 20)
# run LDAvis visualisation if needed (make sure LDAvis package installed)
# lda_model$plot()
```

```
INFO [2018-05-23 15:32:27] iter 10 loglikelihood = -255460.809
INFO [2018-05-23 15:32:27] iter 20 loglikelihood = -249711.034
```
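The example above stops after fitting. As a hedged follow-up, the sketch below applies a fitted model to unseen documents with `$transform()` and inspects topics with `$get_top_words()`, using only the methods described on this page. The split into reviews 1-500 for fitting and 501-600 as "new" documents, the variable names, and the values `n = 5` and `lambda = 0.3` are illustrative assumptions.

```r
library(text2vec)
data("movie_review")
set.seed(1)  # sampling is stochastic; the seed only aids repeatability

# Helper (hypothetical name): build a token iterator for a range of reviews.
make_iterator = function(idx) {
  itoken(word_tokenizer(tolower(movie_review$review[idx])), ids = movie_review$id[idx])
}

# Fit on the first 500 reviews, as in the example above.
it_train = make_iterator(1:500)
v = prune_vocabulary(create_vocabulary(it_train), term_count_min = 5, doc_proportion_max = 0.2)
vectorizer = vocab_vectorizer(v)
dtm_train = create_dtm(it_train, vectorizer)
lda_model = LDA$new(n_topics = 10)
doc_topic_train = lda_model$fit_transform(dtm_train, n_iter = 20, progressbar = FALSE)

# Map 100 unseen reviews into the same topic space. Reusing the same
# vectorizer keeps the columns of both matrices aligned with the vocabulary.
dtm_new = create_dtm(make_iterator(501:600), vectorizer)
doc_topic_new = lda_model$transform(dtm_new, n_iter = 20, progressbar = FALSE)
dim(doc_topic_new)

# Top 5 words for three topics, ranked by relevance with lambda = 0.3.
top_words = lda_model$get_top_words(n = 5, topic_number = c(1L, 5L, 10L), lambda = 0.3)
top_words
```

Building `dtm_new` with the same `vectorizer` is the important design choice here: a matrix built from a different vocabulary would not match the terms the model was fitted on.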

text2vec documentation built on Jan. 12, 2018, 1:04 a.m.
