update.lda_topic_model: Update a Latent Dirichlet Allocation topic model with new...


View source: R/topic_modeling_core.R

Description

Update an LDA model with new data using collapsed Gibbs sampling.

Usage

## S3 method for class 'lda_topic_model'
update(
  object,
  dtm,
  additional_k = 0,
  iterations = NULL,
  burnin = -1,
  new_alpha = NULL,
  new_beta = NULL,
  optimize_alpha = FALSE,
  calc_likelihood = FALSE,
  calc_coherence = TRUE,
  calc_r2 = FALSE,
  ...
)

Arguments

object

A fitted object of class lda_topic_model.

dtm

A document term matrix or term co-occurrence matrix of class dgCMatrix.

additional_k

Integer number of topics to add, defaults to 0.

iterations

Integer number of iterations for the Gibbs sampler to run. A future version may include automatic stopping criteria.

burnin

Integer number of burn-in iterations. If burnin is greater than -1, the resulting "phi" and "theta" matrices are averaged over all iterations greater than burnin. Defaults to -1.

new_alpha

Currently unused. The prior for topics over documents to use when updating the model.

new_beta

Currently unused. The prior for words over topics to use when updating the model.

optimize_alpha

Logical. Do you want to optimize alpha every 10 Gibbs iterations? Defaults to FALSE.

calc_likelihood

Logical. Do you want to calculate the likelihood every 10 Gibbs iterations? Useful for assessing convergence. Defaults to FALSE.

calc_coherence

Logical. Do you want to calculate probabilistic coherence of topics after the model is trained? Defaults to TRUE.

calc_r2

Logical. Do you want to calculate R-squared after the model is trained? Defaults to FALSE.

...

Other arguments to be passed to TmParallelApply.
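The averaging described for burnin can be illustrated with a toy sketch in base R. This is not textmineR's internal sampler: the per-iteration draws below are random stand-ins (the names draws, kept, and phi_avg are hypothetical), and the sketch only shows what "an average over all iterations greater than burnin" means for a row-normalized matrix such as "phi".

```r
set.seed(42)

iterations <- 200
burnin <- 175

# Hypothetical stand-in for per-iteration samples: one 2 x 3 matrix per
# iteration, with each row normalized to sum to 1 (like a topic's
# distribution over words).
draws <- lapply(seq_len(iterations), function(i) {
  m <- matrix(rgamma(6, shape = 1), nrow = 2)
  m / rowSums(m)
})

# Keep only the iterations greater than burnin, then average element-wise.
kept <- draws[(burnin + 1):iterations]
phi_avg <- Reduce(`+`, kept) / length(kept)

length(kept)      # number of iterations that contribute to the average
rowSums(phi_avg)  # rows of the averaged matrix still sum to 1
```

With iterations = 200 and burnin = 175, as in the examples on this page, the last 25 iterations contribute to the average.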

Value

Returns an S3 object of class c("LDA", "TopicModel").

Examples

## Not run: 
library(textmineR)

# load a document term matrix (nih_sample_dtm ships with textmineR)
d1 <- nih_sample_dtm[1:50,]

d2 <- nih_sample_dtm[51:100,]

# fit a model
m <- FitLdaModel(d1, k = 10, 
                 iterations = 200, burnin = 175,
                 optimize_alpha = TRUE, 
                 calc_likelihood = FALSE,
                 calc_coherence = TRUE,
                 calc_r2 = FALSE)

# update an existing model by adding documents
m2 <- update(object = m,
             dtm = rbind(d1, d2),
             iterations = 200,
             burnin = 175)
             
# use an old model as a prior for a new model
m3 <- update(object = m,
             dtm = d2, # new documents only
             iterations = 200,
             burnin = 175)
             
# add topics while updating a model by adding documents
m4 <- update(object = m,
             dtm = rbind(d1, d2),
             additional_k = 3,
             iterations = 200,
             burnin = 175)
             
# add topics to an existing model
m5 <- update(object = m,
             dtm = d1, # this is the old data
             additional_k = 3,
             iterations = 200,
             burnin = 175)


## End(Not run)
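The dtm argument must be of class dgCMatrix. As a minimal sketch of what such an input looks like, the block below builds a tiny matrix by hand with the Matrix package. The documents, terms, and counts are made up for illustration, and a matrix this small is far too small to fit a meaningful model. In practice a document term matrix would more likely come from a helper such as textmineR's CreateDtm.

```r
library(Matrix)

# Hypothetical counts: 3 documents (rows) by 4 terms (columns).
# i and j give the row and column of each non-zero entry, x gives its count.
dtm <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 2, 2, 3, 4),
  x = c(2, 1, 3, 1, 4),
  dims = c(3, 4),
  dimnames = list(
    c("doc_1", "doc_2", "doc_3"),
    c("gene", "protein", "cell", "trial")
  )
)

class(dtm)    # confirm the class expected by the dtm argument
rowSums(dtm)  # total token count per document
```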

textmineR documentation built on June 28, 2021, 9:08 a.m.