predict.lda_topic_model: Get predictions from a Latent Dirichlet Allocation model


View source: R/topic_modeling_core.R

Description

Obtains topic predictions for new documents from a fitted LDA model.

Usage

## S3 method for class 'lda_topic_model'
predict(
  object,
  newdata,
  method = c("gibbs", "dot"),
  iterations = NULL,
  burnin = -1,
  ...
)

Arguments

object

a fitted object of class lda_topic_model

newdata

a DTM or TCM of class dgCMatrix, or a numeric vector

method

One of "gibbs" or "dot". If "gibbs", Gibbs sampling is used and iterations must be specified.

iterations

If method = "gibbs", an integer number of iterations for the Gibbs sampler to run. A future version may include automatic stopping criteria.

burnin

If method = "gibbs", an integer number of burn-in iterations. If burnin is greater than -1, the entries of the resulting "theta" matrix are averaged over all iterations after burnin.

...

Other arguments to be passed to TmParallelApply
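
The averaging described for burnin can be illustrated with a small base R sketch. It does not call textmineR and is not the package's implementation: the per-iteration draws are random stand-ins for what a Gibbs sampler would produce for one document, and the names draws, kept, and theta_doc are hypothetical.

```r
# Illustration only: average per-iteration topic distributions after burn-in.
# The draws are random stand-ins, not output from a real Gibbs sampler.
set.seed(1)

k          <- 3   # number of topics (hypothetical)
iterations <- 10  # total sampler iterations (hypothetical)
burnin     <- 6   # iterations to discard

# One normalized topic distribution per iteration for a single document
draws <- lapply(seq_len(iterations), function(i) {
  x <- rgamma(k, shape = 1)
  x / sum(x)
})

# Keep only iterations greater than burnin, then average element-wise
kept      <- draws[seq_len(iterations) > burnin]
theta_doc <- Reduce(`+`, kept) / length(kept)

theta_doc
```

Because each retained draw sums to 1, the averaged row also sums to 1, so it can still be read as a distribution over topics.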

Value

a "theta" matrix with one row per document and one column per topic
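
To make the shape of the returned matrix concrete, here is a minimal base R sketch of a dot-product style prediction on toy data. It assumes that the model's gamma matrix holds P(topic | word) with one row per topic and one column per word; the values, the vocabulary, and the name word_share are invented for illustration, and the package's actual method = "dot" code may differ in its details.

```r
# Illustration only: a dot-product style "theta" computed on toy matrices.
# Assumes gamma[t, w] = P(topic t | word w); all values are hypothetical.
gamma <- matrix(c(0.9, 0.5, 0.2,
                  0.1, 0.5, 0.8),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("t_1", "t_2"),
                                c("apple", "bank", "river")))

# Toy document-term counts: one row per new document
newdata <- matrix(c(2, 1, 0,
                    0, 1, 3),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("doc_a", "doc_b"), colnames(gamma)))

# Convert counts to within-document word proportions
word_share <- newdata / rowSums(newdata)

# Weight P(topic | word) by each document's word proportions
theta <- word_share %*% t(gamma)

dim(theta)      # one row per document, one column per topic
rowSums(theta)  # each row sums to 1 because gamma's columns do
```

The result has the same layout as the value described above: documents in rows, topics in columns.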

Examples

## Not run: 
# load some data
data(nih_sample_dtm)

# fit a model 
set.seed(12345)

m <- FitLdaModel(dtm = nih_sample_dtm[1:20,], k = 5,
                 iterations = 200, burnin = 175)

str(m)

# predict on held-out documents using gibbs sampling "fold in"
p1 <- predict(m, nih_sample_dtm[21:100,], method = "gibbs",
              iterations = 200, burnin = 175)

# predict on held-out documents using the dot product method
p2 <- predict(m, nih_sample_dtm[21:100,], method = "dot")

# compare the methods
barplot(rbind(p1[1,],p2[1,]), beside = TRUE, col = c("red", "blue")) 

## End(Not run)

Example output

Loading required package: Matrix

Attaching package: 'textmineR'

The following object is masked from 'package:Matrix':

    update

The following object is masked from 'package:stats':

    update

List of 7
 $ phi      : num [1:5, 1:5210] 5.77e-05 6.69e-05 5.73e-05 6.69e-05 5.28e-05 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
  .. ..$ : chr [1:5210] "folding" "tosuprttedprtmnt" "importation" "hd" ...
 $ theta    : num [1:20, 1:5] 0.00043 0.23802 0.111462 0.210652 0.000615 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:20] "8693991" "8693362" "8607498" "8697008" ...
  .. ..$ : chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
 $ gamma    : num [1:5, 1:5210] 0.188 0.19 0.207 0.206 0.209 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
  .. ..$ : chr [1:5210] "folding" "tosuprttedprtmnt" "importation" "hd" ...
 $ data     :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. ..@ i       : int [1:2864] 13 16 11 16 18 10 11 11 17 11 ...
  .. ..@ p       : int [1:5211] 0 0 0 0 1 1 2 2 2 2 ...
  .. ..@ Dim     : int [1:2] 20 5210
  .. ..@ Dimnames:List of 2
  .. .. ..$ : chr [1:20] "8693991" "8693362" "8607498" "8697008" ...
  .. .. ..$ : chr [1:5210] "folding" "tosuprttedprtmnt" "importation" "hd" ...
  .. ..@ x       : num [1:2864] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..@ factors : list()
 $ alpha    : Named num [1:5] 0.1 0.1 0.1 0.1 0.1
  ..- attr(*, "names")= chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
 $ beta     : Named num [1:5210] 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 ...
  ..- attr(*, "names")= chr [1:5210] "folding" "tosuprttedprtmnt" "importation" "hd" ...
 $ coherence: Named num [1:5] 0.0817 0.395 0.24 0.0267 0.04
  ..- attr(*, "names")= chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
 - attr(*, "class")= chr "lda_topic_model"

textmineR documentation built on June 28, 2021, 9:08 a.m.