View source: R/topic_modeling_core.R
Fit a Latent Dirichlet Allocation topic model using collapsed Gibbs sampling.
dtm | A document term matrix or term co-occurrence matrix of class dgCMatrix
k | Integer number of topics
iterations | Integer number of iterations for the Gibbs sampler to run. A future version may include automatic stopping criteria.
burnin | Integer number of burnin iterations. If
alpha | Vector of length
beta | Vector of length
optimize_alpha | Logical. Do you want to optimize alpha every 10 Gibbs iterations? Defaults to
calc_likelihood | Do you want to calculate the likelihood every 10 Gibbs iterations? Useful for assessing convergence. Defaults to
calc_coherence | Do you want to calculate probabilistic coherence of topics after the model is trained? Defaults to
calc_r2 | Do you want to calculate R-squared after the model is trained? Defaults to
... | Other arguments to be passed to
EXPLAIN IMPLEMENTATION DETAILS
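As a rough illustration of what collapsed Gibbs sampling for LDA involves, here is a minimal base R sketch of the generic textbook algorithm. It is not the implementation used by FitLdaModel. The function name gibbs_lda_sketch, the dense-matrix input, and the scalar (symmetric) alpha and beta are all assumptions made for this example only.

```r
# Toy collapsed Gibbs sampler for LDA (illustrative sketch, not package code).
# dtm: dense matrix of counts, documents in rows and terms in columns.
gibbs_lda_sketch <- function(dtm, k, iterations = 50, alpha = 0.1, beta = 0.05) {
  n_docs <- nrow(dtm)
  n_terms <- ncol(dtm)

  # Expand the count matrix into one (document, term) pair per token.
  idx <- which(dtm > 0, arr.ind = TRUE)
  reps <- dtm[idx]
  doc_id <- rep(idx[, 1], times = reps)
  term_id <- rep(idx[, 2], times = reps)
  n_tokens <- length(doc_id)

  # Start from a random topic assignment for every token.
  z <- sample.int(k, n_tokens, replace = TRUE)

  # Count matrices implied by the current assignments.
  doc_topic <- matrix(0, n_docs, k)
  topic_term <- matrix(0, k, n_terms)
  for (i in seq_len(n_tokens)) {
    doc_topic[doc_id[i], z[i]] <- doc_topic[doc_id[i], z[i]] + 1
    topic_term[z[i], term_id[i]] <- topic_term[z[i], term_id[i]] + 1
  }
  topic_total <- rowSums(topic_term)

  for (iter in seq_len(iterations)) {
    for (i in seq_len(n_tokens)) {
      d <- doc_id[i]
      v <- term_id[i]
      old <- z[i]

      # Remove the token's current assignment from the counts.
      doc_topic[d, old] <- doc_topic[d, old] - 1
      topic_term[old, v] <- topic_term[old, v] - 1
      topic_total[old] <- topic_total[old] - 1

      # Unnormalized conditional probability of each topic for this token.
      p <- (doc_topic[d, ] + alpha) * (topic_term[, v] + beta) /
        (topic_total + n_terms * beta)
      new <- sample.int(k, 1, prob = p)

      # Record the newly sampled assignment.
      z[i] <- new
      doc_topic[d, new] <- doc_topic[d, new] + 1
      topic_term[new, v] <- topic_term[new, v] + 1
      topic_total[new] <- topic_total[new] + 1
    }
  }

  # Point estimates from the final state of the sampler.
  theta <- (doc_topic + alpha) / rowSums(doc_topic + alpha)
  phi <- (topic_term + beta) / rowSums(topic_term + beta)
  list(theta = theta, phi = phi)
}

# Usage on a tiny made-up corpus with two obvious word groups.
set.seed(1)
toy_dtm <- matrix(c(3, 2, 0, 0,
                    2, 3, 0, 0,
                    0, 0, 3, 2,
                    0, 0, 2, 3),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("d", 1:4), c("a", "b", "c", "d")))
fit <- gibbs_lda_sketch(toy_dtm, k = 2, iterations = 50)
round(fit$theta, 2)
```

The sketch keeps only the final sampler state. Averaging over post-burn-in iterations, optimizing alpha, or tracking the likelihood would be layered on top of this loop.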
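Returns an S3 object. The text here gives its class as c("LDA", "TopicModel"), although str() in the example output below reports "lda_topic_model". In that output the object is a list of 7 elements: phi (topics by terms), theta (documents by topics), gamma (topics by terms), data (the input dgCMatrix), alpha, beta, and coherence.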
library(textmineR)
# load some data
data(nih_sample_dtm)
# fit a model
set.seed(12345)
m <- FitLdaModel(dtm = nih_sample_dtm[1:20,], k = 5,
                 iterations = 200, burnin = 175)
str(m)
# predict on held-out documents using Gibbs sampling "fold in"
p1 <- predict(m, nih_sample_dtm[21:100,], method = "gibbs",
              iterations = 200, burnin = 175)
# predict on held-out documents using the dot product method
p2 <- predict(m, nih_sample_dtm[21:100,], method = "dot")
# compare the methods
barplot(rbind(p1[1,], p2[1,]), beside = TRUE, col = c("red", "blue"))
Loading required package: Matrix
Attaching package: 'textmineR'
The following object is masked from 'package:Matrix':
update
The following object is masked from 'package:stats':
update
List of 7
$ phi : num [1:5, 1:5210] 5.77e-05 6.69e-05 5.73e-05 6.69e-05 5.28e-05 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
.. ..$ : chr [1:5210] "folding" "tosuprttedprtmnt" "importation" "hd" ...
$ theta : num [1:20, 1:5] 0.00043 0.23802 0.111462 0.210652 0.000615 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:20] "8693991" "8693362" "8607498" "8697008" ...
.. ..$ : chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
$ gamma : num [1:5, 1:5210] 0.188 0.19 0.207 0.206 0.209 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
.. ..$ : chr [1:5210] "folding" "tosuprttedprtmnt" "importation" "hd" ...
$ data :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:2864] 13 16 11 16 18 10 11 11 17 11 ...
.. ..@ p : int [1:5211] 0 0 0 0 1 1 2 2 2 2 ...
.. ..@ Dim : int [1:2] 20 5210
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:20] "8693991" "8693362" "8607498" "8697008" ...
.. .. ..$ : chr [1:5210] "folding" "tosuprttedprtmnt" "importation" "hd" ...
.. ..@ x : num [1:2864] 1 1 1 1 1 1 1 1 1 1 ...
.. ..@ factors : list()
$ alpha : Named num [1:5] 0.1 0.1 0.1 0.1 0.1
..- attr(*, "names")= chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
$ beta : Named num [1:5210] 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 ...
..- attr(*, "names")= chr [1:5210] "folding" "tosuprttedprtmnt" "importation" "hd" ...
$ coherence: Named num [1:5] 0.0817 0.395 0.24 0.0267 0.04
..- attr(*, "names")= chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
- attr(*, "class")= chr "lda_topic_model"
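In the output above, phi has one row per topic and one column per term. A common first step with such a matrix is to list the highest-weighted terms for each topic. The base R sketch below does that on a small made-up matrix. The helper name top_terms and the toy values are hypothetical, and it assumes each row of phi is a probability distribution over terms.

```r
# Toy stand-in for m$phi: 2 topics by 4 terms, each row sums to 1.
phi <- matrix(c(0.50, 0.30, 0.15, 0.05,
                0.05, 0.10, 0.25, 0.60),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("t_1", "t_2"),
                              c("gene", "cell", "trial", "patient")))

# For each topic (row), return the names of the n highest-weighted terms.
top_terms <- function(phi, n = 2) {
  out <- lapply(seq_len(nrow(phi)), function(i) {
    names(sort(phi[i, ], decreasing = TRUE))[seq_len(n)]
  })
  names(out) <- rownames(phi)
  out
}

top_terms(phi, n = 2)
```

With a fitted model, the same helper could be called as top_terms(m$phi, n = 5), assuming m is the object created in the example.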