This function is a wrapper for MADlib's Latent Dirichlet Allocation (LDA). MADlib parallelizes the computation if the connected database is distributed. For details of the algorithm implementation, refer to the MADlib documentation [1].
madlib.lda(data, topic_num, alpha, beta, iter_num = 20,
           nstart = 1, best = TRUE, ...)
data
    An object of
topic_num
    Number of topics.
alpha
    Dirichlet parameter for the per-doc topic multinomial.
beta
    Dirichlet parameter for the per-topic word multinomial.
iter_num
    Number of iterations.
nstart
    Number of repeated random starts.
best
    If TRUE only the model with the minimum perplexity is returned.
...
    Other optional parameters. Not implemented.
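To give a feel for what the alpha and beta arguments control, the following is a minimal base-R sketch that draws from a symmetric Dirichlet distribution. It does not call MADlib or PivotalR; `rdirichlet_sym` is a hypothetical helper introduced only for illustration, and the values 0.1 and 10 are arbitrary.

```r
# Hypothetical helper (not part of PivotalR or MADlib): draw n samples from a
# symmetric Dirichlet(alpha, ..., alpha) over k categories by normalising
# independent Gamma(alpha, 1) variates.
rdirichlet_sym <- function(n, k, alpha) {
  g <- matrix(rgamma(n * k, shape = alpha, rate = 1), nrow = n, ncol = k)
  g / rowSums(g)
}

set.seed(1)
sparse <- rdirichlet_sym(1000, k = 5, alpha = 0.1)  # small alpha
flat   <- rdirichlet_sym(1000, k = 5, alpha = 10)   # large alpha

# Average weight of the largest component in each draw
mean_max_sparse <- mean(apply(sparse, 1, max))
mean_max_flat   <- mean(apply(flat, 1, max))
```

With a small parameter, most of the probability mass in each draw tends to fall on one category; with a large parameter, draws are closer to uniform. For alpha this corresponds to documents dominated by few topics versus documents mixing many topics, and analogously for beta and the words within a topic.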
An lda.madlib object, or a list of such objects when several models are kept (nstart > 1 with best = FALSE). Each lda.madlib object is a list that contains the following items:
assignments
    The per-document topic assignments.
document_sums
    The per-document topic counts.
model_table
    The
output_table
    The
tf_table
    The
topic_sums
    The per-topic sum of assignments.
topics
    The per-word association with topics.
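As a rough illustration of how per-document topic counts such as document_sums are commonly turned into topic proportions, here is a small base-R sketch. The matrix layout (topics in rows, documents in columns) is an assumption and is not confirmed by this page; the data are made up, and `topic_proportions` is a hypothetical helper.

```r
# Made-up stand-in for a document_sums item.
# ASSUMPTION: topics in rows, documents in columns (2 topics x 3 documents).
document_sums <- matrix(c(8, 2,
                          1, 9,
                          5, 5), nrow = 2)
alpha <- 0.1

# Hypothetical helper: Dirichlet-smoothed estimate of per-document topic
# proportions, (count + alpha) / (document total + K * alpha).
topic_proportions <- function(doc_sums, alpha) {
  k <- nrow(doc_sums)
  sweep(doc_sums + alpha, 2, colSums(doc_sums) + k * alpha, "/")
}

theta <- topic_proportions(document_sums, alpha)
```

Each column of `theta` sums to 1 and can be read as the estimated topic mixture of one document under the stated layout assumption.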
Author: Predictive Analytics Team at Pivotal Inc.
Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io
[1] Documentation of LDA in the latest MADlib release, https://madlib.apache.org/docs/latest/group__grp__lda.html
predict.lda.madlib is used for labelling test documents using a learned lda.madlib model.
perplexity.lda.madlib is used for computing the perplexity of a learned lda.madlib model.
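For readers unfamiliar with perplexity, the sketch below computes the textbook definition, exp(-log-likelihood / number of tokens), in base R. It is not the PivotalR or MADlib implementation, which may differ in detail; `perplexity_sketch` and the toy matrices are hypothetical.

```r
# Hypothetical helper, not the PivotalR implementation.
# counts: documents x vocabulary matrix of word counts
# theta:  documents x topics matrix of topic proportions (rows sum to 1)
# phi:    topics x vocabulary matrix of word probabilities (rows sum to 1)
perplexity_sketch <- function(counts, theta, phi) {
  word_prob <- theta %*% phi      # p(word | document)
  observed <- counts > 0          # skip zero counts to avoid 0 * log(0)
  log_lik <- sum(counts[observed] * log(word_prob[observed]))
  exp(-log_lik / sum(counts))
}

# Toy data: 2 documents, 2 topics, 4 vocabulary words
counts <- matrix(c(3, 1, 0, 0,
                   0, 0, 2, 2), nrow = 2, byrow = TRUE)
theta <- matrix(c(1, 0,
                  0, 1), nrow = 2, byrow = TRUE)
phi <- matrix(c(0.5, 0.5, 0,   0,
                0,   0,   0.5, 0.5), nrow = 2, byrow = TRUE)

toy_perplexity <- perplexity_sketch(counts, theta, phi)
```

In this toy case every observed word has probability 0.5 under its document's topic, so the perplexity is 2. Lower perplexity means the model assigns higher probability to the observed words.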
## Not run:
## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)
dat <- db.data.frame("__madlib_pivotalr_lda_data__", conn.id = cid,
                     verbose = FALSE)
output.db <- madlib.lda(dat, 2, 0.1, 0.1, 50)
perplexity.db <- perplexity.lda.madlib(output.db)
print(perplexity.db)
## Run LDA multiple times and get the best one
output.db <- madlib.lda(dat, 2, 0.1, 0.1, 50, nstart = 2)
perplexity.db <- perplexity.lda.madlib(output.db)
print(perplexity.db)
## Run LDA multiple times and keep all models
output.db <- madlib.lda(dat, 2, 0.1, 0.1, 50, nstart = 2, best = FALSE)
perplexity.db <- perplexity.lda.madlib(output.db[[1]])
print(perplexity.db)
perplexity.db <- perplexity.lda.madlib(output.db[[2]])
print(perplexity.db)
db.disconnect(cid)
## End(Not run)
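When all models are kept with best = FALSE, the lowest-perplexity model can also be selected by hand. The following is a minimal sketch of that selection step that runs without a database: `pick_best` is a hypothetical helper, and the mock models with their perplexity values are invented placeholders standing in for lda.madlib objects.

```r
# Hypothetical helper, not part of PivotalR: score every model and return the
# one with the smallest score together with all scores.
pick_best <- function(models, score_fn) {
  scores <- vapply(models, score_fn, numeric(1))
  best_index <- which.min(scores)
  list(index = best_index, model = models[[best_index]], scores = scores)
}

# Mock models; the perplexity field and its values are invented for the demo
mock_models <- list(list(id = "run1", perplexity = 412.7),
                    list(id = "run2", perplexity = 398.2))

best <- pick_best(mock_models, function(m) m$perplexity)
best_id <- best$model$id

# With a live connection one might instead call
#   pick_best(output.db, perplexity.lda.madlib)
# ASSUMPTION: perplexity.lda.madlib returns a single numeric value per model.
```

Keeping all scores alongside the chosen model makes it easy to see how much the random starts differ before discarding the others.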