as_LDA | R Documentation
Convert Gensim or Mallet LDA to R class
as_LDA(x, ...)

## S3 method for class 'jobjRef'
as_LDA(x, beta, gamma, dtm, verbose = TRUE, ...)

## S3 method for class 'gensim.models.ldamodel.LdaModel'
as_LDA(x, beta, gamma, dtm, verbose = TRUE, ...)
| Argument | Description |
|---|---|
| x | A Mallet topic model ('ParallelTopicModel', passed as an rJava 'jobjRef') or a Gensim LDA model ('gensim.models.ldamodel.LdaModel'). |
| ... | Further arguments (unused). |
| beta | A 'matrix' with the word-topic distribution to be assigned to slot 'beta'. If missing (default), the matrix is derived from the input model. To assign the matrix in a separate step, pass an empty matrix ('matrix()'). |
| gamma | A 'matrix' with the topic distribution of each document to be assigned to slot 'gamma'. If missing (default), the matrix is derived from the input model. To assign the matrix in a separate step, pass an empty matrix ('matrix()'). |
| dtm | A document-term matrix that will be turned into a bag-of-words (BOW) data structure. |
| verbose | A 'logical' value, whether to output progress messages. |
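The `dtm` argument is converted into a bag-of-words (BOW) structure before use. Purely as an illustration, the base-R sketch below shows what such a conversion amounts to, assuming Gensim's BOW format: one list of (term id, count) pairs per document, with 0-based term ids. `dtm_to_bow_sketch()` is a hypothetical helper, not the function biglda uses internally, and it takes a dense matrix rather than a sparse one.

```r
# Hypothetical helper (not part of biglda): turn a dense document-term
# matrix into a Gensim-style bag-of-words corpus, i.e. one list per
# document holding (term id, count) pairs for the non-zero cells.
# Term ids are 0-based, because Gensim dictionaries index from 0.
dtm_to_bow_sketch <- function(dtm) {
  lapply(seq_len(nrow(dtm)), function(i) {
    counts <- dtm[i, ]
    ids <- unname(which(counts > 0))
    # Each pair is c(0-based term id, count)
    lapply(ids, function(j) c(j - 1L, counts[[j]]))
  })
}

# Toy matrix: 2 documents x 3 terms
dtm <- matrix(
  c(2, 0, 1,
    0, 3, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("a", "b", "c"))
)
bow <- dtm_to_bow_sketch(dtm)
str(bow)
```

Only non-zero cells are kept, which is why a BOW corpus stays compact for sparse text data.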
The 'as_LDA()' function turns an estimated topic model prepared using 'mallet' into an 'LDA_Mallet' object that inherits from classes defined in the 'topicmodels' package. This makes it possible to apply topic model evaluation tools that are available for the 'LDA_Gibbs' class but cannot be used directly on the output of Mallet topic modelling. Note that the gamma matrix is normalized and smoothed, while the beta matrix is the logarithm of the normalized and smoothed values obtained from the input Mallet topic model.
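To make the last point concrete, here is a minimal base-R sketch of "normalized and smoothed" versus "logarithmized" matrices. It assumes simple additive smoothing with a constant followed by row normalization; the smoothing constants below (0.01 and 0.1) and the count matrices are illustrative assumptions, not the values biglda takes from the Mallet model. Storing beta on the log scale matches the 'topicmodels' convention for slot 'beta'.

```r
# Sketch (assumption): additive smoothing followed by row normalization.
# The smoothing constants actually used by biglda/Mallet may differ.
smooth_normalize <- function(counts, smoothing) {
  smoothed <- counts + smoothing
  smoothed / rowSums(smoothed)  # divide each row by its sum
}

# Hypothetical topic-term counts: 2 topics x 3 terms
topic_term_counts <- matrix(c(5, 0, 3,
                              1, 4, 0), nrow = 2, byrow = TRUE)
# Hypothetical document-topic counts: 2 documents x 2 topics
doc_topic_counts <- matrix(c(7, 1,
                             0, 6), nrow = 2, byrow = TRUE)

# beta: log of normalized, smoothed topic-term values
beta_sketch <- log(smooth_normalize(topic_term_counts, smoothing = 0.01))
# gamma: normalized, smoothed document-topic values (not logged)
gamma_sketch <- smooth_normalize(doc_topic_counts, smoothing = 0.1)

# Each row should sum to 1 (up to floating-point error)
rowSums(exp(beta_sketch))
rowSums(gamma_sketch)
```

Smoothing guarantees that no probability is exactly zero, so the logarithm in beta is always finite.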
data_dir <- system.file(package = "biglda", "extdata", "mallet")
BTM <- mallet_load_topicmodel(
  instancefile = file.path(data_dir, "instance_list.mallet"),
  statefile = file.path(data_dir, "lda_mallet.gz")
)
LDA <- as_LDA(BTM)

# Avoid memory limitations by preparing beta/gamma matrix separately
LDA2 <- as_LDA(BTM, beta = matrix(), gamma = matrix())
B(LDA2) <- save_word_weights(BTM, minimized = TRUE) |>
  load_word_weights(minimized = TRUE) |>
  exp()
G(LDA2) <- save_document_topics(BTM) |>
  load_document_topics()

if (requireNamespace("reticulate") && reticulate::py_module_available("gensim")){
  gensim <- reticulate::import("gensim")
  dir <- system.file(package = "biglda", "extdata", "gensim")
  dtmfile <- file.path(dir, "germaparlmini_dtm.rds")
  lda <- gensim_ldamodel_load(modeldir = dir, modelname = "germaparlmini") |>
    as_LDA(dtm = readRDS(dtmfile))
  topics_terms <- topicmodels::get_terms(lda, 10)
  docs_topics <- topicmodels::get_topics(lda, 5)
}