as_LDA: Convert Gensim or Mallet LDA to R class

as_LDA R Documentation

Convert Gensim or Mallet LDA to R class

Description

Convert an LDA topic model estimated with Gensim or Mallet into an R class that works with the 'topicmodels' package.

Usage

as_LDA(x, ...)

## S3 method for class 'jobjRef'
as_LDA(x, beta, gamma, dtm, verbose = TRUE, ...)

## S3 method for class 'gensim.models.ldamodel.LdaModel'
as_LDA(x, beta, gamma, dtm, verbose = TRUE, ...)

Arguments

x

A Gensim or Mallet topic model ('ParallelTopicModel').

...

Further arguments (unused).

beta

A 'matrix' with the word-topic distribution that will be assigned to slot 'beta'. If missing (default), the matrix will be derived from the input model. To assign the matrix in a separate step, pass an empty matrix ('matrix()') as argument.

gamma

A 'matrix' with the topic distribution for each document that will be assigned to slot 'gamma'. If missing (default), the matrix will be derived from the input model. To assign the matrix in a separate step, pass an empty matrix ('matrix()') as argument.

dtm

A document-term matrix (will be turned into a bag-of-words (BOW) data structure). A saved Gensim model consists of a set of files, each starting with the modelname.

verbose

A 'logical' value, whether to output progress messages.
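
The 'dtm' argument is described as being turned into a bag-of-words (BOW) data structure. As a rough illustration of what such a conversion involves, the base R sketch below turns a small dense matrix into per-document lists of (term id, count) pairs. The toy data and the helper 'row_to_bow()' are hypothetical and not part of 'biglda'; the zero-based term ids are an assumption that follows the Python convention, and the package's actual conversion may differ.

```r
# Toy document-term matrix (hypothetical data): rows are documents, columns are terms.
dtm_toy <- matrix(
  c(2, 0, 1,
    0, 3, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("apple", "bank", "court"))
)

# Hypothetical helper: convert one row of counts into (term id, count) pairs,
# dropping terms that do not occur. Term ids are zero-based (assumption).
row_to_bow <- function(counts) {
  nonzero <- unname(which(counts > 0))
  lapply(nonzero, function(j) c(id = j - 1L, count = counts[[j]]))
}

# One BOW list per document, named by document id.
bow <- lapply(seq_len(nrow(dtm_toy)), function(i) row_to_bow(dtm_toy[i, ]))
names(bow) <- rownames(dtm_toy)
```

Only non-zero counts are stored, which is why a BOW representation stays compact for sparse document-term matrices.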

Details

The 'as_LDA()' function turns an estimated topic model prepared using 'mallet' into an 'LDA_Mallet' object that inherits from classes defined in the 'topicmodels' package. This may be useful for applying topic model evaluation tools that are available for the 'LDA_Gibbs' class but not for the immediate output of mallet topic modelling. Note that the gamma matrix is normalized and smoothed; the beta matrix is the logarithmized matrix of normalized and smoothed values obtained from the input mallet topic model.
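
To make the terms "normalized", "smoothed" and "logarithmized" concrete, here is a minimal base R sketch that uses additive smoothing on toy count matrices. The counts, the prior values and the helper 'smooth_normalize()' are all hypothetical; the exact smoothing applied by mallet and 'biglda' is not specified above, so treat this as an illustration of the general idea rather than the package's implementation.

```r
# Hypothetical raw counts: 2 topics x 3 terms.
topic_word_counts <- matrix(c(4, 0, 1,
                              0, 5, 5), nrow = 2, byrow = TRUE)
# Hypothetical raw counts: 2 documents x 2 topics.
doc_topic_counts <- matrix(c(3, 1,
                             0, 6), nrow = 2, byrow = TRUE)

# Assumed smoothing parameters (illustrative values only).
beta_prior <- 0.01
alpha_prior <- 0.5

# Hypothetical helper: add the prior to every cell (smoothing), then divide
# each row by its sum so that it becomes a probability distribution.
smooth_normalize <- function(counts, prior) {
  smoothed <- counts + prior
  smoothed / rowSums(smoothed)
}

# Document-topic probabilities: normalized and smoothed.
gamma_mat <- smooth_normalize(doc_topic_counts, alpha_prior)

# Topic-word values: log of the normalized and smoothed probabilities.
beta_mat <- log(smooth_normalize(topic_word_counts, beta_prior))
```

Because of the smoothing, no cell is zero before the logarithm is taken, so 'beta_mat' contains only finite values, and 'exp(beta_mat)' recovers row-wise probability distributions.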

Examples

data_dir <- system.file(package = "biglda", "extdata", "mallet")
BTM <- mallet_load_topicmodel(
  instancefile = file.path(data_dir, "instance_list.mallet"),
  statefile = file.path(data_dir, "lda_mallet.gz")
)

LDA <- as_LDA(BTM)

# Avoid memory limitations by preparing beta/gamma matrix separately
LDA2 <- as_LDA(BTM, beta = matrix(), gamma = matrix())

B(LDA2) <- save_word_weights(BTM, minimized = TRUE) |>
  load_word_weights(minimized = TRUE) |>
  exp()
  
G(LDA2) <- save_document_topics(BTM) |>
  load_document_topics()

# Convert a Gensim model (requires the 'reticulate' package and the 'gensim' Python module)
if (requireNamespace("reticulate") && reticulate::py_module_available("gensim")){
  gensim <- reticulate::import("gensim")
  
  dir <- system.file(package = "biglda", "extdata", "gensim")
  dtmfile <- file.path(dir, "germaparlmini_dtm.rds")
  
  lda <- gensim_ldamodel_load(modeldir = dir, modelname = "germaparlmini") |>
    as_LDA(dtm = readRDS(dtmfile))
    
  topics_terms <- topicmodels::get_terms(lda, 10)
  docs_topics <- topicmodels::get_topics(lda, 5)
}

PolMine/biglda documentation built on Feb. 25, 2023, 11:24 p.m.