gensim_input: Generate Gensim input from R.

dtm_as_bow    R Documentation

Generate Gensim input from R.

Description

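Helper functions that convert a 'DocumentTermMatrix' into input for gensim (Python): 'dtm_as_bow()' for the bag-of-words corpus and 'dtm_as_dictionary()' for the accompanying dictionary.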

Usage

dtm_as_bow(dtm)

dtm_as_dictionary(dtm)

Arguments

dtm

A 'DocumentTermMatrix'.

Details

gensim's LDA modelling methods expect a corpus in bag-of-words ("BOW") format, in which each document is represented by pairs of a token id and the count of that token. The utility function 'dtm_as_bow()' turns a sparse matrix (class 'simple_triplet_matrix', from which 'DocumentTermMatrix' inherits) into the BOW input format required by gensim.
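To make the BOW layout concrete, the following is a minimal base R sketch of the idea, not the implementation used by 'dtm_as_bow()'. The toy data and the helper 'triplets_to_bow()' are hypothetical and only mimic the i/j/v slots of a 'simple_triplet_matrix'. Shifting term ids to start at 0 is an assumption about what the Python side expects.

```r
# Hypothetical toy data mimicking the i/j/v slots of a 'simple_triplet_matrix':
# i = document index, j = term index, v = term count (all 1-based, as in R).
triplets <- data.frame(
  i = c(1L, 1L, 2L, 2L, 2L),
  j = c(1L, 3L, 2L, 3L, 4L),
  v = c(2L, 1L, 1L, 4L, 1L)
)

# Illustrative helper (not part of biglda): returns one list entry per
# document, each holding (term_id, count) pairs. Term ids are shifted to
# start at 0, an assumption made here for the Python side.
triplets_to_bow <- function(x) {
  by_doc <- split(x, x$i)
  lapply(by_doc, function(doc) {
    Map(function(id, n) c(id - 1L, n), doc$j, doc$v)
  })
}

bow_sketch <- triplets_to_bow(triplets)

# Inspect the nested structure: documents -> list of (term_id, count) pairs.
str(bow_sketch)
```

Only non-zero cells of the matrix appear in the result, which is why a sparse triplet representation maps naturally onto this format.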

Author(s)

Andreas Blaette

Examples

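# Run only if reticulate and the Python module gensim are available
if (requireNamespace("reticulate", quietly = TRUE) && reticulate::py_module_available("gensim")){
  library(polmineR)
  # Activate the REUTERS demo corpus shipped with the RcppCWB package
  use("RcppCWB", corpus = "REUTERS")

  # One document per value of the s-attribute "id", counting word forms
  dtm <- corpus("REUTERS") %>%
    split(s_attribute = "id") %>%
    as.DocumentTermMatrix(p_attribute = "word", verbose = FALSE)

  # Convert the DocumentTermMatrix into gensim input
  bow <- dtm_as_bow(dtm)
  dict <- dtm_as_dictionary(dtm)
}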

PolMine/biglda documentation built on Feb. 25, 2023, 11:24 p.m.