LDAgen: Function to fit LDA model
In tosca: Tools for Statistical Content Analysis

Description

This function uses `lda.collapsed.gibbs.sampler` from the lda package and additionally saves top-word lists and an R workspace.
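
To make the description concrete, here is a minimal sketch of the underlying sampler call from the lda package, run on a made-up toy corpus. The vocabulary, the documents, and the choice of `alpha = eta = 1 / K` are assumptions for illustration only, not values taken from `LDAgen` itself. The sketch does not write any files, whereas `LDAgen` additionally saves its results to disk.

```r
library(lda)  # assumed to be installed; LDAgen builds on this package

# Toy vocabulary and documents, invented for this illustration.
vocab <- c("fish", "man", "feed", "day", "thanks", "integrals")

# The lda package expects each document as a 2-row integer matrix:
# row 1 holds 0-based indices into vocab, row 2 holds the word counts.
documents <- list(
  rbind(c(0L, 1L, 2L, 3L), c(2L, 2L, 2L, 1L)),
  rbind(c(0L, 4L), c(1L, 1L)),
  rbind(c(5L), c(1L))
)

K <- 2L       # number of topics
set.seed(42)  # fix R's random number state so repeated runs are comparable

# Hyperparameter values are illustrative assumptions.
fit <- lda.collapsed.gibbs.sampler(
  documents = documents,
  K = K,
  vocab = vocab,
  num.iterations = 200L,
  alpha = 1 / K,
  eta = 1 / K,
  burnin = 70L
)

# fit$topics is a K x length(vocab) matrix of word-to-topic assignment counts.
# top.topic.words() returns a num.words x K character matrix of top words.
top <- top.topic.words(fit$topics, num.words = 3L, by.score = TRUE)
print(top)
```

The arguments `K`, `vocab`, `num.iterations`, `burnin`, `alpha` and `eta` in the sketch carry the same names as the corresponding `LDAgen` arguments, and `num.words` plays the same role as in the top-word list.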

Usage

LDAgen(
  documents,
  K = 100L,
  vocab,
  num.iterations = 200L,
  burnin = 70L,
  alpha = NULL,
  eta = NULL,
  seed = NULL,
  folder = file.path(tempdir(), "lda-result"),
  num.words = 50L,
  LDA = TRUE,
  count = FALSE
)

Arguments

`documents`	A list prepared by `LDAprep`.
`K`	Number of topics
`vocab`	Character vector containing the words in the corpus
`num.iterations`	Number of iterations for the Gibbs sampler
`burnin`	Number of iterations for the burn-in
`alpha`	Hyperparameter for the topic proportions
`eta`	Hyperparameter for the word distributions
`seed`	A seed for reproducibility.
`folder`	File for the results. Saves in the temporary directory by default.
`num.words`	Number of words in the top topic words list
`LDA`	logical: Should a new model be fitted (`TRUE`), or should an existing R workspace be used (`FALSE`)?
`count`	logical: Should article counts, calculated per top topic word, be included in the csv output (default: `FALSE`)?

Value

A .csv file containing the top-word list and an R workspace containing the result data.

References

Blei, David M. and Ng, Andrew and Jordan, Michael. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.

Jonathan Chang (2012). lda: Collapsed Gibbs sampling methods for topic models. R package version 1.3.2. http://CRAN.R-project.org/package=lda

Examples

library(tosca)

texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
B="So Long, and Thanks for All the Fish",
C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")

corpus <- textmeta(meta=data.frame(id=c("A", "B", "C", "D"),
title=c("Fishing", "Don't panic!", "Sir Ronald", "Berlin"),
date=c("1885-01-02", "1979-03-04", "1951-05-06", "1967-06-02"),
additionalVariable=1:4, stringsAsFactors=FALSE), text=texts)

corpus <- cleanTexts(corpus)
wordlist <- makeWordlist(corpus$text)
ldaPrep <- LDAprep(text=corpus$text, vocab=wordlist$words)

LDAgen(documents=ldaPrep, K = 3L, vocab=wordlist$words, num.words=3)

tosca documentation built on June 8, 2025, 11:21 a.m.