LDAgen: Function to fit an LDA model



Description

This function fits an LDA model with lda.collapsed.gibbs.sampler from the lda package. It also saves top-word lists and an R workspace.
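As background for the sampler arguments below, here is the standard collapsed Gibbs update for LDA with symmetric hyperparameters. It resamples the topic z_i of each token i (word w_i in document d) from the full conditional shown.

The symbols are:
- n_{d,k}: number of tokens in document d assigned to topic k
- n_{k,w}: number of times word w is assigned to topic k
- n_{k,.}: total number of tokens assigned to topic k
- V: vocabulary size
- K: number of topics
- superscript -i: counts exclude token i

Reading `alpha` as alpha and `eta` as eta is an assumption based on the argument descriptions. This is the textbook form, not a transcription of the lda package's code.

```latex
p\left(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}\right) \;\propto\;
\left(n_{d,k}^{-i} + \alpha\right)
\frac{n_{k,w_i}^{-i} + \eta}{n_{k,\cdot}^{-i} + V\eta},
\qquad k = 1, \dots, K.
```

The document-length normalizer n_{d,.}^{-i} + K alpha is the same for every k, so it drops out of the proportionality.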

Usage

LDAgen(
  documents,
  K = 100L,
  vocab,
  num.iterations = 200L,
  burnin = 70L,
  alpha = NULL,
  eta = NULL,
  seed = NULL,
  folder = file.path(tempdir(), "lda-result"),
  num.words = 50L,
  LDA = TRUE,
  count = FALSE
)

Arguments

documents

A list prepared by LDAprep.

K

Number of topics

vocab

Character vector containing the words in the corpus

num.iterations

Number of iterations of the Gibbs sampler

burnin

Number of burn-in iterations of the Gibbs sampler

alpha

Hyperparameter for the topic proportions

eta

Hyperparameter for the word distributions

seed

A seed for reproducibility.

folder

Path for the result files. By default, results are saved in the session's temporary directory.

num.words

Number of words in the top topic words list

LDA

logical: Should a new model be fitted (TRUE), or should an existing R workspace be used (FALSE)?

count

logical: Should article counts, calculated per top topic word, be used for the CSV output (default: FALSE)?
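To show how documents, vocab, K, alpha, eta, burnin and num.iterations interact, here is a toy base-R collapsed Gibbs sampler. It is a sketch for illustration only: it is not the lda package's code and not what LDAgen runs. The helper names to_lda_docs and toy_gibbs_lda are hypothetical, and so are their default values.

- to_lda_docs builds the input format the lda package expects: a list with one two-row integer matrix per document. Row 1 holds 0-indexed word ids into vocab; row 2 holds the counts. It is assumed, not confirmed here, that LDAprep produces this same format.
- toy_gibbs_lda runs the burn-in sweeps first and does not count them in num.iterations. Check the lda documentation for how lda.collapsed.gibbs.sampler counts them.

```r
# Hypothetical helper: tokenized documents -> lda-style input, i.e. one
# 2-row integer matrix per document (row 1: 0-indexed word id into vocab,
# row 2: how often that word occurs). Tokens not in vocab are dropped.
to_lda_docs <- function(tokens, vocab) {
  lapply(tokens, function(tok) {
    counts <- table(factor(tok, levels = vocab))
    counts <- counts[counts > 0]
    rbind(as.integer(match(names(counts), vocab) - 1L), as.integer(counts))
  })
}

# Hypothetical toy collapsed Gibbs sampler for LDA (illustration only;
# the default alpha and eta here are arbitrary choices for the sketch).
toy_gibbs_lda <- function(documents, K, V, num.iterations = 200L,
                          burnin = 70L, alpha = 1 / K, eta = 1 / K,
                          seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Expand each document into one 1-indexed word id per token.
  words <- lapply(documents, function(d) rep(d[1, ] + 1L, d[2, ]))
  D <- length(words)
  ndk <- matrix(0L, D, K)  # tokens in document d assigned to topic k
  nkw <- matrix(0L, K, V)  # assignments of word w to topic k
  nk <- integer(K)         # total tokens assigned to topic k
  z <- vector("list", D)
  # Random initial topic for every token.
  for (d in seq_len(D)) {
    z[[d]] <- sample.int(K, length(words[[d]]), replace = TRUE)
    for (i in seq_along(words[[d]])) {
      k <- z[[d]][i]; w <- words[[d]][i]
      ndk[d, k] <- ndk[d, k] + 1L; nkw[k, w] <- nkw[k, w] + 1L; nk[k] <- nk[k] + 1L
    }
  }
  theta_sum <- matrix(0, D, K)
  for (iter in seq_len(burnin + num.iterations)) {
    for (d in seq_len(D)) {
      for (i in seq_along(words[[d]])) {
        k <- z[[d]][i]; w <- words[[d]][i]
        # Remove token i from the counts (the "-i" in the full conditional).
        ndk[d, k] <- ndk[d, k] - 1L; nkw[k, w] <- nkw[k, w] - 1L; nk[k] <- nk[k] - 1L
        # Unnormalized full conditional over the K topics, then resample.
        p <- (ndk[d, ] + alpha) * (nkw[, w] + eta) / (nk + V * eta)
        k <- sample.int(K, 1L, prob = p)
        z[[d]][i] <- k
        ndk[d, k] <- ndk[d, k] + 1L; nkw[k, w] <- nkw[k, w] + 1L; nk[k] <- nk[k] + 1L
      }
    }
    # Average smoothed topic proportions only over post-burn-in sweeps.
    if (iter > burnin) {
      theta_sum <- theta_sum + (ndk + alpha) / (rowSums(ndk) + K * alpha)
    }
  }
  # topics is K x V and document_sums is K x D; theta is D x K.
  list(topics = nkw, document_sums = t(ndk),
       theta = theta_sum / max(1L, num.iterations))
}

# Usage on three tiny tokenized documents.
vocab <- c("fish", "man", "thanks", "integrals", "mathematician")
docs <- to_lda_docs(list(c("fish", "man", "fish"), c("fish", "thanks"),
                         c("integrals", "mathematician")), vocab)
fit <- toy_gibbs_lda(docs, K = 2L, V = length(vocab),
                     num.iterations = 20L, burnin = 5L, seed = 1)
```

The returned names topics and document_sums mirror the shapes of the fields with those names in lda's sampler output. theta is an extra, sketch-only estimate of the per-document topic proportions.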

Value

A .csv file containing the top-word list and an R workspace containing the result data, both written to the location given by folder.
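Output of this kind can be inspected in base R. A top-word table in a .csv file can be read back with read.csv, and an R workspace can be loaded into its own environment with load(..., envir = ). The sketch below builds a toy top-word table from a K x V topic-word count matrix and round-trips both kinds of file through tempdir().

The helper top_words_table and the file names are hypothetical. This page does not document the exact names and contents of the files LDAgen writes. Ranking words by raw counts per topic is one simple choice, similar in spirit to top.topic.words in the lda package with by.score = FALSE.

```r
# Hypothetical helper: for each topic (row of a K x V count matrix), list the
# num.words words with the highest counts; returns a num.words x K matrix.
top_words_table <- function(topics, vocab, num.words = 50L) {
  num.words <- min(num.words, ncol(topics))
  tw <- apply(topics, 1, function(counts)
    vocab[order(counts, decreasing = TRUE)][seq_len(num.words)])
  tw <- matrix(tw, nrow = num.words)  # keep a matrix even if num.words == 1
  colnames(tw) <- paste0("topic", seq_len(nrow(topics)))
  tw
}

# Toy 2 x 4 topic-word count matrix.
vocab <- c("fish", "man", "day", "feed")
topics <- rbind(c(5, 0, 1, 3), c(0, 4, 2, 0))
tw <- top_words_table(topics, vocab, num.words = 2L)

# Round-trip the top-word list through a CSV file (hypothetical file name).
csv_file <- file.path(tempdir(), "toy-topwords.csv")
write.csv(tw, csv_file, row.names = FALSE)
back <- read.csv(csv_file, stringsAsFactors = FALSE)

# Round-trip a result object through an R workspace, loading it into a
# separate environment so nothing in the global workspace is overwritten.
ws_file <- file.path(tempdir(), "toy-result.RData")
result <- list(topics = topics, vocab = vocab)
save(result, file = ws_file)
env <- new.env()
loaded <- load(ws_file, envir = env)  # names of the restored objects
```

Loading into a fresh environment is a defensive choice. load() into the global environment would silently replace any existing objects with the same names.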

References

Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research.

Chang, J. (2012). lda: Collapsed Gibbs sampling methods for topic models. R package version 1.3.2. http://CRAN.R-project.org/package=lda

See Also

Documentation for the lda package.

Examples

library(tosca)

# Three short example texts
texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
B="So Long, and Thanks for All the Fish",
C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")

# Metadata for four documents (document D has metadata but no text)
corpus <- textmeta(meta=data.frame(id=c("A", "B", "C", "D"),
  title=c("Fishing", "Don't panic!", "Sir Ronald", "Berlin"),
  date=c("1885-01-02", "1979-03-04", "1951-05-06", "1967-06-02"),
  additionalVariable=1:4, stringsAsFactors=FALSE), text=texts)

# Clean the texts, build the vocabulary and prepare the input for LDAgen
corpus <- cleanTexts(corpus)
wordlist <- makeWordlist(corpus$text)
ldaPrep <- LDAprep(text=corpus$text, vocab=wordlist$words)

# Fit a 3-topic model; results go to the default folder in tempdir()
LDAgen(documents=ldaPrep, K = 3L, vocab=wordlist$words, num.words=3)

tosca documentation built on Oct. 28, 2021, 5:07 p.m.