RollingLDA: RollingLDA
In rollinglda: Construct Consistent Time Series from Textual Data

RollingLDA

Description

Performs a rolling version of Latent Dirichlet Allocation.

Usage

RollingLDA(...)

## Default S3 method:
RollingLDA(
  texts,
  dates,
  chunks,
  memory,
  vocab.abs = 5L,
  vocab.rel = 0,
  vocab.fallback = 100L,
  doc.abs = 0L,
  memory.fallback = 0L,
  init,
  type = c("ldaprototype", "lda"),
  id,
  ...
)

Arguments

`...`	additional arguments passed to `LDARep` or `LDAPrototype`, respectively. Default parameters are `alpha = eta = 1/K` and `num.iterations = 200`. There is no default for `K`.
`texts`	[`named list`] Tokenized texts.
`dates`	[`(un)named Date`] Dates of the tokenized texts. If unnamed, it must match the order of texts.
`chunks`	[`Date` or `character(1)`] Sorted dates of the beginnings of each chunk to be modeled after the initial model. If passed as `character`, dates are determined by passing `init` plus one day as the `from` argument, `max(dates)` as the `to` argument and `chunks` as the `by` argument to `seq.Date`.
`memory`	[`Date`, `character(1)` or `integer(1)`] Sorted dates of the beginnings of each chunk's memory. If passed as `character`, dates are determined by taking the beginning of each chunk and subtracting the time interval given in `memory`, which is passed as the `by` argument to `seq.Date`. If passed as `integer/numeric`, the dates are determined by going backwards chronologically through the modeled texts and taking the date of the text at position `memory`.
`vocab.abs`	[`integer(1)`] An absolute lower bound for the words that are taken into account. The vocabulary contains all words that have a count higher than `vocab.abs` over all texts and at the same time a relative frequency higher than `vocab.rel`. Default is 5.
`vocab.rel`	[0,1] A relative lower bound for the words that are taken into account. See also `vocab.abs`. Default is 0.
`vocab.fallback`	[`integer(1)`] An absolute fallback bound for the words that are taken into account. The vocabulary also contains all words that have a count higher than `vocab.fallback` over all texts, even if their relative frequency is not higher than `vocab.rel`. Default is 100.
`doc.abs`	[`integer(1)`] An absolute lower bound for the texts that are taken into account. All texts that contain more words than `doc.abs` (counting only words that occur in the vocabulary) are considered for modeling. Default is 0.
`memory.fallback`	[`integer(1)`] If there are no texts as memory in a certain chunk, `memory` is determined by going backwards chronologically through the modeled texts and taking the date of the text at position `memory.fallback`. Default is 0, which means "end the fitting".
`init`	[`Date(1)` or `integer(1)`] Date up to which the initial model should be computed. This parameter is needed/used only if `chunks` is passed as `character`. Otherwise the initial model is computed up to the first date in `chunks` minus one day. If `init` is passed as `integer/numeric`, the `init`-th lowest (earliest) date in `dates` is selected.
`type`	[`character(1)`] One of "ldaprototype" or "lda", specifying whether an LDAPrototype or a standard LDA should be modeled as the initial model. Default is "ldaprototype".
`id`	[`character(1)`] Name for the computation/model.
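
The vocabulary and document thresholds described above can be illustrated with a small base R sketch. It is not the package's internal code: the toy `texts`, the threshold values and the helper names `counts`, `rel`, `keep` and `kept` are made up for illustration, and the relative frequency is assumed here to be a word's count divided by the total number of tokens.

```r
# Toy tokenized texts (hypothetical data, not the package's example data).
texts <- list(
  a = c("bank", "rate", "bank", "crisis"),
  b = c("bank", "rate", "market"),
  c = c("bank", "crisis", "rate", "bank")
)

# Illustrative thresholds (the documented defaults are 5, 0, 100 and 0).
vocab.abs <- 1L
vocab.rel <- 0.3
vocab.fallback <- 2L
doc.abs <- 2L

# Word counts over all texts and (assumed) relative frequencies.
counts <- table(unlist(texts))
rel <- counts / sum(counts)

# Keep a word if it passes both the absolute and the relative bound,
# or if its count exceeds the fallback bound.
keep <- (counts > vocab.abs & rel > vocab.rel) | counts > vocab.fallback
vocab <- names(counts)[as.vector(keep)]

# Keep a text if it has more in-vocabulary words than doc.abs.
n_in_vocab <- vapply(texts, function(x) sum(x %in% vocab), integer(1))
kept <- texts[n_in_vocab > doc.abs]

print(vocab)
print(names(kept))
```

In this toy setup "rate" fails the relative bound but is kept through `vocab.fallback`, while "crisis" passes `vocab.abs` but is dropped because it fails both the relative bound and the fallback.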

Details

The function first computes an initial LDA model (using LDARep or LDAPrototype). Afterwards it models temporal chunks of texts, using a specified memory to initialize the model of each chunk.

The function returns a RollingLDA object. You can retrieve the results and all other elements of this object with getter functions (see getChunks).
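
As a rough sketch of how character values for `chunks` and `memory` translate into dates, the following base R code follows the `seq.Date` recipe given in the argument descriptions. The input dates and the names `last_date`, `chunk_starts` and `memory_starts` are hypothetical, and the package's actual implementation may differ in detail.

```r
# Hypothetical inputs: end of the initial model and the latest text date.
init <- as.Date("2008-07-03")
last_date <- as.Date("2009-03-31")  # stands in for max(dates)

# Chunk beginnings: start one day after init, step by the `chunks` interval.
chunk_starts <- seq.Date(from = init + 1, to = last_date, by = "quarter")

# Memory beginnings: step back from each chunk beginning by the `memory`
# interval (here "3 quarter", passed to seq.Date with a negative sign).
memory_starts <- do.call(c, lapply(
  chunk_starts,
  function(d) seq.Date(from = d, by = "-3 quarter", length.out = 2)[2]
))

print(data.frame(chunk_start = chunk_starts, memory_start = memory_starts))
```

Each chunk is then modeled on texts dated from its chunk beginning onwards, with the texts between its memory beginning and its chunk beginning serving as the memory for initialization.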

Value

[named list] with entries

id: [character(1)] See above.
lda: LDA object of the fitted RollingLDA.
docs: [named list] with modeled texts in a preprocessed format. See LDAprep.
dates: [named Date] with dates of the modeled texts.
vocab: [character] with the vocabularies considered for modeling.
chunks: [data.table] with specifications for each model chunk.
param: [named list] with parameter specifications for vocab.abs [integer(1)], vocab.rel [0,1], vocab.fallback [integer(1)] and doc.abs [integer(1)]. See above for explanation.

Examples

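# Initial model: a standard LDA with K = 10 topics on all texts up to init.
# Afterwards, quarterly chunks are modeled, each with three quarters of memory.
roll_lda = RollingLDA(texts = economy_texts,
                      dates = economy_dates,
                      chunks = "quarter",
                      memory = "3 quarter",
                      init = "2008-07-03",
                      K = 10,
                      type = "lda")

# Print the object, then retrieve the chunk specifications and the fitted LDA.
roll_lda
getChunks(roll_lda)
getLDA(roll_lda)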


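# Same setup with an earlier init date and the default type = "ldaprototype".
# n, pm.backend and ncpus are passed on via ... (12 replications, computed
# in parallel on 2 CPUs using a socket backend).
roll_proto = RollingLDA(texts = economy_texts,
                        dates = economy_dates,
                        chunks = "quarter",
                        memory = "3 quarter",
                        init = "2007-07-03",
                        K = 10,
                        n = 12,
                        pm.backend = "socket",
                        ncpus = 2)

# Print the object, then retrieve the chunk specifications and the fitted LDA.
roll_proto
getChunks(roll_proto)
getLDA(roll_proto)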

rollinglda documentation built on Aug. 21, 2025, 5:54 p.m.