updateRollingLDA: Updating an existing RollingLDA object


View source: R/updateRollingLDA.R

Description

Performs an update of an existing object consisting of a rolling version of Latent Dirichlet Allocation.

Usage

updateRollingLDA(
  x,
  texts,
  dates,
  chunks,
  memory,
  param = getParam(x),
  compute.topics = TRUE,
  memory.fallback = 0L,
  ...
)

## S3 method for class 'RollingLDA'
RollingLDA(
  x,
  texts,
  dates,
  chunks,
  memory,
  param = getParam(x),
  compute.topics = TRUE,
  memory.fallback = 0L,
  ...
)

Arguments

x

[named list]
RollingLDA object.

texts

[named list]
Tokenized texts.

dates

[(un)named Date]
Sorted dates of the tokenized texts. If unnamed, it must match the order of texts.

chunks

[Date or character(1)]
Sorted dates of the beginnings of each chunk to be modeled as updates. If passed as character, the dates are determined by calling seq.Date with min(dates) as the from argument, max(dates) as the to argument and chunks as the by argument. If not passed, all texts are interpreted as one chunk.

memory

[Date, character(1) or integer(1)]
Dates of the beginnings of each chunk's memory. If passed as character, the dates are determined by taking the beginning of each chunk and subtracting the time interval given in memory, which is passed as the by argument to seq.Date. If passed as integer/numeric, the dates are determined by going backwards chronologically through the modeled texts and taking the date of the text at position memory.

param

[named list] with entries (Default is getParam(x))

vocab.abs

[integer(1)] An absolute lower bound for the words that are taken into account. The vocabulary includes all words that have a count higher than vocab.abs over all texts and, at the same time, a relative frequency higher than vocab.rel.

vocab.rel

[0,1] A relative lower bound for the words that are taken into account. See also vocab.abs.

vocab.fallback

[integer(1)] An absolute fallback bound for the words that are taken into account. The vocabulary also includes all words that have a count higher than vocab.fallback over all texts, even if their relative frequency is not higher than vocab.rel.

doc.abs

[integer(1)] An absolute lower bound for the texts that are taken into account. All texts that have more words (after subsetting to words occurring in the vocabulary) than doc.abs are considered for modeling.

compute.topics

[logical(1)]
Should the topic matrix of the LDA model be computed? Default is TRUE.

memory.fallback

[integer(1)]
If a chunk has no texts as memory, the memory is determined by going backwards chronologically through the modeled texts and taking the date of the text at position memory.fallback. Default is 0, which means "end the fitting".

...

Not implemented.
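The interplay of the vocab.abs, vocab.rel, vocab.fallback and doc.abs thresholds described above can be sketched in base R. This is only an illustration of the rule as documented, not the package's internal code: the counts and texts are made-up toy data, and the relative frequency is assumed here to be a word's share of all tokens.

```r
# Hypothetical word counts over all texts (toy numbers, not package data).
counts <- c(bank = 40L, rate = 12L, crisis = 6L, market = 2L)

# Assumption: relative frequency = share of all tokens.
rel <- counts / sum(counts)

# Illustrative threshold values (not the package defaults).
vocab.abs <- 3L
vocab.rel <- 0.15
vocab.fallback <- 5L
doc.abs <- 1L

# A word is kept if it passes both vocab.abs and vocab.rel,
# or if its count alone exceeds vocab.fallback.
keep <- (counts > vocab.abs & rel > vocab.rel) | counts > vocab.fallback
vocab <- names(counts)[keep]

# Toy tokenized texts, subsetted to the vocabulary.
texts <- list(
  a = c("bank", "market", "rate"),
  b = c("market", "crisis"),
  c = c("market")
)
texts_sub <- lapply(texts, function(tokens) tokens[tokens %in% vocab])

# A text is kept if it has more than doc.abs words left.
texts_kept <- texts_sub[lengths(texts_sub) > doc.abs]

print(vocab)
print(names(texts_kept))
```

In this toy setup, "crisis" fails the relative bound but is kept through vocab.fallback, while "market" fails both rules.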

Details

The function uses an existing RollingLDA object and models new texts with a specified memory as initialization of the new LDA chunk.

The function returns a RollingLDA object. You can retrieve the results and all other elements of this object with getter functions (see getChunks).
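To make the chunks and memory arguments more concrete, the following base R sketch derives chunk and memory start dates from character intervals via seq.Date. It follows the description under Arguments but is not taken from the package source; the dates are toy data, and the assumption is that memory = "month" corresponds to stepping back one month from each chunk start.

```r
# Toy dates of new texts (hypothetical data).
dates <- as.Date(c("2008-05-03", "2008-05-20", "2008-06-11", "2008-07-29"))

# chunks = "month": chunk beginnings from min(dates) to max(dates).
chunk_starts <- seq.Date(from = min(dates), to = max(dates), by = "month")

# memory = "month": assumed to mean one month before each chunk beginning.
memory_starts <- do.call(c, lapply(chunk_starts, function(start) {
  seq.Date(from = start, by = "-1 month", length.out = 2)[2]
}))

print(data.frame(chunk_start = chunk_starts, memory_start = memory_starts))
```

Each row pairs the beginning of a chunk with the beginning of its memory window; texts already modeled that fall between the two dates would serve as the memory for that chunk.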

Value

[named list] with entries

id

[character(1)] See above.

lda

LDA object of the fitted RollingLDA.

docs

[named list] with modeled texts in a preprocessed format. See LDAprep.

dates

[named Date] with dates of the modeled texts.

vocab

[character] with the vocabularies considered for modeling.

chunks

[data.table] with specifications for each model chunk.

param

[named list] with parameter specifications for vocab.abs [integer(1)], vocab.rel [0,1], vocab.fallback [integer(1)] and doc.abs [integer(1)]. See above for explanation.

See Also

Other RollingLDA functions: RollingLDA(), as.RollingLDA(), getChunks()

Examples

library(rollinglda)

# fit an initial RollingLDA on all texts before May 2008
roll_lda = RollingLDA(texts = economy_texts[economy_dates < "2008-05-01"],
                      dates = economy_dates[economy_dates < "2008-05-01"],
                      chunks = "month",
                      memory = "month",
                      init = 100,
                      K = 10,
                      type = "lda")

# RollingLDA() is equivalent to updateRollingLDA() if the first argument is a RollingLDA object
roll_update = RollingLDA(roll_lda,
                         texts = economy_texts[economy_dates >= "2008-05-01"],
                         dates = economy_dates[economy_dates >= "2008-05-01"],
                         chunks = "month",
                         memory = "month")

roll_update
getChunks(roll_update)

rollinglda documentation built on Oct. 28, 2021, 5:10 p.m.