dfrtopics: Tools for exploring topic models of text

word_series_matrix

R Documentation

Aggregate (word/topic) counts by time period

Description

This convenience function transforms a topic-document or term-document matrix into a topic-by-period (or term-by-period) matrix. It is meant for the common case in which document dates from the metadata are used to generate time series. Values are normalized so that they sum to 1 within each time period. Any matrix can be transformed in this way, however, as long as its columns can be matched against dates.
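The aggregation described above can be sketched in base R as follows. This is a hypothetical illustration, not the dfrtopics implementation: `series_sketch` is a made-up name, it assumes a dense base-R matrix rather than a `Matrix`, and it silently drops periods that contain no documents (how the package itself treats empty periods is not stated on this page).

```r
# Hypothetical sketch for illustration only; not the dfrtopics implementation.
# Assumes `tdm` is a dense base-R matrix and `dates` a Date vector.
series_sketch <- function(tdm, dates, breaks = "years") {
    stopifnot(ncol(tdm) == length(dates))
    # assign each column (document) to a time period via cut.Date
    periods <- cut(dates, breaks)
    # rowsum() sums rows by group, so transpose, group, and transpose back;
    # periods with no documents do not appear in the result
    agg <- t(rowsum(t(tdm), periods))
    # divide each column by its total so that every period sums to 1
    sweep(agg, 2, colSums(agg), "/")
}

# toy topic-document matrix: 2 topics x 3 documents (invented counts)
tdm <- matrix(c(1, 3,
                2, 2,
                0, 4),
              nrow = 2,
              dimnames = list(c("topic1", "topic2"), NULL))
dates <- as.Date(c("2000-02-01", "2000-09-01", "2001-05-01"))
series_sketch(tdm, dates)
# 2000: topic1 = 3/8 = 0.375, topic2 = 5/8 = 0.625
# 2001: topic1 = 0,           topic2 = 1
```

The `sweep()` step is what makes each period sum to 1; dropping it leaves raw per-period sums.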

Usage

word_series_matrix(tdm, dates, breaks = "years")

Arguments

`tdm`	a matrix (or Matrix) with some feature (e.g. topics or words) in rows and datable items (e.g. documents) in columns
`dates`	a Date vector, one for each column of `tdm`
`breaks`	passed on to `cut.Date` (q.v.): the interval the time series should use, e.g. "years" or "months"
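Because `breaks` is handed to `cut.Date`, it helps to see how that function buckets dates. A minimal base-R illustration with made-up dates: an interval string such as "years" or "2 years" yields one factor level per interval, starting from the interval that contains the earliest date, and intervals inside the range that contain no dates still appear as levels.

```r
# Made-up publication dates; note that no date falls in 1991
dates <- as.Date(c("1990-03-01", "1990-11-15", "1992-06-30"))

yearly <- cut(dates, "years")
levels(yearly)     # "1990-01-01" "1991-01-01" "1992-01-01"
table(yearly)      # 2 documents in 1990, 0 in 1991, 1 in 1992

biennial <- cut(dates, "2 years")
levels(biennial)   # "1990-01-01" "1992-01-01"
```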

Details

N.B. Topics are the most obvious row variable and documents the most obvious column variable, but it may also make sense to preaggregate multiple words or topics into some larger construct. Similarly, if the documents fall into groups with their own periodicity (e.g. periodical issues), there is no reason not to set `tdm` to a matrix whose columns have already been summed within those groups. You can of course also do this summing post hoc, but then you must take care with normalization. Naturally, nothing stops you from supplying a slice of the topic-document matrix in order to study series of proportions within some subset of topics or documents rather than the whole. Again, interpreting the normalized proportions requires some care, since they are then relative to the slice, not to the whole corpus.
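The normalization caveats above can be made concrete with a small base-R example using invented counts. Assuming, as the description suggests, that values are summed within a period and only then normalized, longer documents carry more weight; normalizing each document first and averaging afterwards weights documents equally, and the two can disagree sharply. Likewise, a slice of rows is normalized only against itself.

```r
# Invented topic counts for two documents published in the same period
counts <- cbind(doc1 = c(topicA = 90, topicB = 10),
                doc2 = c(topicA = 1,  topicB = 9))

# (a) sum raw counts, then normalize: documents weighted by length
pooled <- rowSums(counts) / sum(counts)
# (b) normalize each document, then average: documents weighted equally
averaged <- rowMeans(sweep(counts, 2, colSums(counts), "/"))
pooled     # topicA = 91/110 (about 0.827), topicB = 19/110 (about 0.173)
averaged   # topicA = 0.5, topicB = 0.5

# Slicing rows: proportions become relative to the slice, not the whole
period_totals <- c(topicA = 50, topicB = 30, topicC = 20)
period_totals["topicA"] / sum(period_totals)                        # 0.5 of the whole
period_totals["topicA"] / sum(period_totals[c("topicA", "topicB")]) # 0.625 of the slice
```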

Value

A matrix in which each row is a time series and each column (time period) sums to 1. To generate a time series without normalization, or with rolling means or other smoothing, use the `sum_col_groups` function in conjunction with `cut.Date`.
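A sketch of the unnormalized, smoothed alternative mentioned above. Since the exact signature of `sum_col_groups` is not given on this page, this sketch substitutes base R's `rowsum()` for the grouping step; the data are invented, and the centered three-period moving average computed with `stats::filter()` is only one possible choice of smoothing.

```r
# Invented topic-document matrix: 2 topics x 6 documents
tdm <- matrix(c(5, 1,  1, 1,  3, 3,  2, 6,  4, 4,  0, 3),
              nrow = 2,
              dimnames = list(c("topic1", "topic2"), NULL))
dates <- as.Date(c("2000-03-01", "2000-08-01", "2001-05-01",
                   "2002-02-01", "2003-04-01", "2003-10-01"))
periods <- cut(dates, "years")

# Unnormalized series: raw sums per period (base-R stand-in for sum_col_groups)
raw <- t(rowsum(t(tdm), periods))
raw["topic1", ]   # 6 3 2 4, named by period

# Centered 3-period moving average of each row; the two endpoints become NA.
# stats:: is explicit because dplyr::filter() masks stats::filter() when attached.
smoothed <- t(apply(raw, 1, function(x)
    as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))))
colnames(smoothed) <- colnames(raw)
smoothed["topic1", ]   # NA, 11/3, 3, NA
```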

Examples

## Not run: 
# time series of "solid", "flesh", and "melt" within topic 10,
# after loading the sampling state on model m
sm10 <- tdm_topic(m, 10) %>%
    word_series_matrix(metadata(m)$pubdate)
gather_matrix(sm10[word_ids(c("solid", "flesh", "melt")), ],
              col_names=c("word", "year", "weight"))

## End(Not run)

agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.