View source: R/topic-analysis.R
topic_frequencies summarizes the shares of topics in a chosen time interval, based on the provided topic shares by document and date.
topic_frequencies(topicsByDocDate, timeBinUnit = "week",
  minGamma = 0.01, minTopicTimeBins = 0.5)
topicsByDocDate | a dataframe as returned by
timeBinUnit | a character sequence specifying the time period that should be used as a bin unit when computing topic share frequencies. Valid values are
minGamma | the minimum share of a topic per document to be considered when summarizing topic frequencies; topics with smaller shares per individual document will be ignored when computing topic frequencies. (In an
minTopicTimeBins | a double in the range
An stm topic model provides, for each document, the likelihood (gamma) that it is generated from a specific topic; here we interpret these likelihoods as the share of a document attributed to this topic and then summarize these shares per timebin to obtain the share of a topic across all documents over time.
The topic share or likelihood per document has to be above a threshold specified by minGamma. A suitable threshold might consider the number of topics and the average document size. An additional filtering option is provided with minTopicTimeBins.
Timebins for which no occurrence of a given topic is recorded are added with an explicit value of zero, excluding however such empty timebins before the first occurrence of a topic and after the last.
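The filtering, binning and zero-filling described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the input column names `date` and `gamma`, the toy data, and the use of a sum as the per-timebin summary are all hypothetical.

```r
# Toy input; the column names `date` and `gamma` are assumptions for illustration.
docs <- data.frame(
  topic_id = c(1, 1, 1, 2, 2),
  date     = as.Date(c("2021-01-04", "2021-01-06", "2021-01-20",
                       "2021-01-05", "2021-01-13")),
  gamma    = c(0.60, 0.005, 0.30, 0.40, 0.70)
)
min_gamma <- 0.01

# 1. Drop topic shares below the threshold.
kept <- docs[docs$gamma >= min_gamma, ]

# 2. Floor each date to the Monday of its week (%u: 1 = Monday ... 7 = Sunday).
kept$timebin <- kept$date - (as.integer(format(kept$date, "%u")) - 1)

# 3. Sum the remaining shares per topic and timebin.
sums <- aggregate(gamma ~ topic_id + timebin, data = kept, FUN = sum)

# 4. Add explicit zeros for empty weeks between a topic's first and last
#    timebin; weeks before the first or after the last are not added.
filled <- do.call(rbind, lapply(split(sums, sums$topic_id), function(s) {
  bins <- seq(min(s$timebin), max(s$timebin), by = "week")
  out  <- data.frame(topic_id = s$topic_id[1], timebin = bins, gamma = 0)
  out$gamma[match(s$timebin, bins)] <- s$gamma
  out
}))
rownames(filled) <- NULL
filled
```

In this toy data, topic 1 has no document in the week starting 2021-01-11, so that timebin is added with a value of zero, while topic 2 gets no extra rows before or after its own range.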
a dataframe with topic frequencies by chosen timebin, where:
a topic ID as provided as an input in topicsByDocDate
the floor date of a timebin; if timeBinUnit was set to week, this date will always be a Monday
the median of likelihoods of the topic with topic_id in timebin
the mean of likelihoods of the topic with topic_id in timebin
the share of topic with topic_id relative to all topic shares recorded and included in a given timebin. NOTE: strictly speaking these are the likelihoods that a document is generated from a topic, which we here interpret as the share of a document attributed to a topic.
the total number of documents in a dataset in which a topic with topic_id occurs at least with likelihood minGamma
the exact date of the first occurrence of a topic with topic_id across the whole time range covered by timebins
the exact date of the latest occurrence of a topic with topic_id across the whole time range covered by timebins; note that this date can be later than the maximum timebin, as timebin specifies the floor date of a time unit
the number of unique timebins in which a topic with topic_id occurs at least with likelihood minGamma
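As a rough illustration of how some of these values relate to each other, the following base R sketch derives a per-timebin share and the first and latest occurrence of a topic from toy data. The column and variable names (`date`, `gamma`, `share`, `first_seen`, `last_seen`, `n_timebins`) are hypothetical, and computing the share as a ratio of summed gammas is an assumption about what "relative to all topic shares" means; the package may compute it differently.

```r
# Toy input; all names except topic_id and timebin are assumptions.
docs <- data.frame(
  topic_id = c(1, 1, 1, 2),
  date     = as.Date(c("2021-01-04", "2021-01-06", "2021-01-21", "2021-01-05")),
  gamma    = c(0.6, 0.2, 0.5, 0.2)
)

# Weekly timebin: the Monday on or before each date.
docs$timebin <- docs$date - (as.integer(format(docs$date, "%u")) - 1)

# Share of each topic relative to all shares recorded in the same timebin.
sums <- aggregate(gamma ~ topic_id + timebin, data = docs, FUN = sum)
sums$share <- sums$gamma / ave(sums$gamma, sums$timebin, FUN = sum)

# Exact first and latest occurrence of topic 1, and its number of timebins.
t1         <- docs[docs$topic_id == 1, ]
first_seen <- min(t1$date)
last_seen  <- max(t1$date)
n_timebins <- length(unique(t1$timebin))

# The latest occurrence can fall after the maximum timebin, because a
# timebin is the floor date of its week.
last_seen > max(t1$timebin)
```

Here the latest document for topic 1 is dated 2021-01-21, a Thursday, while its timebin is the preceding Monday, 2021-01-18.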