topWords | R Documentation |
Extract the top words in each topic/sentiment from a sentopicmodel.
topWords(
x,
nWords = 10,
method = c("frequency", "probability", "term-score", "FREX"),
output = c("data.frame", "plot", "matrix"),
subset,
w = 0.5
)
plot_topWords(
x,
nWords = 10,
method = c("frequency", "probability", "term-score", "FREX"),
subset,
w = 0.5
)
x | a sentopicmodel |
nWords | the number of top words to extract |
method | the re-ranking method to apply before returning the top words. See Details for a description of each method. |
output | the output format: a data.frame, a plot or a matrix |
subset | restricts the output using a logical expression, as in subset() |
w | only used when method = "FREX"; the weight that balances topic-word probability and topic exclusivity |
"frequency"
ranks top words according to their frequency
within a topic. This method also reports the overall frequency of
each word. When returning a plot, the overall frequency is
represented with a grey bar.
"probability"
uses the estimated topic-word mixture \phi to rank top words.
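As a rough illustration of probability-based ranking, the sketch below sorts each row of a topic-word matrix and keeps the most probable words. The matrix `phi` holds made-up values and `top_by_probability()` is a hypothetical helper written for this example; neither is part of the package, whose internal code may differ.

```r
# Toy topic-word matrix (hypothetical values): K = 2 topics in rows,
# V = 3 vocabulary words in columns. Each row sums to 1.
phi <- matrix(
  c(0.5, 0.4, 0.1,
    0.5, 0.1, 0.4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"), c("the", "rate", "growth"))
)

# Hypothetical helper: for each topic, return the n_words most probable words.
top_by_probability <- function(phi, n_words = 2) {
  sapply(rownames(phi), function(k) {
    # Sort the words of topic k by decreasing probability and keep the first ones
    names(sort(phi[k, ], decreasing = TRUE))[seq_len(n_words)]
  }, simplify = FALSE)
}

top <- top_by_probability(phi, n_words = 2)
top
```

In this toy matrix the word "the" ranks first in both topics, which is the kind of uninformative overlap that re-ranking methods try to reduce.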
"term-score"
implements the re-ranking method from Blei and Lafferty (2009). This
method down-weights terms that have high probability in all topics
using the following score:
\text{term-score}_{k,v} = \phi_{k,v}\log\left(\frac{\phi_{k,v}}{\left(\prod^K_{j=1}\phi_{j,v}\right)^{\frac{1}{K}}}\right),
for topic k, vocabulary word v and number of topics K.
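To make the term-score concrete, here is a minimal sketch that evaluates it on a small matrix. The values in `phi` are made up and `term_score()` is a hypothetical helper, not a function of the package. It assumes every entry of `phi` is strictly positive so that the logarithms are defined.

```r
# Toy topic-word matrix (hypothetical values): K = 2 topics in rows,
# V = 3 vocabulary words in columns. All entries are strictly positive.
phi <- matrix(
  c(0.5, 0.4, 0.1,
    0.5, 0.1, 0.4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"), c("the", "rate", "growth"))
)

# Hypothetical helper: phi[k, v] * log(phi[k, v] / geometric mean of phi[, v]).
term_score <- function(phi) {
  # Geometric mean of each word's probability across the K topics
  geo_mean <- exp(colMeans(log(phi)))
  # Divide each column v by its geometric mean, take the log, weight by phi
  phi * log(sweep(phi, 2, geo_mean, "/"))
}

scores <- term_score(phi)
scores
```

Here "the" is equally probable in both topics, so its score is zero (up to floating-point error) in each, while "rate" scores positively only in topic1, where it is more probable than its geometric mean across topics.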
"FREX"
implements the re-ranking method from Bischof and Airoldi (2012).
This method uses the weight w to balance topic-word probability and
topic exclusivity using the following score:
\text{FREX}_{k,v}=\left(\frac{w}{\text{ECDF}\left(\frac{\phi_{k,v}}{\sum_{j=1}^K\phi_{j,v}}\right)} + \frac{1-w}{\text{ECDF}\left(\phi_{k,v}\right)}\right)^{-1},
for topic k, vocabulary word v, number of topics K and weight w,
where \text{ECDF} is the empirical cumulative distribution function.
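The sketch below shows one way a FREX-style score could be computed from a topic-word matrix. It rests on several assumptions that the text above does not state: the ECDF is evaluated within each topic, across the vocabulary, and the score is taken as the weighted harmonic mean of the two ECDF values, so that higher scores indicate better words. The matrix `phi` and the helper `frex_score()` are hypothetical and do not come from the package.

```r
# Toy topic-word matrix (hypothetical values): K = 2 topics, V = 4 words.
phi <- matrix(
  c(0.40, 0.30, 0.20, 0.10,
    0.40, 0.05, 0.15, 0.40),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"), c("the", "rate", "bank", "growth"))
)

# Hypothetical helper computing a FREX-style score with weight w.
frex_score <- function(phi, w = 0.5) {
  # Exclusivity: share of word v's total probability that belongs to topic k
  excl <- sweep(phi, 2, colSums(phi), "/")
  # ECDF evaluated within each topic (row), across the vocabulary (assumption)
  row_ecdf <- function(m) t(apply(m, 1, function(r) ecdf(r)(r)))
  # Weighted harmonic mean of the exclusivity and probability ECDF values
  out <- 1 / (w / row_ecdf(excl) + (1 - w) / row_ecdf(phi))
  dimnames(out) <- dimnames(phi)
  out
}

frex <- frex_score(phi, w = 0.5)
# Words of topic1 ordered by decreasing score
names(sort(frex["topic1", ], decreasing = TRUE))
```

In this toy example "the" is the most probable word of topic1 but is shared equally with topic2, so the more exclusive word "rate" ends up ranked first. Setting w closer to 1 puts more weight on exclusivity, and setting it closer to 0 puts more weight on probability.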
The top words of the topic model. Depending on the output chosen, the
result is a long-format data.frame, a ggplot2 object or a matrix.
Olivier Delmarcelle
Blei, DM. and Lafferty, JD. (2009). Topic models. In Text Mining, chapter 4, 101–124.
Bischof, JM. and Airoldi, EM. (2012). Summarizing Topical Content with Word Frequency and Exclusivity. In Proceedings of the 29th International Conference on Machine Learning, ICML'12, 9–16.
melt.sentopicmodel() for extracting estimated mixtures
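# Create an LDA model from the ECB press conference tokens and fit it
model <- LDA(ECB_press_conferences_tokens)
model <- fit(model, 10)
# Top words with the default settings, as a matrix, or re-ranked with FREX
topWords(model)
topWords(model, output = "matrix")
topWords(model, method = "FREX")
# Plot the top words, optionally restricted to selected topics
plot_topWords(model)
plot_topWords(model, subset = topic %in% 1:2)
# Same with a JST model; subset can combine topic and sentiment conditions
jst <- JST(ECB_press_conferences_tokens)
jst <- fit(jst, 10)
plot_topWords(jst)
plot_topWords(jst, subset = topic %in% 1:2 & sentiment == 3)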