Tidy topic models fit by the stm package. The arguments and return values
are similar to lda_tidiers.
x | An STM fitted model object from either stm or estimateEffect
matrix | Whether to tidy the beta (per-term-per-topic, default) or gamma/theta (per-document-per-topic) matrix. The stm package calls this the theta matrix, but other topic modeling packages call it gamma.
log | Whether beta/gamma/theta should be on a log scale; default FALSE
document_names | Optional vector of document names for use with per-document-per-topic tidying
... | Extra arguments, not used
data | For augment, the data to annotate: either a dfm from quanteda or a table with one row per original document-term pair
tidy returns a tidied version of either the beta or gamma matrix if
called on an object from stm, or a tidied version of the estimated
regressions if called on an object from estimateEffect.
augment must be provided a data argument, either a dfm from quanteda or
a table containing one row per original document-term pair, such as is
returned by tdm_tidiers, containing columns document and term. It returns
that same data as a table with an additional column .topic with the topic
assignment for that document-term combination.
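To make the shape of that output concrete, here is a minimal base R sketch that adds a .topic column to a small document-term table. The toy gamma and beta matrices, the helper assign_topic, and the assignment rule itself (keep the topic that maximizes gamma times beta for each pair) are assumptions made for this illustration; they are not taken from the augment implementation.

```r
# Toy inputs (hypothetical): 2 documents, 2 topics, 3 terms.
gamma <- matrix(c(0.9, 0.1,
                  0.2, 0.8),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("doc1", "doc2"), NULL))
beta <- matrix(c(0.6, 0.3, 0.1,
                 0.1, 0.2, 0.7),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("ball", "tea", "ship")))

# One row per original document-term pair, with columns document and term.
pairs <- data.frame(
  document = c("doc1", "doc1", "doc2", "doc2"),
  term     = c("ball", "ship", "tea", "ship"),
  stringsAsFactors = FALSE
)

# Assumed rule: score each topic by gamma[document, topic] * beta[topic, term]
# and keep the index of the highest-scoring topic.
assign_topic <- function(document, term) {
  which.max(gamma[document, ] * beta[, term])
}

# Same data, plus one extra column holding the assigned topic.
augmented <- pairs
augmented$.topic <- mapply(assign_topic, pairs$document, pairs$term,
                           USE.NAMES = FALSE)
augmented
```

The result keeps every input row and column untouched, which is the property the description above emphasizes.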
glance always returns a one-row table, with columns
k: Number of topics in the model
docs: Number of documents in the model
terms: Number of terms in the model
iter: Number of iterations used
alpha: If an LDA model, the parameter of the Dirichlet distribution for topics over documents
If matrix == "beta" (default), returns a table with one row per topic and term,
with columns
topic: Topic, as an integer
term: Term
beta: Probability of a term generated from a topic according to the structural topic model
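The reshaping described here can be sketched in base R. The example below assumes the model's topic-term values are available as a topics-by-terms matrix of log-probabilities plus a vocabulary vector; the names logbeta, vocab, and tidy_beta_sketch are hypothetical stand-ins, not the package's internals.

```r
# Toy stand-in (hypothetical): 2 topics x 3 terms of log-probabilities.
logbeta <- log(matrix(c(0.5, 0.3, 0.2,
                        0.1, 0.1, 0.8),
                      nrow = 2, byrow = TRUE))
vocab <- c("anne", "emma", "darcy")

tidy_beta_sketch <- function(logbeta, vocab, log = FALSE) {
  # Keep log-probabilities when log = TRUE, otherwise return probabilities.
  values <- if (log) logbeta else exp(logbeta)
  # as.vector() walks the matrix column by column (term by term),
  # so topic cycles fastest and each term repeats once per topic.
  data.frame(
    topic = rep(seq_len(nrow(values)), times = ncol(values)),
    term  = rep(vocab, each = nrow(values)),
    beta  = as.vector(values),
    stringsAsFactors = FALSE
  )
}

tidy_beta_sketch(logbeta, vocab)
tidy_beta_sketch(logbeta, vocab, log = TRUE)
```

Each topic's beta values sum to one on the probability scale, which is a quick sanity check on the long-form table.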
If matrix == "gamma", returns a table with one row per topic and document,
with columns
topic: Topic, as an integer
document: Document name (if given in vector of document_names) or ID as an integer
gamma: Probability of topic given document
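The effect of document_names can be shown with a small base R sketch. It assumes the document-topic proportions are available as a documents-by-topics matrix; theta and tidy_gamma_sketch are hypothetical names used only for this illustration.

```r
# Toy document-topic matrix (hypothetical): 3 documents x 2 topics.
theta <- matrix(c(0.7, 0.3,
                  0.4, 0.6,
                  0.1, 0.9),
                nrow = 3, byrow = TRUE)

tidy_gamma_sketch <- function(theta, document_names = NULL, log = FALSE) {
  # Fall back to integer IDs when no names are supplied.
  docs <- if (is.null(document_names)) seq_len(nrow(theta)) else document_names
  values <- if (log) log(theta) else theta
  # as.vector() walks the matrix column by column (topic by topic),
  # so document cycles fastest and each topic repeats once per document.
  data.frame(
    document = rep(docs, times = ncol(theta)),
    topic    = rep(seq_len(ncol(theta)), each = nrow(theta)),
    gamma    = as.vector(values),
    stringsAsFactors = FALSE
  )
}

tidy_gamma_sketch(theta)                                   # integer IDs
tidy_gamma_sketch(theta, document_names = c("a", "b", "c")) # supplied names
```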
If called on an object from estimateEffect, returns a table with columns
topic: Topic, as an integer
term: The term in the model being estimated and tested
estimate: The estimated coefficient
std.error: The standard error from the linear model
statistic: t-statistic
p.value: two-sided p-value
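As a reminder of how the last two quantities relate to the first two, the sketch below computes a t-statistic and a two-sided p-value from a coefficient and its standard error. All three input numbers are hypothetical, and the degrees of freedom in particular are an assumption; this page does not say which reference distribution stm uses.

```r
estimate  <- 0.12  # hypothetical coefficient
std_error <- 0.04  # hypothetical standard error
df        <- 100   # hypothetical residual degrees of freedom (assumption)

# t-statistic: the estimate measured in standard-error units.
statistic <- estimate / std_error

# Two-sided p-value: probability mass in both tails beyond |statistic|.
p_value <- 2 * pt(-abs(statistic), df = df)

c(statistic = statistic, p.value = p_value)
```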
## Not run: 
if (requireNamespace("stm", quietly = TRUE)) {
  library(dplyr)
  library(ggplot2)
  library(tidytext)  # provides unnest_tokens(), stop_words, cast_sparse(), tidy()
  library(stm)
  library(janeaustenr)

  # build a sparse document-term matrix with one row per book
  austen_sparse <- austen_books() %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words) %>%
    count(book, word) %>%
    cast_sparse(book, word, n)

  topic_model <- stm(austen_sparse, K = 12, verbose = FALSE, init.type = "Spectral")

  # tidy the word-topic combinations
  td_beta <- tidy(topic_model)
  td_beta

  # Examine the topics
  td_beta %>%
    group_by(topic) %>%
    top_n(10, beta) %>%
    ungroup() %>%
    ggplot(aes(term, beta)) +
    geom_col() +
    facet_wrap(~ topic, scales = "free") +
    coord_flip()

  # tidy the document-topic combinations, with optional document names
  td_gamma <- tidy(topic_model, matrix = "gamma",
                   document_names = rownames(austen_sparse))
  td_gamma

  # using stm's gadarianFit, we can tidy the result of a model
  # estimated with covariates
  effects <- estimateEffect(1:3 ~ treatment, gadarianFit, gadarian)
  td_estimate <- tidy(effects)
  td_estimate
}

## End(Not run)