
Tidy topic models fit by the stm package. The arguments and return values
are similar to `lda_tidiers`.


`x` |
An STM fitted model object from either |

`matrix` |
Whether to tidy the beta (per-term-per-topic, default) or gamma/theta (per-document-per-topic) matrix. The stm package calls this the theta matrix, but other topic modeling packages call this gamma. |

`log` |
Whether beta/gamma/theta should be on a log scale, default FALSE |

`document_names` |
Optional vector of document names for use with per-document-per-topic tidying |

`...` |
Extra arguments, not used |

`data` |
For |

`tidy` returns a tidied version of either the beta or gamma matrix if
called on an object from `stm`, or a tidied version of the estimated
regressions if called on an object from `estimateEffect`.

`augment` must be provided a data argument, either a `dfm` from quanteda
or a table containing one row per original document-term pair, such as is
returned by tdm_tidiers, containing columns `document` and `term`. It
returns that same data as a table with an additional column `.topic` with
the topic assignment for that document-term combination.

`glance` always returns a one-row table, with columns

- k
Number of topics in the model

- docs
Number of documents in the model

- terms
Number of terms in the model

- iter
Number of iterations used

- alpha
If an LDA model, the parameter of the Dirichlet distribution for topics over documents

If `matrix == "beta"` (default), returns a table with one row per topic and term,
with columns

- topic
Topic, as an integer

- term
Term

- beta
Probability of a term generated from a topic according to the structural topic model
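To make the one-row-per-topic-and-term layout concrete, here is a minimal base-R sketch that reshapes a toy topic-by-term matrix of log probabilities into that shape. The matrix values and vocabulary are made up for illustration, and this is not tidytext's own implementation; it only mirrors the column layout described above and shows where a `log` option would matter.

```r
# Toy log-probability matrix: 2 topics (rows) by 3 terms (columns).
# These values are invented for illustration only.
vocab <- c("love", "sea", "house")
logbeta <- log(rbind(c(0.5, 0.2, 0.3),
                     c(0.1, 0.6, 0.3)))

# as.vector() walks the matrix column by column, so the topic index
# cycles fastest and each term is repeated once per topic.
td <- data.frame(
  topic = rep(seq_len(nrow(logbeta)), times = ncol(logbeta)),
  term  = rep(vocab, each = nrow(logbeta)),
  beta  = exp(as.vector(logbeta))  # drop exp() to stay on the log scale
)
td
```

Within each topic the `beta` values sum to 1, since each row of the toy matrix is a probability distribution over terms.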

If `matrix == "gamma"`, returns a table with one row per topic and document,
with columns

- topic
Topic, as an integer

- document
Document name (if given in vector of `document_names`) or ID as an integer

- gamma
Probability of topic given document
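A common next step with a table of this shape is to pick the most probable topic for each document. The sketch below does this in base R on a hand-written table with the `document`, `topic`, and `gamma` columns described above; the document names and probabilities are assumptions chosen for illustration, not output from a fitted model.

```r
# Hand-written stand-in for the gamma output: 2 documents, 3 topics.
# The gamma values are invented and sum to 1 within each document.
td_gamma <- data.frame(
  document = rep(c("Emma", "Persuasion"), each = 3),
  topic    = rep(1:3, times = 2),
  gamma    = c(0.70, 0.20, 0.10,
               0.05, 0.15, 0.80)
)

# For each document, keep the row with the largest gamma.
top_topic <- do.call(
  rbind,
  lapply(split(td_gamma, td_gamma$document),
         function(d) d[which.max(d$gamma), ])
)
top_topic
```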

If called on an object from `estimateEffect`, returns a table with columns

- topic
Topic, as an integer

- term
The term in the model being estimated and tested

- estimate
The estimated coefficient

- std.error
The standard error from the linear model

- statistic
t-statistic

- p.value
two-sided p-value
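Because the regression output has one row per topic and model term, it can be filtered like any other data frame. The following base-R sketch keeps the non-intercept terms whose p-value falls below 0.05. The table is a hand-written stand-in with the columns listed above; every number in it is made up for illustration, and the `"(Intercept)"` label is an assumption about how the intercept row is named.

```r
# Hand-written stand-in for the tidied regression table.
# All numbers are invented for illustration only.
td_estimate <- data.frame(
  topic     = c(1, 1, 2, 2),
  term      = c("(Intercept)", "treatment", "(Intercept)", "treatment"),
  estimate  = c(0.30, -0.05, 0.25, 0.08),
  std.error = c(0.02, 0.03, 0.02, 0.03),
  statistic = c(15.00, -1.67, 12.50, 2.67),
  p.value   = c(0.001, 0.096, 0.001, 0.008)
)

# Keep covariate terms (not intercepts) with p-values below 0.05.
significant <- subset(td_estimate,
                      term != "(Intercept)" & p.value < 0.05)
significant
```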

```
## Not run:
if (requireNamespace("stm", quietly = TRUE)) {
  library(dplyr)
  library(ggplot2)
  library(tidytext)
  library(stm)
  library(janeaustenr)

  # build a sparse book-by-word count matrix, dropping stop words
  austen_sparse <- austen_books() %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words, by = "word") %>%
    count(book, word) %>%
    cast_sparse(book, word, n)

  topic_model <- stm(austen_sparse, K = 12, verbose = FALSE, init.type = "Spectral")

  # tidy the word-topic combinations
  td_beta <- tidy(topic_model)
  td_beta

  # Examine the topics
  td_beta %>%
    group_by(topic) %>%
    top_n(10, beta) %>%
    ungroup() %>%
    ggplot(aes(term, beta)) +
    geom_col() +
    facet_wrap(~ topic, scales = "free") +
    coord_flip()

  # tidy the document-topic combinations, with optional document names
  td_gamma <- tidy(topic_model, matrix = "gamma",
                   document_names = rownames(austen_sparse))
  td_gamma

  # using stm's gadarianFit, we can tidy the result of a model
  # estimated with covariates
  effects <- estimateEffect(1:3 ~ treatment, gadarianFit, gadarian)
  td_estimate <- tidy(effects)
  td_estimate
}
## End(Not run)
```

tidytext documentation built on Oct. 17, 2018, 9:04 a.m.
