stm_tidiers: Tidiers for Structural Topic Models from the stm package
In igorscarvalho/tidytext: Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools


Tidy topic models fit by the stm package. The arguments and return values are similar to lda_tidiers.

## S3 method for class 'STM'
tidy(
  x,
  matrix = c("beta", "gamma", "theta"),
  log = FALSE,
  document_names = NULL,
  ...
)

## S3 method for class 'estimateEffect'
tidy(x, ...)

## S3 method for class 'STM'
augment(x, data, ...)

## S3 method for class 'STM'
glance(x, ...)

`x`	An STM fitted model object from either `stm` or `estimateEffect` from the stm package.
`matrix`	Whether to tidy the beta (per-term-per-topic, default) or gamma/theta (per-document-per-topic) matrix. The stm package calls this the theta matrix, but other topic modeling packages call this gamma.
`log`	Whether beta/gamma/theta should be returned on a log scale. Defaults to FALSE.
`document_names`	Optional vector of document names for use with per-document-per-topic tidying
`...`	Extra arguments, not used
`data`	For `augment`, the data given to the stm function, either as a `dfm` from quanteda or as a tidied table with "document" and "term" columns

tidy returns a tidied version of either the beta or the gamma (theta) matrix when called on an object from stm, or a tidied version of the estimated regressions when called on an object from estimateEffect.

augment requires a data argument: either a dfm from quanteda or a table with one row per original document-term pair and columns document and term, such as the tables returned by tdm_tidiers. It returns the same data as a table with an additional column, .topic, holding the topic assignment for each document-term combination.

glance always returns a one-row table, with columns

k: Number of topics in the model
docs: Number of documents in the model
terms: Number of terms in the model
iter: Number of iterations used
alpha: If an LDA model, the parameter of the Dirichlet distribution for topics over documents

See also: lda_tidiers

If matrix == "beta" (default), returns a table with one row per topic and term, with columns

topic: Topic, as an integer
term: Term
beta: Probability of a term generated from a topic according to the structural topic model
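To make the shape of this table concrete, here is a minimal base-R sketch. It assumes a small, invented matrix of log-probabilities (two topics by three terms) and a hypothetical helper, `to_long_beta`, that is not part of tidytext or stm; it only mimics the documented columns and the effect of the `log` argument.

```r
# Toy log-probabilities: 2 topics (rows) x 3 terms (columns).
# Values are invented for illustration, not output from stm.
vocab <- c("ball", "elizabeth", "letter")
logbeta <- log(matrix(
  c(0.5, 0.3, 0.2,
    0.1, 0.6, 0.3),
  nrow = 2, byrow = TRUE
))

# Hypothetical helper: one row per topic-term pair, with the
# documented columns topic, term, and beta.
to_long_beta <- function(logbeta, vocab, log = FALSE) {
  values <- if (log) logbeta else exp(logbeta)
  data.frame(
    topic = rep(seq_len(nrow(logbeta)), times = ncol(logbeta)),
    term  = rep(vocab, each = nrow(logbeta)),
    beta  = as.vector(values)  # column-major order matches topic/term above
  )
}

td_beta_toy <- to_long_beta(logbeta, vocab)
td_beta_toy

# Within each topic, the probabilities over terms sum to 1.
beta_sums <- tapply(td_beta_toy$beta, td_beta_toy$topic, sum)
beta_sums
```

With `log = TRUE` the same helper would return the log-probabilities unchanged, which is the behavior the `log` argument describes.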

If matrix == "gamma" (or "theta", the stm package's name for the same matrix), returns a table with one row per topic and document, with columns

topic: Topic, as an integer
document: Document name (if given in vector of document_names) or ID as an integer
gamma: Probability of topic given document
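A common next step with this table is to pick the most probable topic for each document. The sketch below uses an invented per-document-per-topic matrix and base R only; the object names (`theta`, `td_gamma_toy`, `top_topic`) are illustrative assumptions, not objects created by tidytext.

```r
# Toy per-document-per-topic matrix: 2 documents (rows) x 3 topics (columns).
# Values are invented for illustration, not output from stm.
theta <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6),
  nrow = 2, byrow = TRUE
)
document_names <- c("doc_a", "doc_b")

# One row per document-topic pair, with the documented columns.
td_gamma_toy <- data.frame(
  document = rep(document_names, times = ncol(theta)),
  topic    = rep(seq_len(ncol(theta)), each = nrow(theta)),
  gamma    = as.vector(theta)  # column-major order matches the columns above
)

# Keep the highest-gamma row for each document.
top_topic <- do.call(rbind, lapply(
  split(td_gamma_toy, td_gamma_toy$document),
  function(d) d[which.max(d$gamma), ]
))
top_topic
```

Because gamma is a probability over topics for each document, the gamma values for any one document sum to 1.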

If called on an object from estimateEffect, returns a table with columns

topic: Topic, as an integer
term: The term in the model being estimated and tested
estimate: The estimated coefficient
std.error: The standard error from the linear model
statistic: t-statistic
p.value: two-sided p-value
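As a rough illustration of how these columns relate, the sketch below builds an invented table with the same column names and then keeps the non-intercept rows with small p-values. It assumes the usual linear-model relationships (statistic = estimate / std.error, and a two-sided p-value from a t distribution); the residual degrees of freedom, `df_resid`, is a made-up value for this example only.

```r
# Toy table with the documented columns; all numbers are invented.
td_estimate_toy <- data.frame(
  topic     = c(1L, 1L, 2L, 2L),
  term      = c("(Intercept)", "treatment", "(Intercept)", "treatment"),
  estimate  = c(0.30, 0.05, 0.20, -0.12),
  std.error = c(0.02, 0.04, 0.02, 0.03)
)

# Assumed relationships, not taken from the stm source:
df_resid <- 100  # hypothetical residual degrees of freedom
td_estimate_toy$statistic <- td_estimate_toy$estimate / td_estimate_toy$std.error
td_estimate_toy$p.value <- 2 * pt(-abs(td_estimate_toy$statistic), df = df_resid)

# Covariate effects (excluding intercepts) with p < 0.05.
significant <- td_estimate_toy[
  td_estimate_toy$term != "(Intercept)" & td_estimate_toy$p.value < 0.05,
]
significant
```

The same filter can be applied directly to the table returned by tidy on an estimateEffect object, since it carries the topic, term, and p.value columns listed above.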

## Not run: 
if (requireNamespace("stm", quietly = TRUE)) {
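  library(dplyr)
  library(ggplot2)
  library(tidytext)  # provides unnest_tokens(), stop_words, cast_sparse()
  library(stm)
  library(janeaustenr)

  # build a sparse document-term matrix with one row per book
  austen_sparse <- austen_books() %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words, by = "word") %>%
    count(book, word) %>%
    cast_sparse(book, word, n)
  topic_model <- stm(austen_sparse, K = 12, verbose = FALSE,
                     init.type = "Spectral")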

  # tidy the word-topic combinations
  td_beta <- tidy(topic_model)
  td_beta

  # Examine the topics
  td_beta %>%
    group_by(topic) %>%
    top_n(10, beta) %>%
    ungroup() %>%
    ggplot(aes(term, beta)) +
    geom_col() +
    facet_wrap(~ topic, scales = "free") +
    coord_flip()

  # tidy the document-topic combinations, with optional document names
  td_gamma <- tidy(topic_model, matrix = "gamma",
                   document_names = rownames(austen_sparse))
  td_gamma

  # using stm's gadarianFit, we can tidy the result of a model
  # estimated with covariates
  effects <- estimateEffect(1:3 ~ treatment, gadarianFit, gadarian)
  td_estimate <- tidy(effects)
  td_estimate

}

## End(Not run)

igorscarvalho/tidytext documentation built on Aug. 23, 2020, 12:44 a.m.
