sldax-summary: Summary functions for objects of class Sldax
In ktw5691/psychtm: Text Mining Methods for Psychological Research


Obtain parameter estimates, model goodness-of-fit metrics, and posterior summaries.

For SLDA or SLDAX models, label switching is handled during estimation by gibbs_sldax() through its correct_ls argument, so the summary functions documented here do not address it.

est_beta(mcmc_fit, burn = 0, thin = 1, stat = "mean")

est_theta(mcmc_fit, burn = 0, thin = 1, stat = "mean")

get_coherence(beta_, docs, nwords = 10)

get_exclusivity(beta_, nwords = 10, weight = 0.7)

get_toptopics(theta, ntopics)

get_topwords(beta_, nwords, vocab, method = "termscore")

get_zbar(mcmc_fit, burn = 0L, thin = 1L)

post_regression(mcmc_fit)

gg_coef(mcmc_fit, burn = 0L, thin = 1L, stat = "mean", errorbw = 0.5)

## S4 method for signature 'Sldax'
gg_coef(mcmc_fit, burn = 0L, thin = 1L, stat = "mean", errorbw = 0.5)

## S4 method for signature 'Sldax'
est_beta(mcmc_fit, burn = 0, thin = 1, stat = "mean")

## S4 method for signature 'Sldax'
est_theta(mcmc_fit, burn = 0, thin = 1, stat = "mean")

## S4 method for signature 'matrix,matrix'
get_coherence(beta_, docs, nwords = 10)

## S4 method for signature 'matrix'
get_exclusivity(beta_, nwords = 10, weight = 0.7)

## S4 method for signature 'matrix'
get_toptopics(theta, ntopics)

## S4 method for signature 'matrix,numeric,character'
get_topwords(beta_, nwords, vocab, method = "termscore")

## S4 method for signature 'Sldax'
get_zbar(mcmc_fit, burn = 0L, thin = 1L)

## S4 method for signature 'Mlr'
post_regression(mcmc_fit)

## S4 method for signature 'Logistic'
post_regression(mcmc_fit)

## S4 method for signature 'Sldax'
post_regression(mcmc_fit)

`mcmc_fit`	An object of class Sldax. post_regression() also has methods for objects of class Mlr and Logistic.
`burn`	The number of draws to discard as a burn-in period (default: `0`).
`thin`	The number of draws to skip as a thinning period (default: `1`; i.e., no thinning).
`stat`	The summary statistic to use on the posterior draws (default: `"mean"`).
`beta_`	A K x V matrix of word-topic probabilities. Each row sums to 1.
`docs`	The D x max(N_d) matrix of documents (word indices) used to fit the Sldax model.
`nwords`	The number of words to retrieve (default: `10` for get_coherence() and get_exclusivity(); the get_topwords() signature sets no default).
`weight`	The weight (between 0 and 1) to give to exclusivity (near 1) vs. frequency (near 0) (default: `0.7`).
`theta`	A D x K matrix of K topic proportions for all D documents.
`ntopics`	The number of topics to retrieve (default: all topics).
`vocab`	A character vector of length V containing the vocabulary.
`method`	If `"termscore"`, use term scores (similar to tf-idf). If `"prob"`, use probabilities (default: `"termscore"`).
`errorbw`	Positive control parameter for the width of the +/- 2 posterior standard error bars (default: `0.5`).
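
The interaction of burn, thin, and stat can be illustrated in base R. This is only a sketch on a made-up vector of draws: the selection rule below (drop the first `burn` draws, then keep every `thin`-th one) is an assumption about how such arguments are commonly applied, not a statement of the package's internal indexing.

```r
# Hypothetical posterior draws for a single parameter (illustration only)
draws <- c(9.0, 7.5, 5.2, 5.0, 4.9, 5.1, 5.3, 4.8, 5.0, 5.2)

burn <- 2L  # discard the first two draws as burn-in
thin <- 2L  # then keep every second remaining draw

# Assumed selection rule: start after the burn-in, step by `thin`
keep <- seq(from = burn + 1L, to = length(draws), by = thin)
kept <- draws[keep]

# Summarize the retained draws, mirroring stat = "mean" or stat = "median"
post_mean <- mean(kept)
post_median <- median(kept)
```

With thin = 1 every post-burn-in draw is retained, which matches the "no thinning" default described above.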

get_zbar() computes empirical topic proportions from slot @topics.
est_theta() estimates the mean or median theta matrix.
est_beta() estimates the mean or median beta matrix.
get_toptopics() creates a tibble of the topic proportion estimates for the top ntopics topics per document sorted by probability.
get_topwords() creates a tibble of topics and the top nwords words per topic sorted by probability or term score.
get_coherence() computes the coherence metric for each topic (see Mimno, Wallach, Talley, Leenders, & McCallum, 2011).
get_exclusivity() computes the exclusivity metric for each topic (see Roberts, Stewart, & Airoldi, 2013).
post_regression() creates a coda::mcmc object containing posterior information for the regression model parameters.
gg_coef() plots regression coefficients.
- Warning: this function is deprecated.
- See help("Deprecated").
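
As a rough illustration of the empirical topic proportions that get_zbar() is described as computing, the following base R sketch counts topic assignments per document for a single draw and normalizes them. The matrix `z`, the use of 0 as padding for shorter documents, and the restriction to one draw are assumptions made for this example only.

```r
K <- 2L  # number of topics

# Hypothetical topic assignments for 2 documents (rows);
# 0 marks padding, an assumption made for this sketch
z <- matrix(c(1L, 2L, 2L, 1L,
              2L, 2L, 1L, 0L), nrow = 2, byrow = TRUE)

# For each document, count assignments to topics 1..K and normalize
zbar <- t(apply(z, 1, function(zd) {
  counts <- tabulate(zd[zd > 0L], nbins = K)
  counts / sum(counts)
}))

zbar  # D x K matrix; each row sums to 1
```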

A matrix of topic-word probability estimates.

A matrix of topic proportion estimates.

A numeric vector of coherence scores for each topic (more positive is better).
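
The coherence score of Mimno et al. (2011) for one topic sums, over ordered pairs of its top words, log((D(v_m, v_l) + 1) / D(v_l)), where D(v) is the number of documents containing word v and D(v, v') is the number containing both. A minimal base R sketch follows. The helper name `coherence_topic` is hypothetical, documents are assumed to be padded with 0, and whether the package computes the score in exactly this way is not confirmed here.

```r
# Sketch of the Mimno et al. (2011) coherence score for a single topic
coherence_topic <- function(beta_k, docs, nwords) {
  # Indices of the `nwords` most probable words, most probable first
  top <- order(beta_k, decreasing = TRUE)[seq_len(nwords)]
  # D x nwords indicator: does document d contain top word m?
  has <- sapply(top, function(v) rowSums(docs == v) > 0)
  has <- matrix(has, nrow = nrow(docs))
  score <- 0
  for (m in seq_len(nwords)[-1]) {
    for (l in seq_len(m - 1L)) {
      co_df <- sum(has[, m] & has[, l])  # co-document frequency D(v_m, v_l)
      df_l <- sum(has[, l])              # document frequency D(v_l)
      score <- score + log((co_df + 1) / df_l)
    }
  }
  score
}

# Three hypothetical documents over a 3-word vocabulary
docs <- matrix(c(1, 2,
                 1, 3,
                 1, 2), nrow = 3, byrow = TRUE)
beta_k <- c(0.6, 0.3, 0.1)
score <- coherence_topic(beta_k, docs, nwords = 3)
```

Scores are at most a small positive number and usually negative, so values closer to zero indicate top words that co-occur more often. The sketch does not guard against a top word that appears in no document, which would give an infinite term.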

A numeric vector of exclusivity scores (more positive is better).

A data frame of the ntopics most probable topics per document.
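
Selecting the most probable topics per document can be sketched in base R with order(). The column names `doc`, `topic`, and `prob` and the variable `ntop` are hypothetical, chosen for this illustration; the layout of the object returned by get_toptopics() is not specified beyond the description above.

```r
# Hypothetical D x K matrix of topic proportions (2 documents, 3 topics)
theta <- matrix(c(0.2, 0.5, 0.3,
                  0.6, 0.1, 0.3), nrow = 2, byrow = TRUE)
ntop <- 2L  # number of topics to keep per document

# For each document, keep the `ntop` largest proportions in decreasing order
top <- do.call(rbind, lapply(seq_len(nrow(theta)), function(d) {
  ord <- order(theta[d, ], decreasing = TRUE)[seq_len(ntop)]
  data.frame(doc = d, topic = ord, prob = theta[d, ord])
}))

top
```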

A K x V matrix of term-scores (comparable to tf-idf).
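
One common definition of a term score (Blei & Lafferty, 2009) multiplies each word-topic probability by the log ratio of that probability to the word's geometric mean across topics, which downweights words that are probable in every topic. The sketch below implements that definition in base R; the function name `term_score_sketch` is hypothetical, and it is an assumption that the term scores referred to here follow this exact formula.

```r
term_score_sketch <- function(beta_) {
  # Log of the geometric mean of each word's probability across topics
  log_gm <- colMeans(log(beta_))
  # beta_kv * (log beta_kv - log geometric mean of word v)
  beta_ * sweep(log(beta_), 2, log_gm, FUN = "-")
}

# K x V matrix matching the beta values used in the examples (rows sum to 1)
beta_ <- matrix(c(0.5, 0.5,
                  0.4, 0.6), nrow = 2, byrow = TRUE)
scores <- term_score_sketch(beta_)
```

A word with equal probability in all topics receives a score of 0 in each of them. Zero probabilities would produce non-finite values, which this sketch does not handle.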

A matrix of empirical topic proportions per document.

An object of class coda::mcmc summarizing the posterior distribution of the regression coefficients and residual variance (if applicable). Convenience functions such as summary() and plot() can be used for posterior summarization.
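
Once a coda::mcmc object is available, the usual coda tools apply. The sketch below builds a stand-in object from simulated draws rather than calling post_regression(), so the parameter names and values are hypothetical; only the coda functions themselves are real.

```r
library(coda)

set.seed(1)
# Hypothetical draws standing in for the output of post_regression()
draws <- cbind(intercept = rnorm(500, mean = 30, sd = 1),
               slope     = rnorm(500, mean = -0.07, sd = 0.01),
               sigma2    = rgamma(500, shape = 20, rate = 1))
post <- mcmc(draws)

s <- summary(post)                     # posterior means, SDs, and quantiles
hpd <- HPDinterval(post, prob = 0.95)  # highest posterior density intervals
ess <- effectiveSize(post)             # effective sample size per parameter
```

plot(post) would additionally draw trace and density plots for each parameter.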

A ggplot object.

# est_beta(): posterior mean and median of the topic-word probabilities
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 1), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.5, 0.5, 0.5), dim = c(2, 2, 1)))
est_beta(m1, stat = "mean")
est_beta(m1, stat = "median")
# est_theta(): posterior mean and median of the topic proportions
m1 <- Sldax(ndocs = 2, nvocab = 2, nchain = 2,
            topics = array(c(1, 2, 2, 1,
                             1, 2, 2, 1), dim = c(2, 2, 2)),
            theta = array(c(0.5, 0.5,
                            0.5, 0.5,
                            0.5, 0.5,
                            0.5, 0.5), dim = c(2, 2, 2)),
            loglike = rep(NaN, times = 2),
            logpost = rep(NaN, times = 2),
            lpd = matrix(NaN, nrow = 2, ncol = 2),
            eta = matrix(0.0, nrow = 2, ncol = 2),
            mu0 = c(0.0, 0.0),
            sigma0 = diag(1, 2),
            eta_start = c(0.0, 0.0),
            beta = array(c(0.5, 0.5, 0.5, 0.5,
                           0.5, 0.5, 0.5, 0.5), dim = c(2, 2, 2)))
est_theta(m1, stat = "mean")
est_theta(m1, stat = "median")
# get_coherence(): coherence of each topic, using every word in the vocabulary
mdoc <- matrix(c(1, 2, 2, 1), nrow = 1)
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 2), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.4, 0.5, 0.6), dim = c(2, 2, 1)))
bhat <- est_beta(m1)
get_coherence(bhat, docs = mdoc, nwords = nvocab(m1))
# get_exclusivity(): exclusivity of each topic, using every word in the vocabulary
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 2), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.4, 0.5, 0.6), dim = c(2, 2, 1)))
bhat <- est_beta(m1)
get_exclusivity(bhat, nwords = nvocab(m1))
# get_toptopics(): most probable topics per document from the estimated theta
m1 <- Sldax(ndocs = 2, nvocab = 2, nchain = 2,
            topics = array(c(1, 2, 2, 1,
                             1, 2, 2, 1), dim = c(2, 2, 2)),
            theta = array(c(0.4, 0.3,
                            0.6, 0.7,
                            0.45, 0.5,
                            0.55, 0.5), dim = c(2, 2, 2)),
            loglike = rep(NaN, times = 2),
            logpost = rep(NaN, times = 2),
            lpd = matrix(NaN, nrow = 2, ncol = 2),
            eta = matrix(0.0, nrow = 2, ncol = 2),
            mu0 = c(0.0, 0.0),
            sigma0 = diag(1, 2),
            eta_start = c(0.0, 0.0),
            beta = array(c(0.5, 0.5, 0.5, 0.5,
                           0.5, 0.5, 0.5, 0.5), dim = c(2, 2, 2)))
t_hat <- est_theta(m1, stat = "mean")
get_toptopics(t_hat, ntopics = ntopics(m1))
# get_topwords(): top words per topic, first by term score and then by probability
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 2), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.4, 0.5, 0.6), dim = c(2, 2, 1)))
bhat <- est_beta(m1)
get_topwords(bhat, nwords = nvocab(m1), method = "termscore")
get_topwords(bhat, nwords = nvocab(m1), method = "prob")
# get_zbar(): empirical topic proportions from the topic assignments
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 2), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.4, 0.5, 0.6), dim = c(2, 2, 1)))
get_zbar(m1)
# post_regression(): posterior draws of the regression parameters from a gibbs_mlr() fit
data(mtcars)
m1 <- gibbs_mlr(mpg ~ hp, data = mtcars, m = 2)
post_regression(m1)
## Not run: 
library(lda) # Required if using `prep_docs()`
data(teacher_rate)  # Synthetic student ratings of instructors
docs_vocab <- prep_docs(teacher_rate, "doc")
vocab_len <- length(docs_vocab$vocab)
m1 <- gibbs_sldax(rating ~ I(grade - 1), m = 2,
                  data = teacher_rate,
                  docs = docs_vocab$documents,
                  V = vocab_len,
                  K = 2,
                  model = "sldax")
# gg_coef(): plot the regression coefficients (deprecated)
gg_coef(m1)

## End(Not run)