Obtain parameter estimates, model goodness-of-fit metrics, and posterior summaries.
For SLDA or SLDAX models, label switching is handled during estimation in the gibbs_sldax() function with argument correct_ls, so it is not addressed by this function.
est_beta(mcmc_fit, burn = 0, thin = 1, stat = "mean")
est_theta(mcmc_fit, burn = 0, thin = 1, stat = "mean")
get_coherence(beta_, docs, nwords = 10)
get_exclusivity(beta_, nwords = 10, weight = 0.7)
get_toptopics(theta, ntopics)
get_topwords(beta_, nwords, vocab, method = "termscore")
get_zbar(mcmc_fit, burn = 0L, thin = 1L)
post_regression(mcmc_fit)
gg_coef(mcmc_fit, burn = 0L, thin = 1L, stat = "mean", errorbw = 0.5)
## S4 method for signature 'Sldax'
gg_coef(mcmc_fit, burn = 0L, thin = 1L, stat = "mean", errorbw = 0.5)
## S4 method for signature 'Sldax'
est_beta(mcmc_fit, burn = 0, thin = 1, stat = "mean")
## S4 method for signature 'Sldax'
est_theta(mcmc_fit, burn = 0, thin = 1, stat = "mean")
## S4 method for signature 'matrix,matrix'
get_coherence(beta_, docs, nwords = 10)
## S4 method for signature 'matrix'
get_exclusivity(beta_, nwords = 10, weight = 0.7)
## S4 method for signature 'matrix'
get_toptopics(theta, ntopics)
## S4 method for signature 'matrix,numeric,character'
get_topwords(beta_, nwords, vocab, method = "termscore")
## S4 method for signature 'Sldax'
get_zbar(mcmc_fit, burn = 0L, thin = 1L)
## S4 method for signature 'Mlr'
post_regression(mcmc_fit)
## S4 method for signature 'Logistic'
post_regression(mcmc_fit)
## S4 method for signature 'Sldax'
post_regression(mcmc_fit)
mcmc_fit |
An object of class Sldax. |
burn |
The number of draws to discard as a burn-in period (default: 0). |
thin |
The number of draws to skip as a thinning period (default: 1). |
stat |
The summary statistic to use on the posterior draws (default: "mean"). |
beta_ |
A K x V matrix of word-topic probabilities. Each row sums to 1. |
docs |
The D x max(N_d) matrix of documents (word indices) used to fit the Sldax model. |
nwords |
The number of words to retrieve (default: all). |
weight |
The weight (between 0 and 1) to give to exclusivity (near 1) vs. frequency (near 0) (default: 0.7). |
theta |
A D x K matrix of K topic proportions for all D documents. |
ntopics |
The number of topics to retrieve (default: all topics). |
vocab |
A character vector of length V containing the vocabulary. |
method |
Either "termscore" or "prob": rank words by term score or by probability (default: "termscore"). |
errorbw |
Positive control parameter for the width of the +/- 2 posterior standard error bars (default: 0.5). |
get_zbar() computes empirical topic proportions from slot @topics.
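As a rough illustration of what "empirical topic proportions" means, the base R sketch below tabulates the topic assignments of each document for a single draw. The helper zbar_one_draw() and the convention that 0 marks padding are assumptions made for this example; they are not taken from the package.

```r
# Hypothetical helper (not part of the package): empirical topic
# proportions for ONE draw of the topic assignments.
# z: D x max(N_d) matrix of topic assignments; 0 marks padding (assumed).
# K: number of topics.
zbar_one_draw <- function(z, K) {
  t(apply(z, 1, function(zd) {
    zd <- zd[zd > 0]                      # drop padding positions
    tabulate(zd, nbins = K) / length(zd)  # share of words assigned to each topic
  }))
}

z <- matrix(c(1, 2, 2, 2,
              1, 1, 2, 0), nrow = 2, byrow = TRUE)
zbar <- zbar_one_draw(z, K = 2)
print(zbar)
```

Each row of the result sums to 1, since every retained word is assigned to exactly one topic.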
est_theta() computes the posterior mean or median of the theta matrix (topic proportions per document).
est_beta() computes the posterior mean or median of the beta matrix (word-topic probabilities).
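The role of burn, thin, and stat can be sketched in base R as follows. The helper summarize_draws() is hypothetical, and the rule used to select draws (drop the first burn draws, then keep every thin-th one) is an assumption about how such arguments are commonly applied, not a statement of the package's internals.

```r
# Hypothetical helper (not part of the package): summarize an array of
# posterior draws with dim (rows, cols, number of draws).
summarize_draws <- function(draws, burn = 0, thin = 1, stat = "mean") {
  m <- dim(draws)[3]
  keep <- seq(burn + 1, m, by = thin)   # assumed burn-in/thinning rule
  f <- if (stat == "median") stats::median else mean
  apply(draws[, , keep, drop = FALSE], c(1, 2), f)
}

# Three draws of a 2 x 2 theta matrix (documents x topics).
theta_draws <- array(c(0.4, 0.3, 0.6, 0.7,
                       0.5, 0.5, 0.5, 0.5,
                       0.6, 0.1, 0.4, 0.9), dim = c(2, 2, 3))
theta_mean <- summarize_draws(theta_draws, stat = "mean")
theta_med  <- summarize_draws(theta_draws, burn = 1, stat = "median")
print(theta_mean)
```

Because each draw has rows that sum to 1, the elementwise mean does as well; the elementwise median need not in general.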
get_toptopics() creates a tibble of the topic proportion estimates for the top ntopics topics per document sorted by probability.
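The idea of "top topics per document" can be sketched in base R as below. The helper top_topics() and its column names are hypothetical, and a plain data frame stands in for the tibble that get_toptopics() returns.

```r
# Hypothetical helper (not part of the package): the `ntopics` most
# probable topics per document from a D x K matrix of topic proportions.
top_topics <- function(theta, ntopics = ncol(theta)) {
  rows <- lapply(seq_len(nrow(theta)), function(d) {
    ord <- order(theta[d, ], decreasing = TRUE)[seq_len(ntopics)]
    data.frame(doc = d, topic = ord, prob = theta[d, ord])
  })
  do.call(rbind, rows)
}

theta_hat <- matrix(c(0.2, 0.5, 0.3,
                      0.7, 0.1, 0.2), nrow = 2, byrow = TRUE)
top2 <- top_topics(theta_hat, ntopics = 2)
print(top2)
```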
get_topwords() creates a tibble of topics and the top nwords words per topic sorted by probability or term score.
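A common definition of the term score (Blei & Lafferty, 2009) multiplies each word-topic probability by the log ratio of that probability to the word's geometric mean across topics. The base R sketch below uses that definition; whether get_topwords() computes exactly this quantity is an assumption, and the helpers term_scores() and top_words() as well as the toy vocabulary are hypothetical.

```r
# Hypothetical helper: term scores for a K x V matrix of word-topic
# probabilities (all entries must be strictly positive).
term_scores <- function(beta_) {
  log_beta <- log(beta_)
  log_gm <- colMeans(log_beta)          # log geometric mean of each word across topics
  beta_ * sweep(log_beta, 2, log_gm)    # beta * log(beta / geometric mean)
}

# Hypothetical helper: top `nwords` words per topic by a score matrix.
top_words <- function(scores, nwords, vocab) {
  rows <- lapply(seq_len(nrow(scores)), function(k) {
    ord <- order(scores[k, ], decreasing = TRUE)[seq_len(nwords)]
    data.frame(topic = k, word = vocab[ord], score = scores[k, ord])
  })
  do.call(rbind, rows)
}

beta_hat <- rbind(c(0.5, 0.3, 0.2),
                  c(0.1, 0.3, 0.6))
vocab <- c("exam", "lecture", "grade")  # toy vocabulary for illustration
scores <- term_scores(beta_hat)
tw <- top_words(scores, nwords = 2, vocab = vocab)
print(tw)
```

A word that is equally probable in every topic ("lecture" here) gets a term score of 0, which is why term scores tend to surface more distinctive words than raw probabilities do.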
get_coherence() computes the coherence metric for each topic (see Mimno, Wallach, Talley, Leenders, & McCallum, 2011).
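The coherence metric of Mimno et al. (2011) sums, over ordered pairs of a topic's top words, the log of the smoothed co-document frequency divided by the document frequency of the higher-ranked word. The base R sketch below computes it for one topic; the helper coherence_one() and the use of 0 as padding in docs are assumptions for illustration.

```r
# Hypothetical helper (not part of the package): coherence of ONE topic.
# top_words: word indices of the topic's top words, most probable first.
# docs: D x max(N_d) matrix of word indices; 0 marks padding (assumed).
# Every top word is assumed to occur in at least one document.
coherence_one <- function(top_words, docs) {
  # has[d, m] is TRUE when document d contains top word m
  has <- sapply(top_words, function(v) rowSums(docs == v) > 0)
  has <- matrix(has, nrow = nrow(docs))
  score <- 0
  for (m in seq_along(top_words)[-1]) {
    for (l in seq_len(m - 1)) {
      co_df <- sum(has[, m] & has[, l])  # documents containing both words
      df_l  <- sum(has[, l])             # documents containing the higher-ranked word
      score <- score + log((co_df + 1) / df_l)
    }
  }
  score
}

docs <- rbind(c(1, 2, 0),
              c(1, 3, 0),
              c(1, 2, 3),
              c(2, 0, 0))
coh <- coherence_one(top_words = c(1, 2, 3), docs = docs)
print(coh)
```

Scores closer to zero indicate that a topic's top words tend to appear in the same documents.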
get_exclusivity() computes the exclusivity metric for each topic (see Roberts, Stewart, & Airoldi, 2013).
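One way to read the weight argument is through a FREX-style score: a weighted harmonic mean of a word's within-topic rank in exclusivity and in frequency, summed over the topic's top words. The base R sketch below follows that general idea; the helper frex_exclusivity() is hypothetical, and the details (rank-based empirical CDFs, summing over the top nwords words) are assumptions rather than a description of the package's code.

```r
# Hypothetical helper (not part of the package): FREX-style exclusivity
# score per topic for a K x V matrix of word-topic probabilities (V > 1).
frex_exclusivity <- function(beta_, nwords = 10, weight = 0.7) {
  V <- ncol(beta_)
  excl <- sweep(beta_, 2, colSums(beta_), "/")  # share of each word's mass held by each topic
  ex_cdf <- t(apply(excl, 1, rank)) / V         # within-topic empirical CDF of exclusivity
  fr_cdf <- t(apply(beta_, 1, rank)) / V        # within-topic empirical CDF of frequency
  frex <- 1 / (weight / ex_cdf + (1 - weight) / fr_cdf)
  sapply(seq_len(nrow(beta_)), function(k) {
    top <- order(beta_[k, ], decreasing = TRUE)[seq_len(min(nwords, V))]
    sum(frex[k, top])                           # total FREX of the topic's top words
  })
}

beta_hat <- rbind(c(0.5, 0.3, 0.2),
                  c(0.1, 0.3, 0.6))
ex <- frex_exclusivity(beta_hat, nwords = 2, weight = 0.7)
print(ex)
```

With weight near 1 the score is driven by how exclusive a topic's top words are; with weight near 0 it is driven by how frequent they are within the topic.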
post_regression() creates a coda::mcmc object containing posterior information for the regression model parameters.
gg_coef() plots regression coefficients.
Warning: this function is deprecated. See help("Deprecated").
A matrix of topic-word probability estimates.
A matrix of topic proportion estimates.
A numeric vector of coherence scores for each topic (more positive is better).
A numeric vector of exclusivity scores (more positive is better).
A data frame of the ntopics most probable topics per document.
A K x V matrix of term-scores (comparable to tf-idf).
A matrix of empirical topic proportions per document.
An object of class coda::mcmc summarizing the posterior distribution of the regression coefficients and residual variance (if applicable). Convenience functions such as summary() and plot() can be used for posterior summarization.
A ggplot object.
# est_beta(): posterior mean and median of beta
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 1), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.5, 0.5, 0.5), dim = c(2, 2, 1)))
est_beta(m1, stat = "mean")
est_beta(m1, stat = "median")
# est_theta(): posterior mean and median of theta
m1 <- Sldax(ndocs = 2, nvocab = 2, nchain = 2,
            topics = array(c(1, 2, 2, 1,
                             1, 2, 2, 1), dim = c(2, 2, 2)),
            theta = array(c(0.5, 0.5,
                            0.5, 0.5,
                            0.5, 0.5,
                            0.5, 0.5), dim = c(2, 2, 2)),
            loglike = rep(NaN, times = 2),
            logpost = rep(NaN, times = 2),
            lpd = matrix(NaN, nrow = 2, ncol = 2),
            eta = matrix(0.0, nrow = 2, ncol = 2),
            mu0 = c(0.0, 0.0),
            sigma0 = diag(1, 2),
            eta_start = c(0.0, 0.0),
            beta = array(c(0.5, 0.5, 0.5, 0.5,
                           0.5, 0.5, 0.5, 0.5), dim = c(2, 2, 2)))
est_theta(m1, stat = "mean")
est_theta(m1, stat = "median")
# get_coherence(): coherence of each topic for a one-document corpus
mdoc <- matrix(c(1, 2, 2, 1), nrow = 1)
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 2), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.4, 0.5, 0.6), dim = c(2, 2, 1)))
bhat <- est_beta(m1)
get_coherence(bhat, docs = mdoc, nwords = nvocab(m1))
# get_exclusivity(): exclusivity of each topic
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 2), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.4, 0.5, 0.6), dim = c(2, 2, 1)))
bhat <- est_beta(m1)
get_exclusivity(bhat, nwords = nvocab(m1))
# get_toptopics(): most probable topics per document
m1 <- Sldax(ndocs = 2, nvocab = 2, nchain = 2,
            topics = array(c(1, 2, 2, 1,
                             1, 2, 2, 1), dim = c(2, 2, 2)),
            theta = array(c(0.4, 0.3,
                            0.6, 0.7,
                            0.45, 0.5,
                            0.55, 0.5), dim = c(2, 2, 2)),
            loglike = rep(NaN, times = 2),
            logpost = rep(NaN, times = 2),
            lpd = matrix(NaN, nrow = 2, ncol = 2),
            eta = matrix(0.0, nrow = 2, ncol = 2),
            mu0 = c(0.0, 0.0),
            sigma0 = diag(1, 2),
            eta_start = c(0.0, 0.0),
            beta = array(c(0.5, 0.5, 0.5, 0.5,
                           0.5, 0.5, 0.5, 0.5), dim = c(2, 2, 2)))
t_hat <- est_theta(m1, stat = "mean")
get_toptopics(t_hat, ntopics = ntopics(m1))
# get_topwords(): top words per topic by term score and by probability
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 2), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.4, 0.5, 0.6), dim = c(2, 2, 1)))
bhat <- est_beta(m1)
get_topwords(bhat, nwords = nvocab(m1), method = "termscore")
get_topwords(bhat, nwords = nvocab(m1), method = "prob")
# get_zbar(): empirical topic proportions from the topic assignments
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 2), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.4, 0.5, 0.6), dim = c(2, 2, 1)))
get_zbar(m1)
# post_regression(): posterior draws of regression parameters as a coda::mcmc object
data(mtcars)
m1 <- gibbs_mlr(mpg ~ hp, data = mtcars, m = 2)
post_regression(m1)
## Not run:
library(lda) # Required if using `prep_docs()`
data(teacher_rate) # Synthetic student ratings of instructors
docs_vocab <- prep_docs(teacher_rate, "doc")
vocab_len <- length(docs_vocab$vocab)
m1 <- gibbs_sldax(rating ~ I(grade - 1), m = 2,
                  data = teacher_rate,
                  docs = docs_vocab$documents,
                  V = vocab_len,
                  K = 2,
                  model = "sldax")
gg_coef(m1)
## End(Not run)