sldax-summary: Summary functions for objects of class Sldax


Description

Obtain parameter estimates, model goodness-of-fit metrics, and posterior summaries.

For SLDA or SLDAX models, label switching is handled during estimation by the correct_ls argument of gibbs_sldax(), so the functions documented here do not address it.

Usage

est_beta(mcmc_fit, burn = 0, thin = 1, stat = "mean")

est_theta(mcmc_fit, burn = 0, thin = 1, stat = "mean")

get_coherence(beta_, docs, nwords = 10)

get_exclusivity(beta_, nwords = 10, weight = 0.7)

get_toptopics(theta, ntopics)

get_topwords(beta_, nwords, vocab, method = "termscore")

get_zbar(mcmc_fit, burn = 0L, thin = 1L)

post_regression(mcmc_fit)

gg_coef(mcmc_fit, burn = 0L, thin = 1L, stat = "mean", errorbw = 0.5)

## S4 method for signature 'Sldax'
gg_coef(mcmc_fit, burn = 0L, thin = 1L, stat = "mean", errorbw = 0.5)

## S4 method for signature 'Sldax'
est_beta(mcmc_fit, burn = 0, thin = 1, stat = "mean")

## S4 method for signature 'Sldax'
est_theta(mcmc_fit, burn = 0, thin = 1, stat = "mean")

## S4 method for signature 'matrix,matrix'
get_coherence(beta_, docs, nwords = 10)

## S4 method for signature 'matrix'
get_exclusivity(beta_, nwords = 10, weight = 0.7)

## S4 method for signature 'matrix'
get_toptopics(theta, ntopics)

## S4 method for signature 'matrix,numeric,character'
get_topwords(beta_, nwords, vocab, method = "termscore")

## S4 method for signature 'Sldax'
get_zbar(mcmc_fit, burn = 0L, thin = 1L)

## S4 method for signature 'Mlr'
post_regression(mcmc_fit)

## S4 method for signature 'Logistic'
post_regression(mcmc_fit)

## S4 method for signature 'Sldax'
post_regression(mcmc_fit)

Arguments

mcmc_fit

An object of class Sldax. post_regression() also has methods for objects of class Mlr and Logistic.

burn

The number of draws to discard as a burn-in period (default: 0).

thin

The number of draws to skip as a thinning period (default: 1; i.e., no thinning).

stat

The summary statistic to use on the posterior draws (default: "mean").

beta_

A K x V matrix of word-topic probabilities. Each row sums to 1.

docs

The D x max(N_d) matrix of documents (word indices) used to fit the Sldax model.

nwords

The number of words to retrieve (default: 10 for get_coherence() and get_exclusivity(); all words for get_topwords()).

weight

The weight (between 0 and 1) given to exclusivity (near 1) versus frequency (near 0) (default: 0.7).

theta

A D x K matrix of K topic proportions for all D documents.

ntopics

The number of topics to retrieve (default: all topics).

vocab

A character vector of length V containing the vocabulary.

method

If "termscore", use term scores (similar to tf-idf). If "prob", use probabilities (default: "termscore").

errorbw

Positive control parameter for the width of the +/- 2 posterior standard error bars (default: 0.5).
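
The burn, thin, and stat arguments can be illustrated with a base-R sketch on a single chain of draws. This is not the package's internal code: the vector draws and its values are made up, and the sketch assumes that burn-in discards the first burn draws and that thinning then keeps every thin-th draw (so thin = 1 keeps everything).

```r
# Hypothetical chain of 10 posterior draws for one parameter (made-up values).
draws <- c(0.90, 0.70, 0.60, 0.52, 0.50, 0.48, 0.51, 0.49, 0.50, 0.50)

burn <- 4L  # discard the first 4 draws
thin <- 2L  # then keep every 2nd draw

# Indices of the retained draws: 5, 7, 9
keep <- seq.int(from = burn + 1L, to = length(draws), by = thin)
kept <- draws[keep]

# The two summaries shown for `stat` in the examples: "mean" and "median"
post_mean <- mean(kept)
post_median <- median(kept)
```

With thin = 1 and burn = 0 (the defaults), every draw is retained and the summary is taken over the full chain.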

Details
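
The "termscore" method of get_topwords() is described only as similar to tf-idf. One common definition, the term score of Blei and Lafferty, multiplies each topic-word probability by the log ratio of that probability to its geometric mean across topics. The base-R sketch below computes that quantity for a made-up matrix; whether get_topwords() uses exactly this formula is an assumption, and beta_ here is illustrative data, not package output.

```r
# Hypothetical K x V topic-word matrix (K = 2 topics, V = 3 words); rows sum to 1.
beta_ <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.2, 0.7), nrow = 2, byrow = TRUE)

# Log of the geometric mean of each word's probability across topics
log_gmean <- colMeans(log(beta_))

# Term score: beta[k, v] * (log(beta[k, v]) - log_gmean[v])
term_score <- beta_ * sweep(log(beta_), 2, log_gmean, FUN = "-")

# Rank words within each topic by term score (highest first)
top_word_idx <- t(apply(term_score, 1, order, decreasing = TRUE))
```

Under this definition a word that is equally probable in every topic (the second column here) scores zero, so it is ranked below words that distinguish a topic even if its raw probability is not the lowest.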

Value

est_beta(): A matrix of topic-word probability estimates.

est_theta(): A matrix of topic proportion estimates.

get_coherence(): A numeric vector of coherence scores for each topic (more positive is better).

get_exclusivity(): A numeric vector of exclusivity scores (more positive is better).

get_toptopics(): A data frame of the ntopics most probable topics per document.

get_topwords(): A K x V matrix of term-scores (comparable to tf-idf).

get_zbar(): A matrix of empirical topic proportions per document.

post_regression(): An object of class coda::mcmc summarizing the posterior distribution of the regression coefficients and residual variance (if applicable). Convenience functions such as summary() and plot() can be used for posterior summarization.

gg_coef(): A ggplot object.
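
To make the "most probable topics per document" value concrete, the following base-R sketch ranks the columns of a made-up theta matrix within each row. It is only an illustration of the idea: the column names doc, topic, and prob are hypothetical, and the layout of the data frame actually returned by get_toptopics() may differ.

```r
# Hypothetical D x K matrix of topic proportions (D = 2 documents, K = 3 topics).
theta <- matrix(c(0.2, 0.5, 0.3,
                  0.6, 0.1, 0.3), nrow = 2, byrow = TRUE)
ntop <- 2  # number of topics to keep per document

# For each document, order topics by proportion and keep the first `ntop`.
top <- do.call(rbind, lapply(seq_len(nrow(theta)), function(d) {
  ord <- order(theta[d, ], decreasing = TRUE)[seq_len(ntop)]
  data.frame(doc = d, topic = ord, prob = theta[d, ord])
}))
```

Here document 1 keeps topics 2 and 3, and document 2 keeps topics 1 and 3, in decreasing order of proportion.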

Examples

## est_beta(): posterior mean and median of the topic-word probabilities
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 1), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.5, 0.5, 0.5), dim = c(2, 2, 1)))
est_beta(m1, stat = "mean")
est_beta(m1, stat = "median")

## est_theta(): posterior mean and median of the topic proportions
m1 <- Sldax(ndocs = 2, nvocab = 2, nchain = 2,
            topics = array(c(1, 2, 2, 1,
                             1, 2, 2, 1), dim = c(2, 2, 2)),
            theta = array(c(0.5, 0.5,
                            0.5, 0.5,
                            0.5, 0.5,
                            0.5, 0.5), dim = c(2, 2, 2)),
            loglike = rep(NaN, times = 2),
            logpost = rep(NaN, times = 2),
            lpd = matrix(NaN, nrow = 2, ncol = 2),
            eta = matrix(0.0, nrow = 2, ncol = 2),
            mu0 = c(0.0, 0.0),
            sigma0 = diag(1, 2),
            eta_start = c(0.0, 0.0),
            beta = array(c(0.5, 0.5, 0.5, 0.5,
                           0.5, 0.5, 0.5, 0.5), dim = c(2, 2, 2)))
est_theta(m1, stat = "mean")
est_theta(m1, stat = "median")

## get_coherence(): coherence of each topic, scored over the full vocabulary
mdoc <- matrix(c(1, 2, 2, 1), nrow = 1)
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 2), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.4, 0.5, 0.6), dim = c(2, 2, 1)))
bhat <- est_beta(m1)
get_coherence(bhat, docs = mdoc, nwords = nvocab(m1))

## get_exclusivity(): exclusivity of each topic, scored over the full vocabulary
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 2), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.4, 0.5, 0.6), dim = c(2, 2, 1)))
bhat <- est_beta(m1)
get_exclusivity(bhat, nwords = nvocab(m1))

## get_toptopics(): most probable topics for each document
m1 <- Sldax(ndocs = 2, nvocab = 2, nchain = 2,
            topics = array(c(1, 2, 2, 1,
                             1, 2, 2, 1), dim = c(2, 2, 2)),
            theta = array(c(0.4, 0.3,
                            0.6, 0.7,
                            0.45, 0.5,
                            0.55, 0.5), dim = c(2, 2, 2)),
            loglike = rep(NaN, times = 2),
            logpost = rep(NaN, times = 2),
            lpd = matrix(NaN, nrow = 2, ncol = 2),
            eta = matrix(0.0, nrow = 2, ncol = 2),
            mu0 = c(0.0, 0.0),
            sigma0 = diag(1, 2),
            eta_start = c(0.0, 0.0),
            beta = array(c(0.5, 0.5, 0.5, 0.5,
                           0.5, 0.5, 0.5, 0.5), dim = c(2, 2, 2)))
t_hat <- est_theta(m1, stat = "mean")
get_toptopics(t_hat, ntopics = ntopics(m1))

## get_topwords(): top words per topic, by term score and by probability
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 2), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.4, 0.5, 0.6), dim = c(2, 2, 1)))
bhat <- est_beta(m1)
get_topwords(bhat, nwords = nvocab(m1), method = "termscore")
get_topwords(bhat, nwords = nvocab(m1), method = "prob")

## get_zbar(): empirical topic proportions per document
m1 <- Sldax(ndocs = 1, nvocab = 2,
            topics = array(c(1, 2, 2, 2), dim = c(1, 4, 1)),
            theta = array(c(0.5, 0.5), dim = c(1, 2, 1)),
            beta = array(c(0.5, 0.4, 0.5, 0.6), dim = c(2, 2, 1)))
get_zbar(m1)

## post_regression(): posterior summary of a linear regression fit
data(mtcars)
m1 <- gibbs_mlr(mpg ~ hp, data = mtcars, m = 2)
post_regression(m1)

## gg_coef(): coefficient plot for a fitted SLDAX model
## Not run: 
library(lda) # Required if using `prep_docs()`
data(teacher_rate)  # Synthetic student ratings of instructors
docs_vocab <- prep_docs(teacher_rate, "doc")
vocab_len <- length(docs_vocab$vocab)
m1 <- gibbs_sldax(rating ~ I(grade - 1), m = 2,
                  data = teacher_rate,
                  docs = docs_vocab$documents,
                  V = vocab_len,
                  K = 2,
                  model = "sldax")
gg_coef(m1)

## End(Not run)

ktw5691/psychtm documentation built on Nov. 3, 2021, 9:10 a.m.