View source: R/helper-functions.R
gibbs_sldax() is used to fit both supervised and unsupervised topic models.
gibbs_sldax(
formula,
data,
m = 100,
burn = 0,
thin = 1,
docs,
V,
K = 2L,
model = c("lda", "slda", "sldax", "slda_logit", "sldax_logit"),
sample_beta = TRUE,
sample_theta = TRUE,
interaction_xcol = -1L,
alpha_ = 1,
gamma_ = 1,
mu0 = NULL,
sigma0 = NULL,
a0 = NULL,
b0 = NULL,
eta_start = NULL,
constrain_eta = FALSE,
proposal_sd = NULL,
return_assignments = FALSE,
correct_ls = TRUE,
verbose = FALSE,
display_progress = FALSE
)
formula |
An object of class |
data |
An optional data frame containing the variables in the model. |
m |
The number of iterations to run the Gibbs sampler (default: 100). |
burn |
The number of iterations to discard as the burn-in period (default: 0). |
thin |
The thinning period: every thin-th iteration after the burn-in period
is kept (default: 1). |
docs |
A D x max(N_d) matrix of word indices for all documents. |
V |
The number of unique terms in the vocabulary. |
K |
The number of topics. |
model |
A string denoting the type of model to fit. See 'Details'.
(default: "lda", the first option listed in the usage) |
sample_beta |
A logical (default = TRUE). |
sample_theta |
A logical (default = TRUE). |
interaction_xcol |
EXPERIMENTAL: The column number of the design matrix
for the additional predictors for which an interaction with the K
topics is desired (default: -1L, no interaction). |
alpha_ |
The hyper-parameter for the prior on the topic proportions
(default: 1). |
gamma_ |
The hyper-parameter for the prior on the topic-specific
vocabulary probabilities (default: 1). |
mu0 |
An optional q x 1 mean vector for the prior on the regression coefficients. See 'Details'. |
sigma0 |
A q x q variance-covariance matrix for the prior on the regression coefficients. See 'Details'. |
a0 |
The shape parameter for the prior on sigma2 (default: |
b0 |
The scale parameter for the prior on sigma2 (default: |
eta_start |
A q x 1 vector of starting values for the regression coefficients. |
constrain_eta |
A logical (default = FALSE). |
proposal_sd |
The proposal standard deviations for drawing the
regression coefficients, N(0, proposal_sd(j)), j = 1, …, q.
Only used for |
return_assignments |
A logical (default = FALSE). |
correct_ls |
Run the Stephens (2000) label-switching correction algorithm on the
posterior? (default = TRUE) |
verbose |
Should parameter draws be output during sampling? (default: FALSE) |
display_progress |
Show progress bar? (default: FALSE) |
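The docs argument expects a D x max(N_d) matrix of word indices. The sketch below shows one way such a matrix could be built by hand from tokenized text. The toy corpus, the 1-based indexing, and the use of 0 to pad shorter documents are assumptions made for illustration and are not confirmed by this page; the prep_docs() helper used in the example at the end is the documented route.

```r
# Hypothetical toy corpus: each document is a character vector of tokens.
tokens <- list(
  c("great", "teacher", "great", "class"),
  c("boring", "class"),
  c("great", "lectures", "boring", "homework", "class")
)

# Vocabulary of unique terms; V is its length.
vocab <- sort(unique(unlist(tokens)))
V <- length(vocab)

# Map each token to its 1-based position in the vocabulary.
idx <- lapply(tokens, match, table = vocab)

# Pad every document to the longest length.
# Assumption: 0 marks "no word" in the padded positions.
max_len <- max(lengths(idx))
docs <- t(vapply(
  idx,
  function(w) c(w, rep(0L, max_len - length(w))),
  integer(max_len)
))

dim(docs)  # D rows, max(N_d) columns
```

Each row of docs is one document, and V is what would be passed as the vocabulary size.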
The number of regression coefficients q in supervised topic models is
determined as follows. For the SLDA model with only the K topics as
predictors, q = K. For the SLDAX model with K topics and p additional
predictors, there are two possibilities: (1) if no interaction between an
additional covariate and the K topics is desired (default:
interaction_xcol = -1L), q = p + K; (2) if an interaction between an
additional covariate and the K topics is desired (e.g.,
interaction_xcol = 1), q = p + 2K - 1. If you supply custom values for
the prior parameters mu0 or sigma0, be sure that the length of mu0 (q)
and/or the number of rows and columns of sigma0 (q x q) are correct. If
you supply custom starting values for eta_start, be sure that the length
of eta_start (q) is correct.
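The rules for q can be checked with a few lines of arithmetic. The helper below is hypothetical (n_coef() is not part of the package) and simply encodes the three cases described above, which can be handy for sizing mu0, sigma0, and eta_start before calling the sampler.

```r
# Hypothetical helper, not part of the package: number of regression
# coefficients q implied by K topics, p additional predictors, and
# whether an interaction with the topics is requested.
n_coef <- function(K, p = 0L, interaction = FALSE) {
  stopifnot(K >= 1, p >= 0)
  if (interaction) {
    # An interaction needs at least one additional predictor.
    stopifnot(p >= 1)
    return(p + 2 * K - 1)
  }
  p + K
}

n_coef(K = 2)                             # SLDA: q = K
n_coef(K = 2, p = 1)                      # SLDAX, no interaction: q = p + K
n_coef(K = 3, p = 2, interaction = TRUE)  # SLDAX, interaction: q = p + 2K - 1
```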
For model, one of c("lda", "slda", "sldax", "slda_logit", "sldax_logit"):
"lda": unsupervised topic model;
"slda": supervised topic model with a continuous outcome;
"sldax": supervised topic model with a continuous outcome and
additional predictors of the outcome;
"slda_logit": supervised topic model with a dichotomous outcome (0/1);
"sldax_logit": supervised topic model with a dichotomous outcome (0/1)
and additional predictors of the outcome.
For mu0, the first p elements correspond to coefficients for the
p additional predictors (if none, p = 0), while elements
p + 1 to p + K correspond to coefficients for the K topics,
and elements p + K + 1 to p + 2K - 1 correspond to coefficients
for the interaction (if any) between one additional predictor and the K
topics. By default, we use a vector of q 0s.
For sigma0, the first p rows/columns correspond to coefficients
for the p additional predictors (if none, p = 0), while
rows/columns p + 1 to p + K correspond to coefficients for the
K topics, and rows/columns p + K + 1 to p + 2K - 1
correspond to coefficients for the interaction (if any) between one
additional predictor and the K topics. By default, we use an identity
matrix for model = "slda" and model = "sldax" and a diagonal
matrix with diagonal elements (variances) of 6.25 for
model = "slda_logit" and model = "sldax_logit".
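As an illustration of the ordering described for mu0 and sigma0, the sketch below builds custom prior parameters for a model with one additional predictor, two topics, and no interaction. The values of K and p and the prior variance of 4 are hypothetical choices for this example, not package defaults.

```r
K <- 2L     # number of topics
p <- 1L     # number of additional predictors (assumed for this sketch)
q <- p + K  # no interaction, so q = p + K

# Prior means: all zeros, matching the documented default.
mu0 <- rep(0, q)

# Prior covariance: the first p diagonal entries belong to the additional
# predictors (hypothetical variance of 4), the next K to the topics
# (unit variance); off-diagonal entries are zero.
sigma0 <- diag(c(rep(4, p), rep(1, K)))

# Starting values must also have length q.
eta_start <- rep(0, q)

# Dimension checks before handing these to the sampler.
stopifnot(length(mu0) == q, all(dim(sigma0) == q), length(eta_start) == q)
```

These objects could then be supplied through the mu0, sigma0, and eta_start arguments of gibbs_sldax().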
An object of class Sldax.
Other Gibbs sampler: gibbs_logistic(), gibbs_mlr()
library(lda) # Required if using `prep_docs()`
data(teacher_rate) # Synthetic student ratings of instructors
docs_vocab <- prep_docs(teacher_rate, "doc")
vocab_len <- length(docs_vocab$vocab)
m1 <- gibbs_sldax(rating ~ I(grade - 1), m = 2,
data = teacher_rate, docs = docs_vocab$documents,
V = vocab_len, K = 2, model = "sldax")