new_tidylda: Construct a new object of class 'tidylda'

View source: R/utils.R

Description

tidylda, refit.tidylda, and predict.tidylda all call fit_lda_c, so the resulting posteriors and other user-facing objects need to be formatted consistently. This function does that formatting.

Usage

new_tidylda(
  lda,
  dtm,
  burnin,
  is_prediction = FALSE,
  alpha = NULL,
  eta = NULL,
  optimize_alpha = NULL,
  calc_r2 = NULL,
  calc_likelihood = NULL,
  call = NULL,
  threads
)

Arguments

lda

list output of fit_lda_c

dtm

a document term matrix or term co-occurrence matrix of class dgCMatrix

burnin

integer number of burnin iterations.

is_prediction

is this for a prediction (as opposed to initial fitting, or update)? Defaults to FALSE

alpha

output of format_alpha

eta

output of format_eta

optimize_alpha

did you optimize alpha when making a call to fit_lda_c? If is_prediction = TRUE, this argument is ignored.

calc_r2

did the user want to calculate R-squared when fitting the model? If is_prediction = TRUE, this argument is ignored.

calc_likelihood

did you calculate the log likelihood when making a call to fit_lda_c? If is_prediction = TRUE, this argument is ignored.

call

the result of calling match.call at the top of tidylda.

threads

number of parallel threads

Value

Returns an S3 object of class tidylda with the following slots:

beta is a numeric matrix whose rows are the posterior estimates of P(token|topic)

theta is a numeric matrix whose rows are the posterior estimates of P(topic|document)

lambda is a numeric matrix whose rows are the posterior estimates of P(topic|token), calculated using Bayes's rule. See calc_lambda.

alpha is the prior for topics over documents. If optimize_alpha is FALSE, alpha is what the user passed when calling tidylda. If optimize_alpha is TRUE, alpha is a numeric vector returned in the alpha slot from a call to fit_lda_c.

eta is the prior for tokens over topics. This is what the user passed when calling tidylda.

summary is the result of a call to summarize_topics

call is the result of match.call called at the top of tidylda

log_likelihood is a tibble whose columns are the iteration and log likelihood at that iteration. This slot is only populated if calc_likelihood = TRUE

r2 is a numeric scalar resulting from a call to calc_rsquared. This slot is only populated if calc_r2 = TRUE
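
The Bayes's rule step behind lambda can be illustrated with a small base R sketch. This is not the package's calc_lambda; the toy beta and theta values, the uniform document weights (p_docs), and the layout of lambda (topics in rows, tokens in columns, matching beta) are all assumptions made for illustration.

```r
# Toy inputs (made up for illustration): 2 topics, 3 tokens, 2 documents.
beta <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("t1", "t2"), c("apple", "bank", "cash"))
)
theta <- matrix(
  c(0.9, 0.1,
    0.2, 0.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("d1", "d2"), c("t1", "t2"))
)

# Assumption: every document carries equal weight, so P(document) is uniform.
p_docs <- rep(1 / nrow(theta), nrow(theta))

# P(topic) = sum over documents of P(topic|document) * P(document)
p_topics <- as.numeric(p_docs %*% theta)

# Joint P(token, topic): scale each row (topic) of beta by P(topic).
# Recycling works row-wise here because length(p_topics) == nrow(beta).
joint <- beta * p_topics

# Bayes's rule: divide each column (token) by its total, P(token),
# which leaves P(topic|token) in each column.
lambda <- sweep(joint, 2, colSums(joint), "/")

print(round(lambda, 3))
```

In this sketch each column of lambda sums to 1, since every token's probability mass is spread across the topics. Weighting documents by their length instead of uniformly would only change p_docs.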

Note

In general, the arguments of this function should be what the user passed when calling tidylda.

burnin is used only to determine whether or not burn-in iterations were used when fitting the model. If burnin > -1, posteriors are calculated using lda$Cd_mean and lda$Cv_mean respectively. Otherwise, posteriors are calculated using lda$Cd and lda$Cv.
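
The role of burnin can be sketched as follows. This is a minimal illustration, not the package source: it assumes the branch without burn-in reads the final-iteration counts lda$Cd and lda$Cv, that count matrices are laid out as documents by topics and topics by tokens, and that posteriors are formed by adding the prior to the counts and normalising each row. The function name posterior_sketch and all numeric values are hypothetical.

```r
# Hypothetical stand-in for the list returned by fit_lda_c. Only the element
# names Cd, Cv, Cd_mean, Cv_mean come from the note above; values are made up.
lda <- list(
  Cd      = matrix(c(4, 1, 2, 3), nrow = 2, byrow = TRUE),         # docs x topics
  Cd_mean = matrix(c(3.5, 1.5, 2.5, 2.5), nrow = 2, byrow = TRUE), # averaged after burn-in
  Cv      = matrix(c(3, 2, 1, 0, 1, 3), nrow = 2, byrow = TRUE),   # topics x tokens
  Cv_mean = matrix(c(2.5, 2, 1.5, 0.5, 1, 2.5), nrow = 2, byrow = TRUE)
)
alpha <- c(0.1, 0.1)          # prior for topics over documents (assumed symmetric)
eta   <- c(0.05, 0.05, 0.05)  # prior for tokens over topics (assumed symmetric)

posterior_sketch <- function(lda, burnin, alpha, eta) {
  # burnin only selects which count matrices are used
  use_mean <- burnin > -1
  Cd <- if (use_mean) lda$Cd_mean else lda$Cd
  Cv <- if (use_mean) lda$Cv_mean else lda$Cv

  # Assumed smoothing: add the prior to each row, then normalise rows to sum to 1
  theta <- sweep(Cd, 2, alpha, "+")
  theta <- theta / rowSums(theta)
  beta <- sweep(Cv, 2, eta, "+")
  beta <- beta / rowSums(beta)

  list(theta = theta, beta = beta)
}

with_burnin    <- posterior_sketch(lda, burnin = 10, alpha, eta)
without_burnin <- posterior_sketch(lda, burnin = -1, alpha, eta)
```

Only the sign test on burnin matters in this sketch; the actual number of burn-in iterations has already been applied by the sampler before the counts reach this step.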

The class of call isn't checked. It is passed through unchanged to the object returned by this function, which can be useful if you are calling this function directly for troubleshooting.


tidylda documentation built on Dec. 11, 2021, 10:02 a.m.