new_tidylda: Construct a new object of class 'tidylda'
In tidylda: Latent Dirichlet Allocation Using 'tidyverse' Conventions

new_tidylda

R Documentation

Construct a new object of class `tidylda`

Description

Since all three of tidylda, refit.tidylda, and predict.tidylda call fit_lda_c, we need a way to format the resulting posteriors and other user-facing objects consistently. This function does that.

Usage

new_tidylda(
  lda,
  dtm,
  burnin,
  is_prediction = FALSE,
  alpha = NULL,
  eta = NULL,
  optimize_alpha = NULL,
  calc_r2 = NULL,
  calc_likelihood = NULL,
  call = NULL,
  threads
)

Arguments

`lda`	list output of `fit_lda_c`
`dtm`	a document term matrix or term co-occurrence matrix of class `dgCMatrix`
`burnin`	integer number of burn-in iterations.
`is_prediction`	is this for a prediction (as opposed to initial fitting, or update)? Defaults to `FALSE`
`alpha`	output of `format_alpha`
`eta`	output of `format_eta`
`optimize_alpha`	did you optimize `alpha` when making a call to `fit_lda_c`? If `is_prediction = TRUE`, this argument is ignored.
`calc_r2`	did the user want to calculate R-squared when fitting the model? If `is_prediction = TRUE`, this argument is ignored.
`calc_likelihood`	did you calculate the log likelihood when making a call to `fit_lda_c`? If `is_prediction = TRUE`, this argument is ignored.
`call`	the result of calling `match.call` at the top of `tidylda`.
`threads`	number of parallel threads
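
As an illustration of the `dtm` argument, the following sketch builds a small document term matrix of class `dgCMatrix` with the Matrix package. The documents, tokens, and counts are made up for the example; only the requirement that `dtm` be a `dgCMatrix` comes from the argument description above.

```r
library(Matrix)

# Toy document term matrix: 3 documents (rows) by 3 tokens (columns).
# All names and counts are hypothetical.
dtm <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),          # row (document) index of each non-zero count
  j = c(1, 2, 2, 1, 3),          # column (token) index of each non-zero count
  x = c(2, 1, 3, 1, 4),          # the counts themselves
  dims = c(3, 3),
  dimnames = list(paste0("doc", 1:3), c("apple", "banana", "cherry"))
)

# With numeric x, sparseMatrix() returns a column-compressed dgCMatrix
methods::is(dtm, "dgCMatrix")
```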

Value

Returns an S3 object of class tidylda with the following slots:

beta is a numeric matrix whose rows are the posterior estimates of P(token|topic)

theta is a numeric matrix whose rows are the posterior estimates of P(topic|document)

lambda is a numeric matrix whose rows are the posterior estimates of P(topic|token), calculated using Bayes's rule. See calc_lambda.

alpha is the prior for topics over documents. If optimize_alpha is FALSE, alpha is what the user passed when calling tidylda. If optimize_alpha is TRUE, alpha is a numeric vector returned in the alpha slot from a call to fit_lda_c.

eta is the prior for tokens over topics. This is what the user passed when calling tidylda.

summary is the result of a call to summarize_topics

call is the result of match.call called at the top of tidylda

log_likelihood is a tibble whose columns are the iteration and log likelihood at that iteration. This slot is only populated if calc_likelihood = TRUE

r2 is a numeric scalar resulting from a call to calc_rsquared. This slot is only populated if calc_r2 = TRUE
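
To make the relationship between beta, theta, and lambda concrete, here is a minimal base R sketch. It is not the package's internal code: the count matrices `Cd` and `Cv`, the prior values, and the document-length weighting used for P(topic) are all assumptions chosen for illustration, and lambda is laid out as topics by tokens so that each token's column sums to 1.

```r
# Toy count matrices (hypothetical values, not output of fit_lda_c)
# Cd: documents x topics, Cv: topics x tokens
Cd <- matrix(c(4, 1,
               0, 5,
               2, 2), nrow = 3, byrow = TRUE,
             dimnames = list(paste0("d", 1:3), paste0("t", 1:2)))
Cv <- matrix(c(3, 2, 1,
               0, 4, 4), nrow = 2, byrow = TRUE,
             dimnames = list(paste0("t", 1:2), c("a", "b", "c")))

alpha <- c(0.1, 0.1)  # assumed prior for topics over documents
eta   <- 0.05         # assumed symmetric prior for tokens over topics

# theta: add alpha to every row, then normalize rows -> P(topic | document)
theta <- sweep(Cd, 2, alpha, "+")
theta <- theta / rowSums(theta)

# beta: add eta, then normalize rows -> P(token | topic)
beta <- Cv + eta
beta <- beta / rowSums(beta)

# P(topic): average of theta weighted by document length (an assumption)
doc_len <- rowSums(Cd)
p_topic <- colSums(theta * doc_len) / sum(doc_len)

# Bayes's rule: P(topic | token) is proportional to P(token | topic) * P(topic)
lambda <- beta * p_topic                  # scales row k of beta by p_topic[k]
lambda <- t(t(lambda) / colSums(lambda))  # normalize each token's column

round(lambda, 3)
```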

Note

In general, the arguments of this function should be what the user passed when calling tidylda.

burnin is used only to determine whether burn-in iterations were used when fitting the model. If burnin > -1, posteriors are calculated using lda$Cd_mean and lda$Cv_mean, respectively. Otherwise, posteriors are calculated using lda$Cd and lda$Cv.
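
The branching on burnin can be sketched as follows. This assumes that, without burn-in, the posteriors fall back to the final-iteration counts lda$Cd and lda$Cv. The helper `pick_counts()` and the toy `lda` list are hypothetical and stand in for the real output of fit_lda_c.

```r
# Hypothetical stand-in for the list returned by fit_lda_c
lda <- list(
  Cd      = matrix(c(5, 1, 2, 4), nrow = 2),          # final-iteration counts
  Cv      = matrix(c(3, 3, 4, 2), nrow = 2),
  Cd_mean = matrix(c(4.6, 1.4, 2.2, 3.8), nrow = 2),  # averaged over post-burn-in iterations
  Cv_mean = matrix(c(3.1, 2.9, 3.7, 2.3), nrow = 2)
)

# pick_counts() is a hypothetical helper, not part of tidylda
pick_counts <- function(lda, burnin) {
  if (burnin > -1) {
    # burn-in was used: take the averaged counts
    list(Cd = lda$Cd_mean, Cv = lda$Cv_mean)
  } else {
    # no burn-in (assumed fallback): take the final-iteration counts
    list(Cd = lda$Cd, Cv = lda$Cv)
  }
}

with_burnin    <- pick_counts(lda, burnin = 50)
without_burnin <- pick_counts(lda, burnin = -1)
```

Note that burnin = 0 also selects the averaged counts, because the test is burnin > -1 rather than burnin > 0.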

The class of call isn't checked; it is passed through unchanged to the object returned by this function. This can be useful when calling this function directly, for example while troubleshooting.
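
The pass-through behavior of call can be illustrated with base R alone. In this sketch, `make_object()` and `fit_wrapper()` are hypothetical stand-ins for a constructor and a user-facing fitting function; they are not tidylda functions.

```r
# Hypothetical constructor: stores whatever it receives in `call`, unchecked
make_object <- function(call = NULL) {
  structure(list(call = call), class = "toy_model")
}

# Hypothetical user-facing function: captures its own call at the top
fit_wrapper <- function(data, k) {
  mc <- match.call()
  make_object(call = mc)
}

obj <- fit_wrapper(data = 1:3, k = 2)
deparse(obj$call)

# Because the class is not checked, any value can be passed through
dbg <- make_object(call = "manual troubleshooting run")
dbg$call
```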

tidylda documentation built on May 29, 2024, 11:03 a.m.