predict.LDA_VEM    R Documentation
Description

Gives either the topic predictions, indicating to which topic each document belongs, or the term posteriors by topic, indicating which terms are emitted by each topic. If newdata contains a document without any text (a row of the document/term matrix with no nonzero entries), the topic prediction for that document will be NA (see the examples).
Usage

## S3 method for class 'LDA_VEM'
predict(
  object,
  newdata,
  type = c("topics", "terms"),
  min_posterior = -1,
  min_terms = 0,
  labels,
  ...
)

## S3 method for class 'LDA_Gibbs'
predict(
  object,
  newdata,
  type = c("topics", "terms"),
  min_posterior = -1,
  min_terms = 0,
  labels,
  ...
)
Arguments

object         an object of class LDA_VEM or LDA_Gibbs as returned by LDA from the topicmodels package

newdata        a document/term matrix containing data for which to make a prediction

type           either 'topics' or 'terms' for the topic predictions or the term posteriors

min_posterior  numeric in the 0-1 range to output only terms emitted by each topic which have a posterior probability equal to or higher than min_posterior. Only used if type = 'terms'. Defaults to -1, which keeps all terms.

min_terms      integer indicating the minimum number of terms to keep in the output when type = 'terms'. Defaults to 0.

labels         a character vector of the same length as the number of topics in the topic model, indicating how to label the topics. Only valid for type = 'topics'. Defaults to topic_prob_001 up to topic_prob_999.

...            further arguments passed on to topicmodels::posterior
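The interaction between min_posterior and min_terms can be seen in the following minimal sketch. It assumes a fitted model called mymodel, built as in the Examples section; the cut-offs 0.02 and 5 are purely illustrative.

## sketch: filtering the 'terms' output (assumes `mymodel` from the Examples section)
all_terms <- predict(mymodel, type = "terms", min_posterior = -1)
top_terms <- predict(mymodel, type = "terms", min_posterior = 0.02, min_terms = 5)
## number of terms kept per topic before and after filtering
sapply(all_terms, nrow)
sapply(top_terms, nrow)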
Value

In case of type = 'topics': a data.table with columns doc_id, topic (the topic number to which the document is assigned), topic_label (the topic label), topic_prob (the posterior probability score for that topic), topic_probdiff_2nd (the probability score for that topic minus the probability score for the 2nd highest topic) and the probability scores for each topic, in columns named after the topic labels.

In case of type = 'terms': a list of data.frames with columns term and prob, giving the posterior probability that each term is emitted by the topic.
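As a hedged sketch of how the 'topics' output might be used downstream (assuming mymodel, dtm and scores obtained as in the Examples section; the 0.1 threshold is only illustrative), topic_probdiff_2nd can be used to flag documents whose winning topic barely beats the runner-up:

## sketch: flag documents with an ambiguous topic assignment
## (assumes `mymodel` and `dtm` as built in the Examples section)
scores    <- predict(mymodel, newdata = dtm, type = "topics")
ambiguous <- subset(scores, topic_probdiff_2nd < 0.1)
table(ambiguous$topic_label)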
See Also

posterior-methods
Examples

## Build document/term matrix on Dutch adjectives
data(brussels_reviews_anno)
data(brussels_reviews)
x <- subset(brussels_reviews_anno, language == "nl")
x <- subset(x, xpos %in% c("JJ"))
x <- x[, c("doc_id", "lemma")]
x <- document_term_frequencies(x)
dtm <- document_term_matrix(x)
dtm <- dtm_remove_lowfreq(dtm, minfreq = 10)
dtm <- dtm_remove_tfidf(dtm, top = 100)

## Fit a topicmodel using VEM
library(topicmodels)
mymodel <- LDA(x = dtm, k = 4, method = "VEM")

## Get topic terminology
terminology <- predict(mymodel, type = "terms", min_posterior = 0.05, min_terms = 3)
terminology

## Get scores alongside the topic model
dtm <- document_term_matrix(x, vocabulary = mymodel@terms)
scores <- predict(mymodel, newdata = dtm, type = "topics")
scores <- predict(mymodel, newdata = dtm, type = "topics",
                  labels = c("mylabel1", "xyz", "app-location", "newlabel"))
head(scores)
table(scores$topic)
table(scores$topic_label)
table(scores$topic, exclude = c())
table(scores$topic_label, exclude = c())

## Fit a topicmodel using Gibbs
library(topicmodels)
mymodel <- LDA(x = dtm, k = 4, method = "Gibbs")
terminology <- predict(mymodel, type = "terms", min_posterior = 0.05, min_terms = 3)
scores <- predict(mymodel, type = "topics", newdata = dtm)
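The description above notes that a document without any nonzero terms gets NA as its topic prediction. A minimal sketch of that behaviour follows; it reuses mymodel and x from the examples above, and the zeroed-out first row is added purely for illustration.

## sketch: a document with no terms from the model vocabulary is predicted as NA
dtm_zero <- document_term_matrix(x, vocabulary = mymodel@terms)
dtm_zero[1, ] <- 0
scores <- predict(mymodel, newdata = dtm_zero, type = "topics")
head(scores)
table(scores$topic, exclude = c())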