predict.BTM | R Documentation
Classify new text with a fitted biterm topic model.
To infer the topics in a document, it is assumed that the topic proportions of the document are driven by the expectation of the topic proportions of the biterms generated from the document.
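The assumption above can be illustrated with a small base R sketch. It follows the formulation in the paper cited on this page, where P(z|d) is the sum over biterms of P(z|b) P(b|d). All numbers, words and object names below (theta, phi, tokens) are invented for illustration, and the code is not the package's internal implementation; it only shows the arithmetic under the simplifying assumption that every pair of tokens in the document forms a biterm.

```r
# Toy parameters, invented for illustration; not taken from a fitted model.
theta <- c(0.6, 0.4)                      # P(z) for K = 2 topics
phi <- matrix(c(0.5, 0.3, 0.2,            # P(w|z) for topic 1
                0.1, 0.2, 0.7),           # P(w|z) for topic 2
              ncol = 2,
              dimnames = list(c("beer", "bar", "room"),
                              c("topic1", "topic2")))

# One short document; every unordered pair of tokens is treated as a biterm.
tokens  <- c("beer", "bar", "room")
biterms <- t(combn(tokens, 2))            # one row per biterm

# P(z|b) is proportional to P(z) * P(w1|z) * P(w2|z), normalised over topics.
p_z_given_b <- t(apply(biterms, 1, function(b) {
  p <- theta * phi[b[1], ] * phi[b[2], ]
  p / sum(p)
}))

# Each biterm occurs once here, so P(b|d) is uniform and the sum is a mean.
p_z_given_d <- colMeans(p_z_given_b)
print(round(p_z_given_d, 3))
```

The result is a vector of length K whose entries sum to 1, which corresponds to one row of the matrix described for the return value.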
## S3 method for class 'BTM'
predict(object, newdata, type = c("sum_b", "sub_w", "mix"), ...)
object | an object of class BTM as returned by BTM
newdata | a tokenised data frame containing one row per token, with 2 columns: a context identifier (such as doc_id) and the token itself
type | character string with the type of prediction, one of 'sum_b', 'sub_w' or 'mix'. The default is 'sum_b', as indicated in the paper, which sums over the expectation of the topic proportions of the biterms generated from the document. For the other approaches, please consult the paper.
... | not used
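As an illustration of the shape expected for newdata, the following base R sketch builds a one-row-per-token data frame from two short texts. The texts, the whitespace tokenisation and the column name token are assumptions made for this example; in practice a tokeniser or annotator such as udpipe would usually produce the tokens.

```r
# Two toy documents, named by their context identifier (invented data).
texts <- c(d1 = "great beer nice bar", d2 = "quiet room")

# Naive whitespace tokenisation; a real pipeline would use a proper tokeniser.
tokens <- strsplit(texts, " ", fixed = TRUE)

# One row per token: first column the context identifier, second the token.
newdata <- data.frame(
  doc_id = rep(names(tokens), lengths(tokens)),
  token  = unlist(tokens, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(newdata)
```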
a matrix containing P(z|d), the probability of each topic given the biterms of the document.
The matrix has one row for each unique doc_id (context identifier) that contains words which are part of the dictionary of the BTM model, and K columns, one for each topic.
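A matrix of this shape can be post-processed with base R. The sketch below uses a hand-made matrix standing in for the result of predict(); the values and document names are invented, and it assumes the row names carry the doc_id. It picks the most probable topic per document and lists the documents that have no row, which by the description above would be those without any word from the model dictionary.

```r
# Stand-in for the output of predict(): 2 documents, K = 3 topics (toy values).
scores <- matrix(c(0.7, 0.2, 0.1,
                   0.1, 0.3, 0.6),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"), NULL))

# Index of the most probable topic for each document.
top_topic <- apply(scores, 1, which.max)
print(top_topic)

# Documents that were passed in but received no row in the result.
all_docs     <- c("doc1", "doc2", "doc3")
missing_docs <- setdiff(all_docs, rownames(scores))
print(missing_docs)
```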
Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Xueqi Cheng. A Biterm Topic Model For Short Text. WWW2013, https://github.com/xiaohuiyan/BTM, https://github.com/xiaohuiyan/xiaohuiyan.github.io/blob/master/paper/BTM-WWW13.pdf
BTM, terms.BTM, logLik.BTM
library(udpipe)
data("brussels_reviews_anno", package = "udpipe")
x <- subset(brussels_reviews_anno, language == "nl")
x <- subset(x, xpos %in% c("NN", "NNP", "NNS"))
x <- x[, c("doc_id", "lemma")]
model <- BTM(x, k = 5, iter = 5, trace = TRUE)
scores <- predict(model, newdata = x, type = "sum_b")
scores <- predict(model, newdata = x, type = "sub_w")
scores <- predict(model, newdata = x, type = "mix")
head(scores)