terms.BTM: Get highest token probabilities for each topic or get biterms...
In BTM: Biterm Topic Models for Short Text


terms.BTM

R Documentation

Get highest token probabilities for each topic or get biterms used in the model

Description

Get highest token probabilities for each topic or get biterms used in the model

Usage

## S3 method for class 'BTM'
terms(x, type = c("tokens", "biterms"), threshold = 0, top_n = 5, ...)

Arguments

`x`	an object of class BTM as returned by `BTM`
`type`	a character string, either 'tokens' or 'biterms'. Defaults to 'tokens'.
`threshold`	threshold in 0-1 range. Only the terms which are more likely than the threshold are returned for each topic. Only used in case type = 'tokens'.
`top_n`	integer, the maximum number of tokens to return for each topic (the most probable ones). Only used in case type = 'tokens'.
`...`	not used
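
How `threshold` and `top_n` narrow the result can be sketched on a hand-made probability vector. This only illustrates the documented behaviour and is not the package's code: `phi_k` and `top_tokens` are hypothetical names, the probabilities are invented, and the strict `>` comparison is an assumption based on the wording "more likely than the threshold".

```r
# Hypothetical P(w|z) values for one topic (invented, not output of a real model)
phi_k <- c(hotel = 0.40, metro = 0.25, kamer = 0.20, ontbijt = 0.10, lift = 0.05)

# Hypothetical helper mimicking the documented filtering for type = "tokens"
top_tokens <- function(p, threshold = 0, top_n = 5) {
  out <- data.frame(token = names(p), probability = unname(p),
                    stringsAsFactors = FALSE)
  # keep tokens strictly above the threshold (assumed comparison)
  out <- out[out$probability > threshold, , drop = FALSE]
  # order from high to low probability
  out <- out[order(out$probability, decreasing = TRUE), , drop = FALSE]
  rownames(out) <- NULL
  # top_n = Inf keeps every remaining row
  head(out, min(top_n, nrow(out)))
}

top_tokens(phi_k, top_n = 2)
top_tokens(phi_k, threshold = 0.15, top_n = Inf)
```

Setting `top_n = Inf` together with a positive `threshold` leaves the threshold as the only filter, which is the pattern used in the third call of the Examples section.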

Value

Depending on whether type is set to 'tokens' or 'biterms', the following is returned:

If type='tokens': the probability of the token given the topic, P(w|z). This is a list of data.frames (one for each topic) where each data.frame contains the columns token and probability, ordered from high to low probability. The list has the same length as the number of topics.
If type='biterms': a list containing 2 elements:
- n which indicates the number of biterms used to train the model
- biterms which is a data.frame with columns term1, term2 and topic, indicating for each biterm found in the data the topic to which it is assigned
Note that a biterm is unordered. In the output of type='biterms', term1 is always smaller than or equal to term2.
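
Both return shapes are plain R lists, so they can be post-processed with base R. The sketch below uses small invented stand-ins rather than a fitted model: `topic_terms` and `bi` are hypothetical objects that only mirror the column names described above, and representing term1 and term2 as character strings is an assumption.

```r
# Hypothetical stand-in for terms(model) with 2 topics
topic_terms <- list(
  data.frame(token = c("hotel", "metro"), probability = c(0.40, 0.25)),
  data.frame(token = c("kamer", "ontbijt"), probability = c(0.30, 0.20))
)

# Stack the per-topic data.frames into one long data.frame with a topic index
stacked <- do.call(rbind, lapply(seq_along(topic_terms), function(k) {
  data.frame(topic = k, topic_terms[[k]])
}))
stacked

# Hypothetical stand-in for terms(model, type = "biterms")
bi <- list(
  n = 3L,
  biterms = data.frame(term1 = c("hotel", "kamer", "hotel"),
                       term2 = c("metro", "ontbijt", "kamer"),
                       topic = c(1L, 2L, 1L))
)

# Number of biterms assigned to each topic
topic_counts <- table(bi$biterms$topic)
topic_counts
```

With a real model, `topic_terms` would be replaced by `terms(model)` and `bi` by `terms(model, type = "biterms")`.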

Examples


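library(udpipe)
library(BTM)
## Dutch reviews, nouns only, reduced to a doc_id / lemma data.frame
data("brussels_reviews_anno", package = "udpipe")
x <- subset(brussels_reviews_anno, language == "nl")
x <- subset(x, xpos %in% c("NN", "NNP", "NNS"))
x <- x[, c("doc_id", "lemma")]
## Fit a small model with k = 5 topics and iter = 5
model <- BTM(x, k = 5, iter = 5, trace = TRUE)
## Token probabilities per topic
terms(model)
terms(model, top_n = 10)
terms(model, threshold = 0.01, top_n = +Inf)
## Biterms used to train the model
bi <- terms(model, type = "biterms")
str(bi)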

BTM documentation built on Feb. 16, 2023, 10:14 p.m.