keyATM: keyATM main function
In keyATM: Keyword Assisted Topic Models

keyATM

R Documentation

keyATM main function

Description

Fit keyATM models.

Usage

keyATM(
  docs,
  model,
  no_keyword_topics,
  keywords = list(),
  model_settings = list(),
  priors = list(),
  options = list(),
  keep = c()
)

Arguments

`docs`	texts read via `keyATM_read()`.
`model`	keyATM model: one of `base`, `covariates`, or `dynamic`.
`no_keyword_topics`	the number of regular (no-keyword) topics.
`keywords`	a list of keywords.
`model_settings`	a list of model-specific settings (details are in the online documentation).
`priors`	a list of priors of parameters.
`options`	a list of options:
seed: A numeric value for the random seed. If it is not provided, the package randomly selects a seed.
iterations: An integer. The number of iterations. Default is `1500`.
verbose: If `TRUE`, prints the log-likelihood and perplexity. Default is `FALSE`.
llk_per: An integer. If the value is `j`, keyATM stores the log-likelihood and perplexity every `j` iterations. Default is `10`.
use_weights: If `TRUE`, uses weights. Default is `TRUE`.
weights_type: One of four types of weights: weights based on information theory (`information-theory`), weights based on inverse frequency (`inv-freq`), and normalized versions of each (`information-theory-normalized` and `inv-freq-normalized`). Default is `information-theory`.
prune: If `TRUE`, removes keywords that do not appear in the corpus. Default is `TRUE`.
store_theta: If `TRUE` or `1`, stores `\theta` (document-topic distribution) at the iterations specified by `thinning`. Default is `FALSE` (same as `0`).
store_pi: If `TRUE` or `1`, stores `\pi` (the probability of using the keyword topic-word distribution) at the iterations specified by `thinning`. Default is `FALSE` (same as `0`).
thinning: An integer. If the value is `j`, keyATM stores the following parameters every `j` iterations. Default is `5`.
  theta: For all models. If `store_theta` is `TRUE`, the document-level topic assignment is stored (sufficient statistics to calculate the document-topic distributions `theta`).
  alpha: For the base and dynamic models. In the base model, alpha is shared across all documents, whereas each state has a different alpha in the dynamic model.
  lambda: For the covariate model. The coefficients.
  R: For the dynamic model. The state each document belongs to.
  P: For the dynamic model. The state transition probability.
parallel_init: Parallelizes processes to speed up initialization. Default is `FALSE`. Call `plan()` before using this feature.
resume: Used to save and load the intermediate results of the keyATM fitting process so that fitting can be resumed from a previous state. Default is `NULL` (do not resume).
`keep`	a vector of the names of elements you want to keep in output.
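
The options described above are passed as a named list. The following is a minimal sketch, not a recommended configuration: the option names come from the argument list above, while the specific values (for example, 100 iterations and a seed of 250) are arbitrary choices for illustration. The model fit is only attempted if the keyATM package is installed.

```r
# Option names are taken from the `options` argument documented above;
# the values are arbitrary and chosen only for illustration.
my_options <- list(
  seed        = 250,    # fix the random seed for reproducibility
  iterations  = 100,    # fewer than the default 1500, to keep the run short
  verbose     = FALSE,
  llk_per     = 10,     # store log-likelihood and perplexity every 10 iterations
  thinning    = 5,      # store parameters every 5 iterations
  store_theta = TRUE    # keep the document-topic distribution at thinned iterations
)

# Fit the base model only when keyATM is available.
if (requireNamespace("keyATM", quietly = TRUE)) {
  library(keyATM)
  data(keyATM_data_bills)
  keyATM_docs <- keyATM_read(keyATM_data_bills$doc_dfm)
  out <- keyATM(docs = keyATM_docs, model = "base",
                no_keyword_topics = 5,
                keywords = keyATM_data_bills$keywords,
                options = my_options)
}
```

Options left out of the list fall back to the defaults listed above.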

Value

A keyATM_output object containing:

keyword_k: number of keyword topics
no_keyword_topics: number of no-keyword topics
V: number of terms (number of unique words)
N: number of documents
model: the name of the model
theta: topic proportions for each document (document-topic distribution)
phi: topic specific word generation probabilities (topic-word distribution)
topic_counts: number of tokens assigned to each topic
word_counts: number of times each word type appears
doc_lens: length of each document in tokens
vocab: words in the vocabulary (a vector of unique words)
priors: priors
options: options
keywords_raw: specified keywords
model_fit: perplexity and log-likelihood
pi: estimated \pi (the probability of using keyword topic word distribution) for the last iteration
values_iter: values stored during iterations
kept_values: outputs you specified to store in the `keep` argument
information: information about the fitting
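
As a rough sketch of how the elements listed above might be read, the helper below collects a few of them into a short summary. It assumes the returned object can be indexed by name with `$`, which this page does not state explicitly. `summarize_keyATM_output` is a hypothetical helper and is not part of keyATM, and `mock_out` is a stand-in list with made-up values so the snippet runs without fitting a model.

```r
# Hypothetical helper (not part of keyATM): summarizes a few elements
# of the output, using only element names listed in the Value section.
summarize_keyATM_output <- function(out) {
  list(
    model    = out$model,
    # total topics = keyword topics + no-keyword topics
    n_topics = out$keyword_k + out$no_keyword_topics,
    n_docs   = out$N,
    n_terms  = out$V
  )
}

# Stand-in object with made-up values, for illustration only.
mock_out <- list(model = "base", keyword_k = 4, no_keyword_topics = 5,
                 N = 3, V = 10)

fit_summary <- summarize_keyATM_output(mock_out)
str(fit_summary)
```

With a fitted model, the same helper would be called on the object returned by `keyATM()` instead of `mock_out`.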

Examples

## Not run: 
  library(keyATM)
  library(quanteda)
  data(keyATM_data_bills)
  bills_keywords <- keyATM_data_bills$keywords
  bills_dfm <- keyATM_data_bills$doc_dfm  # quanteda dfm object
  keyATM_docs <- keyATM_read(bills_dfm)

  # keyATM Base
  out <- keyATM(docs = keyATM_docs, model = "base",
                no_keyword_topics = 5, keywords = bills_keywords)

  # Visit our website for full examples: https://keyatm.github.io/keyATM/

## End(Not run)

keyATM documentation built on April 3, 2025, 10:30 p.m.