In Shusei-E/keyATM: Keyword Assisted Topic Models

knitr::opts_chunk$set(eval = T, echo = TRUE)

Preparing documents and time index

Please read Preparation for the reading of documents and creating a list of keywords. We use bills data we prapared (documents and keywords).

HMM model needs a vector of time indexes. This is an "index" and not the actual time stamps (year and date). For example, bills data contain session information, which ranges from 101 to 114.

library(keyATM)
data(keyATM_data_bills)
bills_time_index <- keyATM_data_bills$time_index
table(bills_time_index)

Time index should start from 1, increment by $1$, and have the same value if they are in the same time (for example the session of congress).

bills_time_index <- as.integer(bills_time_index - 100)
table(bills_time_index)

Please make sure that the order of time index is the same as the order of documents.

library(quanteda)
bills_dfm <- keyATM_data_bills$doc_dfm  # quanteda object
keyATM_docs <- keyATM_read(bills_dfm)

bills_keywords <- list(
                       Education = c("education", "child", "student"),
                       Law       = c("court", "law", "attorney"),
                       Health    = c("public", "health", "program"),
                       Drug      = c("drug", "treatment")
                      )

Fitting the model

Researchers need to specify the number of states in addition to keywords.

out <- keyATM(    
              docs              = keyATM_docs,    # text input
              no_keyword_topics = 3,              # number of topics without keywords
              keywords          = bills_keywords, # keywords
              model             = "dynamic",          # select the model
              model_settings    = list(time_index = bills_time_index,
                                       num_states = 5), # number of (latent) states in the model 
              options           = list(seed = 50)
             )