LDA_TS: Run a full set of Latent Dirichlet Allocations and Time...

View source: R/LDA_TS.R

LDA_TS R Documentation

Run a full set of Latent Dirichlet Allocations and Time Series models

Description

Conduct a complete LDATS analysis (Christensen et al. 2018), including running a suite of Latent Dirichlet Allocation (LDA) models (Blei et al. 2003, Grun and Hornik 2011) via LDA_set, selecting LDA model(s) via select_LDA, running a complete set of Bayesian Time Series (TS) models (Western and Kleykamp 2004) via TS_on_LDA on the chosen LDA model(s), and selecting the best TS model via select_TS.

conform_LDA_TS_data converts the data input to match internal and sub-function specifications.

check_LDA_TS_inputs checks that the inputs to LDA_TS are of proper classes for a full analysis.

Usage

LDA_TS(
  data,
  topics = 2,
  nseeds = 1,
  formulas = ~1,
  nchangepoints = 0,
  timename = "time",
  weights = TRUE,
  control = list()
)

conform_LDA_TS_data(data, quiet = FALSE)

check_LDA_TS_inputs(
  data = NULL,
  topics = 2,
  nseeds = 1,
  formulas = ~1,
  nchangepoints = 0,
  timename = "time",
  weights = TRUE,
  control = list()
)

Arguments

data

Either a document term table or a list including at least a document term table (with the word "term" in the name of the element) and optionally also a document covariate table (with the word "covariate" in the name of the element).

The document term table is a table of observation count data (rows: documents, columns: terms) that may be a matrix or data.frame, but must be conformable to a matrix of integers, as verified by check_document_term_table.

The document covariate table is a table of associated data (rows: documents, columns: time index and covariate options) that may be a matrix or data.frame, but must be conformable to a data table, as verified by check_document_covariate_table. Every model needs a covariate that gives the time value for each document (in any units; the name of this column in the table is supplied via timename), which dictates where the change points are applied. If a covariate table is not provided, the model assumes the observations were equi-spaced in time. All covariates named within specific models in formulas must be included.
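As a minimal base R sketch of the list form described above: the element names document_term_table and document_covariate_table are assumptions chosen only because they contain "term" and "covariate", and the counts, term names, and temperature covariate are made up for illustration.

```r
# Hypothetical counts: 4 documents (rows) by 3 terms (columns).
document_term_table <- matrix(
  c(5L, 0L, 2L,
    3L, 1L, 4L,
    0L, 6L, 1L,
    2L, 2L, 2L),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("sp_a", "sp_b", "sp_c"))
)

# One row per document; the "time" column matches the default timename.
document_covariate_table <- data.frame(
  time = c(1L, 2L, 4L, 5L),
  temperature = c(12.1, 14.3, 9.8, 11.0)
)

# Element names contain "term" and "covariate", as the data argument requires.
data_list <- list(
  document_term_table = document_term_table,
  document_covariate_table = document_covariate_table
)

# Basic shape checks: integer counts and one covariate row per document.
stopifnot(is.integer(document_term_table))
stopifnot(nrow(document_term_table) == nrow(document_covariate_table))
```

A list built this way would be the kind of object passed as data; whether it passes the package's own checks is for check_LDA_TS_inputs to decide.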

topics

Vector of the number of topics to evaluate for each model. Must be conformable to integer values.

nseeds

integer number of seeds (replicate starts) to use for each value of topics in the LDAs. Must be conformable to an integer value.

formulas

Vector of formula(s) for the continuous (non-change point) component of the time series models. Any predictor variable included in a formula must also be a column in the document_covariate_table. Each element (formula) in the vector is evaluated for each number of change points and each LDA model.

nchangepoints

Vector of integers corresponding to the number of change points to include in the time series models. 0 is a valid input corresponding to no change points (i.e., a singular time series model), and the current implementation can reasonably include up to 6 change points. Each element in the vector is the number of change points used to segment the data for each formula (entry in formulas) component of the TS model, for each selected LDA model.
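To illustrate how formulas and nchangepoints combine, the base R sketch below enumerates the pairings for a hypothetical input. The formulas are written as strings purely for display, and the covariate name newmoon is an assumption; the package may organize its model table differently.

```r
# Hypothetical inputs; formulas are kept as strings only for display.
formulas <- c("~1", "~newmoon")
nchangepoints <- 0:2

# Every formula is paired with every number of change points.
model_grid <- expand.grid(
  formula = formulas,
  nchangepoints = nchangepoints,
  stringsAsFactors = FALSE
)

print(model_grid)
nrow(model_grid)  # 6 combinations for each selected LDA model
```

Because each combination is fit for each selected LDA model, long vectors in either argument multiply the number of TS models to run.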

timename

character element indicating the time variable used in the time series. Defaults to "time". The variable must be integer-conformable or a Date. If the variable named is a Date, the input is converted to an integer, resulting in the timestep being 1 day, which is often not desired behavior.
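The effect of converting a Date to an integer can be seen in base R alone. The sketch below uses made-up monthly sampling dates and assumes the package's conversion behaves like as.integer; the period index at the end is one possible alternative, not a recommendation from the package.

```r
# Hypothetical monthly sampling dates.
sample_dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))

# Base R stores a Date as the number of days since 1970-01-01.
as.integer(sample_dates)        # 18262 18293 18322
diff(as.integer(sample_dates))  # 31 29: steps measured in days, unevenly spaced

# One alternative: an integer index of sampling periods.
month_index <- seq_along(sample_dates)
month_index                     # 1 2 3
```

With the daily integers, a monthly survey spans about 30 time steps per sample, whereas the period index treats each sample as one step.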

weights

Optional input for overriding standard weighting for documents in the time series. Defaults to TRUE, translating to an appropriate weighting of the documents based on the size (number of words) of each document (the result of LDA is a matrix of proportions, which does not account for size differences among documents). Alternatively, can be NULL for an equal weighting among documents, or a numeric vector.
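As a rough illustration of size-based weighting, the base R sketch below scales each document's total count by the mean total. This is one plausible scheme under made-up counts, not necessarily the exact rule applied when weights = TRUE.

```r
# Hypothetical counts: 3 documents (rows) by 2 terms (columns).
document_term_table <- matrix(
  c(10L, 30L,
     5L,  5L,
    40L, 30L),
  nrow = 3, byrow = TRUE
)

# Document size is the total number of words in each row.
doc_sizes <- rowSums(document_term_table)   # 40 10 70

# Weight each document by its size relative to the mean size.
size_weights <- doc_sizes / mean(doc_sizes)
size_weights                                # 1.00 0.25 1.75

# Equal weighting, by contrast, gives every document the same value.
equal_weights <- rep(1, nrow(document_term_table))
```

A numeric vector such as size_weights has one entry per document, which is the form a user-supplied weights vector would take.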

control

A list of parameters to control the running and selecting of LDA and TS models. Values not input assume default values set by LDA_TS_control.

quiet

logical indicator for conform_LDA_TS_data to indicate if messages should be printed.

Value

LDA_TS: a class LDA_TS list object including all fitted LDA and TS models and selected models specifically as elements "LDA models" (from LDA_set), "Selected LDA model" (from select_LDA), "TS models" (from TS_on_LDA), and "Selected TS model" (from select_TS).

conform_LDA_TS_data: a data list that is ready for analyses using the stage-specific functions.

check_LDA_TS_inputs: throws an error if any input is improper; otherwise returns NULL.

References

Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3:993-1022.

Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. Long-term community change through multiple rapid transitions in a desert rodent community. Ecology 99:1523-1529.

Grun, B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software 40:13.

Western, B. and M. Kleykamp. 2004. A Bayesian change point model for historical time series analysis. Political Analysis 12:354-374.

Examples

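  library(LDATS)
  data(rodents)

  # Fit the LDA and TS model suites, then select the best of each
  mod <- LDA_TS(data = rodents, topics = 2, nseeds = 1, formulas = ~1,
                nchangepoints = 1, timename = "newmoon")

  # Convert the input to the list format used internally
  conform_LDA_TS_data(rodents)

  # Validate the inputs without fitting any models
  check_LDA_TS_inputs(rodents, timename = "newmoon")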


LDATS documentation built on Sept. 19, 2023, 5:08 p.m.