many_models: Performs several STM estimations for different number of...
In mikajoh/stmprinter: Print Multiple STM Models To A File For Inspection

Description Usage Arguments Value Examples

In practice, runs stm::selectModel() for each provided K topics in parallel. The function generally works the same way, except the user specifies a range of numbers of topics that they want the model fitted for and optionally how many cores to use. For example, models with 3 to 12 topics. Then, for each number of topics, stm::selectModel(), which performs several STM runs per topic, is run. It retuns a list where each item is the stm::selectModel() output for the K number of topics.

many_models(K, documents, vocab, prevalence = NULL, content = NULL,
  data = NULL, max.em.its = 100, init.type = "LDA", emtol = 1e-05,
  seed = NULL, runs = 50, frexw = 0.7, net.max.em.its = 2,
  M = 10, N = NULL, to.disk = FALSE, verbose = TRUE,
  cores = parallel::detectCores(), ...)

`K`	A vector of positive integers representing the desired number of topics for separate runs of `stm::selectModel`.
`documents`	The documents to be modeled. Object must be a list of with each element corresponding to a document. Each document is represented as an integer matrix with two rows, and columns equal to the number of unique vocabulary words in the document. The first row contains the 1-indexed vocabulary entry and the second row contains the number of times that term appears. This is similar to the format in the lda package except that (following R convention) the vocabulary is indexed from one. Corpora can be imported using the reader function and manipulated using the `prepDocuments`.
`vocab`	Character vector specifying the words in the corpus in the order of the vocab indices in documents. Each term in the vocabulary index must appear at least once in the documents. See `prepDocuments` for dropping unused items in the vocabulary.
`prevalence`	A formula object with no response variable or a matrix containing topic prevalence covariates. Use `s()`, `ns()` or `bs()` to specify smooth terms. See details for more information.
`content`	A formula containing a single variable, a factor variable or something which can be coerced to a factor indicating the category of the content variable for each document.
`data`	Dataset which contains prevalence and content covariates.
`max.em.its`	The maximum number of EM iterations. If convergence has not been met at this point, a message will be printed.
`init.type`	The method of initialization. Must be either Latent Dirichlet Allocation (LDA), Dirichlet Multinomial Regression Topic Model (DMR), a random initialization or a previous STM object.
`emtol`	Convergence tolerance. EM stops when the relative change in the approximate bound drops below this level. Defaults to .001%.
`seed`	Seed for the random number generator. `stm` saves the seed it uses on every run so that any result can be exactly reproduced. Setting the seed here simply ensures that the sequence of models will be exactly the same when respecified. Individual seeds can be retrieved from the component model objects.
`runs`	Total number of STM runs used in the cast net stage. Approximately 15 percent of these runs will be used for running a STM until convergence.
`frexw`	Weight used to calculate exclusivity
`net.max.em.its`	Maximum EM iterations used when casting the net
`M`	Number of words used to calculate semantic coherence and exclusivity. Defaults to 10.
`N`	Total number of models to retain in the end. Defaults to .2 of runs.
`to.disk`	Boolean. If TRUE, each model is saved to disk at the current directory in a separate RData file. This is most useful if one needs to run `multiSTM()` on a large number of output models.
`verbose`	A logical flag indicating whether a progress bar should be printed to screen.
`cores`	Number of CPU cores to use for parallel computation. Defaults to the number of cores available.
`...`	Additional options described in details of `stm::stm`.

It returns a list with length equal to K (i.e., one model for each provide K topics) of stm::selectModel() outputs. See ?stm::selectModel.

## Not run: 
processed <- textProcessor(
  documents = gadarian$open.ended.response,
  metadata = gadarian
)
out <- prepDocuments(
  documents = processed$documents,
  vocab = processed$vocab,
  meta = processed$meta
)

set.seed(02138)

stm_models <- many_models(
  K = 3:12,
  documents = out$documents,
  vocab= out$vocab,
  prevalence = ~ treatment + s(pid_rep), 
  data = out$meta,
  runs = 5
)

# To select a particular model to work one, just extract it by
# index. For instance, to extract the first run of the model with 3
# topics, you can run the following.

fit <- stm_models[[1]]$runout[[1]]

plot(fit)

## End(Not run)