train_model: Train a topic model

View source: R/model.R

train_model    R Documentation

Train a topic model

Description

Invokes MALLET's parallel topic modeling algorithm on a set of documents represented as an InstanceList.

Usage

train_model(
  instances,
  n_topics,
  alpha_sum = 5,
  beta = 0.01,
  n_iters = 200,
  n_max_iters = 10,
  optimize_hyperparameters = TRUE,
  n_hyper_iters = 20,
  n_burn_in = 50,
  symmetric_alpha = FALSE,
  threads = 4L,
  seed = NULL,
  metadata = NULL
)
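A minimal sketch of a call with some non-default settings. It assumes a hypothetical file `corpus_instances.mallet` that holds a serialized InstanceList (see the instances argument). The file name and the chosen values are illustrative only. The call is guarded, so the snippet does nothing when dfrtopics or the file is missing.

```r
# Hypothetical file name: an InstanceList serialized to disk earlier.
instances_file <- "corpus_instances.mallet"

# Train only when the package and the serialized instances are available.
if (requireNamespace("dfrtopics", quietly = TRUE) && file.exists(instances_file)) {
  m <- dfrtopics::train_model(
    instances_file,
    n_topics = 40,
    n_iters  = 500,  # more Gibbs sampling iterations than the default 200
    seed     = 42L   # fixed MALLET seed (see the seed argument)
  )
}
```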

Arguments

instances

either an rJava reference to an InstanceList object or the name of a file into which such an object has been serialized

n_topics

number of topics to train

alpha_sum

initial sum of the hyperparameters α_k: the Dirichlet prior on each document's distribution over topics

beta

initial value of the hyperparameter β: the Dirichlet prior on each topic's distribution over words

n_iters

number of Gibbs sampling iterations to run

n_max_iters

number of iterations of "iterated conditional modes" (ICM) to run after Gibbs sampling

optimize_hyperparameters

if TRUE (the default), optimize α_k and β. If FALSE, the value of symmetric_alpha is ignored.

n_hyper_iters

interval, in Gibbs sampling iterations, between hyperparameter optimizations

n_burn_in

number of initial "burn-in" iterations before hyperparameter optimization

symmetric_alpha

if FALSE (the default), allow the α_k to differ from one another. If TRUE and optimize_hyperparameters is TRUE, the algorithm still varies the sum of the alphas, but all the α_k stay equal.

threads

number of threads to run in parallel.

seed

MALLET's random number seed: set this to ensure a reproducible run of the Gibbs sampling algorithm.

metadata

not used in the modeling process; if supplied, the returned model object stores a reference to it
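How n_iters, n_burn_in and n_hyper_iters combine, and where each α_k starts, can be sketched in base R. Both helpers, hyper_schedule and initial_alpha, are hypothetical and not part of dfrtopics. They assume that MALLET optimizes the hyperparameters at iteration i (counting from 1) when i exceeds the burn-in and is a multiple of the optimization interval, and that MALLET initializes every α_k to alpha_sum / n_topics.

```r
# Hypothetical helper: iterations at which hyperparameters would be optimized,
# ASSUMING MALLET's rule "i > n_burn_in and i is a multiple of n_hyper_iters".
hyper_schedule <- function(n_iters = 200, n_hyper_iters = 20, n_burn_in = 50) {
  i <- seq_len(n_iters)
  i[i > n_burn_in & i %% n_hyper_iters == 0]
}

# Hypothetical helper: starting value of each alpha_k, ASSUMING alpha_sum is
# split evenly across topics before any optimization takes place.
initial_alpha <- function(alpha_sum = 5, n_topics) {
  rep(alpha_sum / n_topics, n_topics)
}

hyper_schedule()
# [1]  60  80 100 120 140 160 180 200
initial_alpha(5, n_topics = 50)[1]
# [1] 0.1
```

Under these assumptions the defaults give eight optimization rounds, and a run with n_iters no larger than n_burn_in never optimizes the hyperparameters.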

Details

Create the instance list object with make_instances. MALLET's progress reporting appears on the console by default; to change this, set the package option dfrtopics.mallet_logging (see help("mallet-logging")).

If Java gives out-of-memory errors, try increasing the Java heap size to a large value, like 4GB, by setting options(java.parameters="-Xmx4g") before loading this package (or rJava).
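A short sketch of the heap setting described above. The option has to be set before rJava starts the JVM, because the heap size cannot be changed later in the same R session. The package is loaded only if it is installed, so the snippet is safe to source anywhere.

```r
# Request a 4 GB Java heap; this must come before the JVM is started.
options(java.parameters = "-Xmx4g")

# Load dfrtopics (and with it rJava) only after setting the option.
if (requireNamespace("dfrtopics", quietly = TRUE)) {
  library(dfrtopics)
}
```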

Value

a mallet_model object

See Also

make_instances, model_dfr_documents, write_mallet_model


agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.