model_dfr_documents: Make a topic model of DfR documents

View source: R/model.R

Description

The basic usage of this package is wrapped up in this convenience function.

Usage

model_dfr_documents(
  citations_files,
  wordcounts_dirs,
  n_topics,
  stoplist_file = file.path(path.package("dfrtopics"), "stoplist", "stoplist.txt"),
  ...
)

Arguments

citations_files

character vector with names of DfR citations.CSV or citations.tsv metadata files

wordcounts_dirs

character vector with names of directories holding wordcounts*.CSV files

n_topics

number of topics to model

stoplist_file

name of stoplist file (containing one stopword per line)

...

passed on to train_model

Details

Given wordcount and metadata files, this function sets up MALLET inputs and then runs MALLET to produce a topic model. Normally you will want finer-grained control over the MALLET inputs and modeling parameters than this wrapper offers; the steps for that process are described in the package vignette. Once the model has been trained, the results can be saved to disk with write_mallet_model.

If Java gives out-of-memory errors, try increasing the Java heap size to a larger value, such as 4 GB, by setting options(java.parameters="-Xmx4g") before loading this package (or rJava).
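The ordering matters because rJava reads this option only when it starts the JVM. A minimal sketch, assuming a fresh R session in which neither rJava nor dfrtopics has been loaded yet:

```r
# Set the JVM heap size BEFORE rJava or dfrtopics is loaded:
# the option is consulted only when the JVM is initialized.
options(java.parameters = "-Xmx4g")

# Confirm the option is in place before attaching the package.
heap_setting <- getOption("java.parameters")
print(heap_setting)

# library(dfrtopics)  # attach the package only after the option is set
```

If the JVM is already running, changing the option has no effect until R is restarted.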

Value

a mallet_model object holding the results

See Also

This function simply calls, in sequence, read_dfr_metadata, read_wordcounts, wordcounts_texts, make_instances, and train_model. To write results to disk, use write_mallet_model.
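The sequence of calls can be sketched roughly as follows. This is an illustration, not the package's actual source: the function name model_steps_sketch is hypothetical, and the way arguments are threaded between steps (in particular, passing the stoplist to make_instances and the metadata to train_model) is an assumption to check against each function's own help page.

```r
# Hypothetical re-creation of the wrapper's steps; argument threading
# between the steps is an assumption, not copied from the package source.
model_steps_sketch <- function(citations_files, wordcounts_dirs, n_topics,
                               stoplist_file, ...) {
  # 1. Read the DfR citation metadata.
  meta <- dfrtopics::read_dfr_metadata(citations_files)

  # 2. Read every wordcounts file found in the given directories.
  counts <- dfrtopics::read_wordcounts(
    list.files(wordcounts_dirs, full.names = TRUE)
  )

  # 3. Convert the word counts into one text per document.
  docs <- dfrtopics::wordcounts_texts(counts)

  # 4. Build MALLET instances (assumed: stopwords are removed here).
  insts <- dfrtopics::make_instances(docs, stoplist_file)

  # 5. Train the model, forwarding any extra arguments to train_model.
  dfrtopics::train_model(insts, n_topics, metadata = meta, ...)
}
```

Writing the steps out this way is mainly useful as a starting point when one of them needs to be customized, for example to filter documents or vocabulary between steps 2 and 3.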

Examples

# Make a 50-topic model of documents in the wordcounts folder
## Not run: 
model_dfr_documents("citations.CSV", "wordcounts", 50)
## End(Not run)
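A slightly fuller sketch shows extra arguments being forwarded through ... to train_model and the result being saved afterwards. The file and directory names are placeholders, and n_iters and seed are assumed to be train_model parameters (check its help page). The call is guarded so that nothing runs unless the package and the input files are actually present.

```r
# Placeholder inputs: adjust these to your own DfR download.
citations_file <- "citations.CSV"
wordcounts_dir <- "wordcounts"

# Only attempt the model if the package and the inputs are available.
have_inputs <- requireNamespace("dfrtopics", quietly = TRUE) &&
  file.exists(citations_file) &&
  dir.exists(wordcounts_dir)

if (have_inputs) {
  # n_iters and seed are passed on to train_model via `...` (assumed names).
  m <- dfrtopics::model_dfr_documents(
    citations_file, wordcounts_dir, 50,
    n_iters = 500, seed = 42
  )

  # Save the trained model's outputs to a directory (name is a placeholder).
  dfrtopics::write_mallet_model(m, "modeling_results")
}
```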


agoldst/dfrtopics documentation built on July 15, 2022, 4:13 p.m.