make_instances: Create MALLET instances from a document frame
In agoldst/dfrtopics: Tools for exploring topic models of text

make_instances

R Documentation

Create MALLET instances from a document frame

Description

Given a data frame of document IDs and texts (one row per document), such as that returned by wordcounts_texts, create a MALLET InstanceList object. This function is a simple wrapper for mallet.import.

N.B. MALLET performs tokenization, stopword removal, and casefolding on these texts, but if you have used wordcounts_texts, you may already have done those tasks yourself. To ensure MALLET does no further stoplisting, pass stoplist_file=NULL (the default). To ensure MALLET does no extra tokenization, pass token.regex="\\S+" (whitespace tokenization, which is not the default; the backslash is doubled because this is an R string). To prevent MALLET from casefolding, pass preserve.case=TRUE. Or, equivalently, use the function wordcounts_instances instead.

Usage

make_instances(docs, stoplist_file = NULL, ...)

Arguments

`docs`	data frame with `id` and `text` columns
`stoplist_file`	name of a text file with one stopword per line. If the file exists, it is passed on to MALLET. If it does not exist, or if this is `NULL` (the default), no words are removed.
`...`	passed on to `mallet.import`. A possibly important parameter to adjust is `token.regex`.
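
As an illustration of the arguments above, here is a minimal sketch of a call on texts that have already been tokenized, stoplisted, and casefolded. The two-document frame is made up for the example, and the call is guarded so it only runs where dfrtopics (with its Java dependencies) is installed.

```r
# Hypothetical two-document frame with the `id` and `text` columns
# that make_instances expects.
docs <- data.frame(
    id = c("doc1", "doc2"),
    text = c("the whale surfaced near the ship",
             "a ship sailed past the white whale"),
    stringsAsFactors = FALSE
)

# Only attempt the MALLET call if dfrtopics is installed.
if (requireNamespace("dfrtopics", quietly = TRUE)) {
    # stoplist_file = NULL: no stopword removal.
    # token.regex and preserve.case are passed through `...` to
    # mallet.import: split on whitespace only and leave case alone.
    ilist <- dfrtopics::make_instances(
        docs,
        stoplist_file = NULL,
        token.regex = "\\S+",
        preserve.case = TRUE
    )
}
```

Leaving out `token.regex` and `preserve.case` hands tokenization and casefolding back to MALLET's defaults.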

Details

The InstanceList object is the form in which MALLET understands a corpus. These are the objects passed on to the model-training routines. If saved to disk, the same corpus may be used with command-line MALLET.
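
A rough sketch of saving an InstanceList for later use follows. It assumes the installed version of dfrtopics exports a `write_instances(instances, filename)` helper; check the package index if it does not. The output path and the sample document are hypothetical.

```r
# Hypothetical output location for the serialized InstanceList.
out_file <- file.path(tempdir(), "instances.mallet")

# Made-up single-document frame, just to have something to save.
docs <- data.frame(
    id = "doc1",
    text = "call me ishmael",
    stringsAsFactors = FALSE
)

if (requireNamespace("dfrtopics", quietly = TRUE)) {
    ilist <- dfrtopics::make_instances(docs)
    # Assumption: write_instances is available in this dfrtopics version.
    dfrtopics::write_instances(ilist, out_file)
}
```

The resulting file can then be given to command-line MALLET as its input corpus.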

If Java gives out-of-memory errors, try increasing the Java heap size to a large value, like 4GB, by setting options(java.parameters="-Xmx4g") before loading this package (or rJava).
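
The ordering matters, so a short sketch may help: the option is set first, in a fresh R session, and the package is loaded afterwards. The 4GB figure is the one suggested above; adjust it to the memory available on your machine.

```r
# Set the JVM heap size before rJava starts the JVM. Once the JVM is
# running, changing this option has no effect in the current session.
options(java.parameters = "-Xmx4g")

# Load the package only after the option is in place.
if (requireNamespace("dfrtopics", quietly = TRUE)) {
    library(dfrtopics)
}
```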

Value

an rJava reference to a MALLET InstanceList
