| make_instances | R Documentation | 
Given a data frame of document IDs and texts (one per doc), such as
that returned by wordcounts_texts, create a MALLET
InstanceList object. This function is a simple wrapper for
mallet.import. N.B. MALLET does tokenization,
stopword removal, and casefolding on these texts, but if you have
used wordcounts_texts, you may have already done
those tasks yourself. To ensure MALLET does no further stoplisting,
pass stoplist_file=NULL (the default). To ensure MALLET does
no extra tokenization, pass token.regex="\\S+" (whitespace
tokenization, which is not the default). To prevent MALLET from
casefolding, pass preserve.case=TRUE. Or, equivalently, use the
function wordcounts_instances instead.
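As a rough sketch of the settings described above, the following builds a small data frame and passes it through with stoplisting, retokenization, and casefolding all switched off. The package name dfrtopics, the column names id and text, and the sample texts are assumptions made for illustration. The base-R regmatches call only approximates what the "\\S+" pattern would match and is not MALLET's own tokenizer.

```r
# Toy corpus. The column names `id` and `text` are an assumption.
docs <- data.frame(
    id = c("doc1", "doc2"),
    text = c("Topic models count Words", "words in Documents"),
    stringsAsFactors = FALSE
)

# Base-R preview of whitespace tokenization with case preserved.
# This only approximates the tokens that "\\S+" would yield in MALLET.
tokens <- regmatches(docs$text, gregexpr("\\S+", docs$text, perl = TRUE))

# Build the InstanceList only if the (assumed) package is installed.
if (requireNamespace("dfrtopics", quietly = TRUE)) {
    insts <- dfrtopics::make_instances(
        docs,
        stoplist_file = NULL,   # no stoplisting
        token.regex = "\\S+",   # split on whitespace only
        preserve.case = TRUE    # no casefolding
    )
}
```

Note the doubled backslash: in an R string literal, a single "\S" is an unrecognized escape and fails to parse.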
make_instances(docs, stoplist_file = NULL, ...)
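docs | data frame of document IDs and texts, one row per document
stoplist_file | name of a text file with one stopword per line, passed on to MALLET, if it exists. If it does not, or if this is NULL (the default), no stoplist is applied.
... | further arguments passed on to mallet.import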
The InstanceList object is the form in which MALLET
understands a corpus. These are the objects passed on to the
model-training routines. If saved to disk the same corpus may be used
with command-line MALLET.  
If Java gives out-of-memory errors, try increasing the Java heap size to a
large value, like 4 GB, by setting options(java.parameters="-Xmx4g")
before loading this package (or rJava).
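A minimal sketch of that ordering follows. The option is read when the JVM starts, so it has to come first in the session. The package name dfrtopics in the commented library call is an assumption.

```r
# Request a 4 GB Java heap. This must run before rJava starts the JVM,
# i.e. before rJava or any package that depends on it is loaded.
options(java.parameters = "-Xmx4g")

# Confirm the option is set for this session.
heap_setting <- getOption("java.parameters")

# Only now load the package (name assumed for illustration):
# library(dfrtopics)
```

If the JVM has already been started in the current session, changing the option has no effect until R is restarted.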
an rJava reference to a MALLET InstanceList
train_model, write_instances