wordcounts_instances | R Documentation |
Given a data frame representing documents as feature counts, create a MALLET InstanceList object, which can then be passed on to train_model or saved to disk for later use with write_instances. This function is a small convenience wrapper for make_instances that ensures no further stopword removal, tokenization, or casefolding is done.
wordcounts_instances( counts, shuffle = FALSE, sep = " ", token_regex = "\\S+", preserve_case = TRUE )
counts | data frame with |
shuffle | randomize word order before passing on to MALLET? (See |
sep | separator to use between words |
token_regex | regular expression matching a token. Ordinarily, this should correspond to |
preserve_case | if FALSE, all words are lowercased by MALLET |
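
To illustrate what shuffle and sep plausibly control, here is a minimal base-R sketch of turning feature counts into a single text string. It is not the package's implementation: the vectors words and n are hypothetical stand-ins for one document's feature counts, and the expansion by repetition is an assumption made for illustration.

```r
# Hypothetical feature counts for one document (not taken from the package).
words <- c("whale", "sea", "ship")
n <- c(3, 1, 2)

# Repeat each word according to its count, then join with the separator.
sep <- " "
bag <- rep(words, times = n)
ordered_text <- paste(bag, collapse = sep)

# Shuffling changes the order of the words but not how often each occurs.
set.seed(1)  # only to make this sketch reproducible
shuffled_text <- paste(sample(bag), collapse = sep)
```

Because only word order differs between the two strings, a bag-of-words model sees the same counts either way.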
If your tokens themselves contain whitespace, change the sep parameter and adjust the token_regex accordingly.
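
As a rough check of why sep and token_regex need to agree, the base-R sketch below joins some hypothetical tokens and splits them again with a regular expression. The tab separator and the pattern "[^\t]+" are illustrative choices, not package defaults, and R's regex engine stands in for MALLET's Java one, so treat this as an approximation of the behavior rather than a test of it.

```r
# Hypothetical tokens, two of which contain a space.
tokens <- c("new york", "city", "new york")

# With the default sep and token_regex, the space inside a token splits it.
default_text <- paste(tokens, collapse = " ")
default_split <- regmatches(default_text, gregexpr("\\S+", default_text))[[1]]

# With a tab separator and a matching regex, the tokens survive intact.
sep <- "\t"
token_regex <- "[^\t]+"
tab_text <- paste(tokens, collapse = sep)
tab_split <- regmatches(tab_text, gregexpr(token_regex, tab_text))[[1]]

# The corresponding call would then look like this (requires the package
# and a working MALLET/rJava setup, so it is left commented out):
# insts <- wordcounts_instances(counts, sep = "\t", token_regex = "[^\t]+")
```

Here default_split breaks "new york" into two tokens, while tab_split recovers the original three.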
An rJava reference to a MALLET InstanceList.
make_instances, which this wraps; train_model; write_instances