mallet.import | R Documentation |
This function takes an array of document IDs and text files (as character strings) and converts them into a Mallet instance list.
mallet.import( id.array = NULL, text.array, stoplist = "", preserve.case = FALSE, token.regexp = "[\\p{L}]+" )
id.array |
An array of document IDs. Default is |
text.array |
A character vector with each element containing a document. |
stoplist |
The name of a file containing stopwords (words to ignore), one per line, or a character vector containing stop words. If the file is not in the current working directory, you may need to include a full path. Default is no stoplist. |
preserve.case |
By default (FALSE), the input text is converted to all lowercase. Set to TRUE to keep the original case. |
token.regexp |
A quoted string representing a regular expression that defines a token. The default is one or more Unicode letters: "[\\p{L}]+". Note that backslashes in the pattern must be doubled, because a single backslash is an escape character in an R string. |
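To get a feel for how token.regexp, preserve.case, and a stop word vector interact, the following minimal sketch approximates the tokenisation in base R. It is not the Mallet implementation: Mallet applies the pattern with Java's regex engine, whereas this uses R's PCRE engine (perl = TRUE), and the helper name tokenize_sketch and its stopwords argument are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: a base-R approximation of the tokenisation step.
# Mallet itself is not called here.
tokenize_sketch <- function(text,
                            token.regexp = "[\\p{L}]+",
                            preserve.case = FALSE,
                            stopwords = character(0)) {
  # Lowercase first unless the caller asks to preserve case.
  if (!preserve.case) text <- tolower(text)
  # Extract every substring that matches the token pattern.
  tokens <- regmatches(text, gregexpr(token.regexp, text, perl = TRUE))[[1]]
  # Drop any token that appears in the stop word vector.
  tokens[!tokens %in% stopwords]
}

doc <- "The state-of-the-union address, 1790."

# Default pattern: runs of letters only, with two stop words removed.
default_tokens <- tokenize_sketch(doc, stopwords = c("the", "of"))

# Pattern from the example section: tokens may contain internal punctuation
# but must start and end with a letter (so they are at least 3 characters).
punct_tokens <- tokenize_sketch(doc,
                                token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

print(default_tokens)
print(punct_tokens)
```

With the default pattern, hyphenated words are split into their parts and the digits are ignored. With the second pattern, "state-of-the-union" survives as a single token, while the trailing comma after "address" is excluded because a token must end in a letter.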
A cc/mallet/types/InstanceList object.
mallet.word.freqs returns term and document frequencies, which may be useful in selecting stopwords.
## Not run: 
# Read in sotu example data
data(sotu)
sotu.instances <-
  mallet.import(id.array = row.names(sotu),
                text.array = sotu[["text"]],
                stoplist = mallet_stoplist_file_path("en"),
                token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

## End(Not run)