This function takes an array of document IDs and text files (as character strings) and converts them into a Mallet instance list.
mallet.import(id.array, text.array, stoplist.file, preserve.case, token.regexp)
id.array
An array of document IDs.
text.array
An array of text strings to use as documents. The type of the array must be character.
stoplist.file
The name of a file containing stopwords (words to ignore), one per line. If the file is not in the current working directory, you may need to include a full path.
preserve.case
Whether to keep the original capitalization of the input text. By default, the input text is converted to all lowercase.
token.regexp
A quoted string containing a regular expression that defines a token. The default is one or more Unicode letters: "[\\p{L}]+". Note that backslashes in the pattern must be doubled, because R string literals treat a single backslash as an escape character.
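To get a feel for what a given token.regexp will match, the pattern can be tried on a sample string in base R before importing. This is only a rough sketch: Mallet applies the pattern with Java's regular expression engine, whereas the code below uses R's PCRE engine (perl = TRUE), so it assumes the two behave alike for these simple Unicode property classes. The names sample_text, letters_only and with_punct are illustrative.

```r
# Illustrative sample string (not taken from the package documentation).
sample_text <- "Don't stop-believing in 42 days"

# Helper: return every substring of `x` matched by `pattern`.
# Uses PCRE as an approximation of the Java engine that Mallet uses.
extract_tokens <- function(x, pattern) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

# Default pattern: runs of one or more Unicode letters.
letters_only <- extract_tokens(sample_text, "[\\p{L}]+")
print(letters_only)
# "Don" "t" "stop" "believing" "in" "days"

# Pattern from the Examples section: a letter, then letters or
# punctuation, then a final letter.
with_punct <- extract_tokens(sample_text, "\\p{L}[\\p{L}\\p{P}]+\\p{L}")
print(with_punct)
# "Don't" "stop-believing" "days"
```

Note that the second pattern needs at least three characters to match, so a short word such as "in" is not matched as a token at all.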
mallet.word.freqs returns term and document frequencies, which may be useful in selecting stopwords.
## Not run:
mallet.instances <- mallet.import(documents$id, documents$text, "en.txt",
                                  token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")
## End(Not run)
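The example above presumes that a documents data frame and an "en.txt" stoplist already exist. A minimal sketch of preparing such inputs might look like the following. The toy corpus, the stopwords, and the name stoplist_path are all made up for illustration, and the import call is guarded so that it only runs when the mallet package (which depends on Java) can be loaded.

```r
# Hypothetical toy corpus; ids and texts are illustrative only.
documents <- data.frame(
  id   = c("doc1", "doc2"),
  text = c("The cat sat on the mat.", "A dog chased the cat."),
  stringsAsFactors = FALSE  # keep both columns as character, not factor
)

# Write a small stoplist to a temporary file, one word per line.
stoplist_path <- tempfile(fileext = ".txt")
writeLines(c("the", "a", "on"), stoplist_path)

# Attempt the import only if the mallet package can be loaded.
if (requireNamespace("mallet", quietly = TRUE)) {
  mallet.instances <- mallet::mallet.import(
    documents$id, documents$text, stoplist_path,
    token.regexp = "[\\p{L}]+"
  )
}
```

Passing the IDs, texts and stoplist positionally mirrors the call in the example above. Using a temporary file avoids depending on the current working directory.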